* [PATCH v4 00/22] AMD MCA interrupts rework
@ 2025-06-24 14:15 Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails Yazen Ghannam
                   ` (21 more replies)
  0 siblings, 22 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:15 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Hi all,

This set unifies the AMD MCA interrupt handlers with common MCA code.
The goal is to avoid duplicating functionality like reading and clearing
MCA banks.

Based on feedback, this revision also includes changes to the MCA init
flow.

Patches 1-8:
General fixes and cleanups.

Patches 9-14:
Add BSP-only init flow and related changes.

Patches 15-18:
Updates from the v1 set.

Patches 19-21:
Interrupt storm handling rebased on the current set.

Patch 22:
Add support to get threshold limit from APEI HEST.

Thanks,
Yazen

---
Changes in v4:
- Rebase on v6.16-rc3.
- Address comments from Boris about function names.
- Redo DFR handler integration.
- Drop AMD APIC LVT rework.
- Include more AMD thresholding reworks and fixes.
- Add support to get threshold limit from APEI HEST.
- Reorder patches so most fixes and reworks are at the beginning.
- Link to v3: https://lore.kernel.org/r/20250415-wip-mca-updates-v3-0-8ffd9eb4aa56@amd.com

Changes in v3:
- Rebased on tip/x86/merge rather than tip/master.
- Updated MSR access helpers (*msrl -> *msrq).
- Add patch to fix polling after a storm.
- Link to v2: https://lore.kernel.org/r/20250213-wip-mca-updates-v2-0-3636547fe05f@amd.com

Changes in v2:
- Add general cleanup pre-patches.
- Add changes for BSP-only init.
- Add interrupt storm handling for AMD.
- Link to v1: https://lore.kernel.org/r/20240523155641.2805411-1-yazen.ghannam@amd.com

---
Borislav Petkov (1):
      x86/mce: Cleanup bank processing on init

Smita Koralahalli (1):
      x86/mce: Handle AMD threshold interrupt storms

Yazen Ghannam (20):
      x86/mce: Don't remove sysfs if thresholding sysfs init fails
      x86/mce: Restore poll settings after storm subsides
      x86/mce/amd: Add default names for MCA banks and blocks
      x86/mce/amd: Fix threshold limit reset
      x86/mce/amd: Rename threshold restart function
      x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device()
      x86/mce/amd: Remove smca_banks_map
      x86/mce/amd: Put list_head in threshold_bank
      x86/mce: Remove __mcheck_cpu_init_early()
      x86/mce: Define BSP-only init
      x86/mce: Define BSP-only SMCA init
      x86/mce: Do 'UNKNOWN' vendor check early
      x86/mce: Separate global and per-CPU quirks
      x86/mce: Move machine_check_poll() status checks to helper functions
      x86/mce: Unify AMD THR handler with MCA Polling
      x86/mce: Unify AMD DFR handler with MCA Polling
      x86/mce/amd: Support SMCA Corrected Error Interrupt
      x86/mce/amd: Remove redundant reset_block()
      x86/mce/amd: Define threshold restart function for banks
      x86/mce: Save and use APEI corrected threshold limit

 arch/x86/include/asm/mce.h          |  23 ++-
 arch/x86/kernel/acpi/apei.c         |   2 +
 arch/x86/kernel/cpu/common.c        |   1 +
 arch/x86/kernel/cpu/mce/amd.c       | 366 ++++++++++++++----------------------
 arch/x86/kernel/cpu/mce/core.c      | 363 +++++++++++++++++------------------
 arch/x86/kernel/cpu/mce/intel.c     |  18 ++
 arch/x86/kernel/cpu/mce/internal.h  |  12 ++
 arch/x86/kernel/cpu/mce/threshold.c |  16 ++
 8 files changed, 379 insertions(+), 422 deletions(-)
---
base-commit: 86731a2a651e58953fc949573895f2fa6d456841
change-id: 20250210-wip-mca-updates-bed2a67c9c57



* [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
@ 2025-06-24 14:15 ` Yazen Ghannam
  2025-06-28 11:33   ` [tip: ras/urgent] " tip-bot2 for Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides Yazen Ghannam
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:15 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Currently, the MCE subsystem sysfs interface will be removed if the
thresholding sysfs interface fails to be created. A common failure is
due to new MCA bank types that are not recognized and don't have a short
name set.

The MCA thresholding feature is optional and should not break the common
MCE sysfs interface. Also, new MCA bank types are occasionally
introduced, and updates will be needed to recognize them. But likewise,
this should not break the common sysfs interface.

Keep the MCE sysfs interface regardless of the status of the
thresholding sysfs interface.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-1-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Added tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2.
    * Included stable tag but there's no specific commit for Fixes.

 arch/x86/kernel/cpu/mce/core.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e9b3c5d4a52e..07d61937427f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2801,15 +2801,9 @@ static int mce_cpu_dead(unsigned int cpu)
 static int mce_cpu_online(unsigned int cpu)
 {
 	struct timer_list *t = this_cpu_ptr(&mce_timer);
-	int ret;
 
 	mce_device_create(cpu);
-
-	ret = mce_threshold_create_device(cpu);
-	if (ret) {
-		mce_device_remove(cpu);
-		return ret;
-	}
+	mce_threshold_create_device(cpu);
 	mce_reenable_cpu();
 	mce_start_timer(t);
 	return 0;

-- 
2.49.0



* [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails Yazen Ghannam
@ 2025-06-24 14:15 ` Yazen Ghannam
  2025-06-25 13:22   ` Nikolay Borisov
  2025-06-28 11:33   ` [tip: ras/urgent] x86/mce: Ensure user polling settings are honored when restarting timer tip-bot2 for Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 03/22] x86/mce/amd: Add default names for MCA banks and blocks Yazen Ghannam
                   ` (19 subsequent siblings)
  21 siblings, 2 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:15 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Users can disable MCA polling by setting the "ignore_ce" parameter or by
setting "check_interval=0". This tells the kernel to *not* start the MCE
timer on a CPU.

If the user did not disable CMCI, then storms can occur. When these
happen, the MCE timer will be started with a fixed interval. After the
storm subsides, the timer's next interval is set to check_interval.

This disregards the user's input through "ignore_ce" and
"check_interval". Furthermore, if "check_interval=0", then the new timer
will run faster than expected.

Create a new helper to check these conditions and use it when a CMCI
storm ends.
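
(For reference: "ignore_ce" is the mce=ignore_ce kernel command line
option, also exposed through the MCE sysfs interface, and
"check_interval" is the per-CPU sysfs attribute of the same name, e.g.
/sys/devices/system/machinecheck/machinecheck0/check_interval; writing
0 to it disables the polling timer.)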

Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-17-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Update commit message.
    * Move to beginning of set.
    * Note: Polling vs thresholding use case updates not yet addressed.
    
    v2->v3:
    * New in v3.

 arch/x86/kernel/cpu/mce/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 07d61937427f..ae2e2d8ec99b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1740,6 +1740,11 @@ static void mc_poll_banks_default(void)
 
 void (*mc_poll_banks)(void) = mc_poll_banks_default;
 
+static bool should_enable_timer(unsigned long iv)
+{
+	return !mca_cfg.ignore_ce && iv;
+}
+
 static void mce_timer_fn(struct timer_list *t)
 {
 	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
@@ -1763,7 +1768,7 @@ static void mce_timer_fn(struct timer_list *t)
 
 	if (mce_get_storm_mode()) {
 		__start_timer(t, HZ);
-	} else {
+	} else if (should_enable_timer(iv)) {
 		__this_cpu_write(mce_next_interval, iv);
 		__start_timer(t, iv);
 	}
@@ -2156,7 +2161,7 @@ static void mce_start_timer(struct timer_list *t)
 {
 	unsigned long iv = check_interval * HZ;
 
-	if (mca_cfg.ignore_ce || !iv)
+	if (!should_enable_timer(iv))
 		return;
 
 	this_cpu_write(mce_next_interval, iv);

-- 
2.49.0



* [PATCH v4 03/22] x86/mce/amd: Add default names for MCA banks and blocks
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides Yazen Ghannam
@ 2025-06-24 14:15 ` Yazen Ghannam
  2025-06-28 11:33   ` [tip: ras/urgent] " tip-bot2 for Yazen Ghannam
  2025-06-24 14:15 ` [PATCH v4 04/22] x86/mce/amd: Fix threshold limit reset Yazen Ghannam
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:15 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

...so sysfs init doesn't fail for new/unrecognized bank types or if a
bank has additional blocks available.

Most MCA banks have a single thresholding block, so the block takes the
same name as the bank.

Unified Memory Controllers (UMCs) are a special case where there are two
blocks and each has a unique name.

However, the microarchitecture allows for five blocks. Any new MCA bank
type with more than one block will be missing names for the extra
blocks, and MCE sysfs initialization will fail in this case.
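
As an illustration (hypothetical bank number), an unrecognized bank 27
no longer fails: the bank and its first block fall back to the default
name "th_bank_27", and an additional block at index 1 is named
"th_block_1", per the snprintf() calls below.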

Fixes: 87a6d4091bd7 ("x86/mce/AMD: Update sysfs bank names for SMCA systems")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---

Notes:
    v3->v4:
    * New in v4.

 arch/x86/kernel/cpu/mce/amd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9d852c3b2cb5..6820ebce5d46 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1113,13 +1113,20 @@ static const char *get_name(unsigned int cpu, unsigned int bank, struct threshol
 	}
 
 	bank_type = smca_get_bank_type(cpu, bank);
-	if (bank_type >= N_SMCA_BANK_TYPES)
-		return NULL;
 
 	if (b && (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2)) {
 		if (b->block < ARRAY_SIZE(smca_umc_block_names))
 			return smca_umc_block_names[b->block];
-		return NULL;
+	}
+
+	if (b && b->block) {
+		snprintf(buf_mcatype, MAX_MCATYPE_NAME_LEN, "th_block_%u", b->block);
+		return buf_mcatype;
+	}
+
+	if (bank_type >= N_SMCA_BANK_TYPES) {
+		snprintf(buf_mcatype, MAX_MCATYPE_NAME_LEN, "th_bank_%u", bank);
+		return buf_mcatype;
 	}
 
 	if (per_cpu(smca_bank_counts, cpu)[bank_type] == 1)

-- 
2.49.0



* [PATCH v4 04/22] x86/mce/amd: Fix threshold limit reset
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (2 preceding siblings ...)
  2025-06-24 14:15 ` [PATCH v4 03/22] x86/mce/amd: Add default names for MCA banks and blocks Yazen Ghannam
@ 2025-06-24 14:15 ` Yazen Ghannam
  2025-06-28 11:33   ` [tip: ras/urgent] " tip-bot2 for Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 05/22] x86/mce/amd: Rename threshold restart function Yazen Ghannam
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:15 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The MCA threshold limit must be reset after servicing the interrupt.

Currently, the restart function doesn't have an explicit check for this.
It makes some assumptions based on the current limit and what's in the
registers. These assumptions don't always hold, so the limit won't be
reset in some cases.

Make the reset condition explicit: either an interrupt/overflow has
occurred, or the bank is being initialized.
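
For example (a worked sketch, using THRESHOLD_MAX = 0xFFF from amd.c):
with threshold_limit = 10, the error count field is rewritten to
0xFFF - 10 = 0xFF5, so the next ten errors overflow the counter and
raise the thresholding interrupt again.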

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---

Notes:
    v3->v4:
    * New in v4.

 arch/x86/kernel/cpu/mce/amd.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6820ebce5d46..5c4eb28c3ac9 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -350,7 +350,6 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 
 struct thresh_restart {
 	struct threshold_block	*b;
-	int			reset;
 	int			set_lvt_off;
 	int			lvt_off;
 	u16			old_limit;
@@ -432,13 +431,13 @@ static void threshold_restart_bank(void *_tr)
 
 	rdmsr(tr->b->address, lo, hi);
 
-	if (tr->b->threshold_limit < (hi & THRESHOLD_MAX))
-		tr->reset = 1;	/* limit cannot be lower than err count */
-
-	if (tr->reset) {		/* reset err count and overflow bit */
-		hi =
-		    (hi & ~(MASK_ERR_COUNT_HI | MASK_OVERFLOW_HI)) |
-		    (THRESHOLD_MAX - tr->b->threshold_limit);
+	/*
+	 * Reset error count and overflow bit.
+	 * This is done during init or after handling an interrupt.
+	 */
+	if (hi & MASK_OVERFLOW_HI || tr->set_lvt_off) {
+		hi &= ~(MASK_ERR_COUNT_HI | MASK_OVERFLOW_HI);
+		hi |= THRESHOLD_MAX - tr->b->threshold_limit;
 	} else if (tr->old_limit) {	/* change limit w/o reset */
 		int new_count = (hi & THRESHOLD_MAX) +
 		    (tr->old_limit - tr->b->threshold_limit);

-- 
2.49.0



* [PATCH v4 05/22] x86/mce/amd: Rename threshold restart function
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (3 preceding siblings ...)
  2025-06-24 14:15 ` [PATCH v4 04/22] x86/mce/amd: Fix threshold limit reset Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device() Yazen Ghannam
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The function operates on a single threshold block rather than a whole
bank, so rename it for clarity.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    v3->v4:
    * New in v4.

 arch/x86/kernel/cpu/mce/amd.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5c4eb28c3ac9..9b980aecb6b3 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -419,8 +419,8 @@ static bool lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
 	return true;
 };
 
-/* Reprogram MCx_MISC MSR behind this threshold bank. */
-static void threshold_restart_bank(void *_tr)
+/* Reprogram MCx_MISC MSR behind this threshold block. */
+static void threshold_restart_block(void *_tr)
 {
 	struct thresh_restart *tr = _tr;
 	u32 hi, lo;
@@ -478,7 +478,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
 	};
 
 	b->threshold_limit		= THRESHOLD_MAX;
-	threshold_restart_bank(&tr);
+	threshold_restart_block(&tr);
 };
 
 static int setup_APIC_mce_threshold(int reserved, int new)
@@ -921,7 +921,7 @@ static void log_and_reset_block(struct threshold_block *block)
 	/* Reset threshold block after logging error. */
 	memset(&tr, 0, sizeof(tr));
 	tr.b = block;
-	threshold_restart_bank(&tr);
+	threshold_restart_block(&tr);
 }
 
 /*
@@ -995,7 +995,7 @@ store_interrupt_enable(struct threshold_block *b, const char *buf, size_t size)
 	memset(&tr, 0, sizeof(tr));
 	tr.b		= b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_call_function_single(b->cpu, threshold_restart_block, &tr, 1))
 		return -ENODEV;
 
 	return size;
@@ -1020,7 +1020,7 @@ store_threshold_limit(struct threshold_block *b, const char *buf, size_t size)
 	b->threshold_limit = new;
 	tr.b = b;
 
-	if (smp_call_function_single(b->cpu, threshold_restart_bank, &tr, 1))
+	if (smp_call_function_single(b->cpu, threshold_restart_block, &tr, 1))
 		return -ENODEV;
 
 	return size;

-- 
2.49.0



* [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device()
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (4 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 05/22] x86/mce/amd: Rename threshold restart function Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-25 14:57   ` Nikolay Borisov
  2025-06-24 14:16 ` [PATCH v4 07/22] x86/mce/amd: Remove smca_banks_map Yazen Ghannam
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The return values are not checked, so set the return type to 'void'.

Also, move function declarations to internal.h, since these functions are
only used within the MCE subsystem.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-2-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Include mce_threshold_remove_device().
    
    v1->v2:
    * New in v2.

 arch/x86/include/asm/mce.h         |  6 ------
 arch/x86/kernel/cpu/mce/amd.c      | 22 ++++++++++------------
 arch/x86/kernel/cpu/mce/internal.h |  4 ++++
 3 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c77c03139f7..752802bf966b 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -371,15 +371,9 @@ enum smca_bank_types {
 
 extern bool amd_mce_is_memory_error(struct mce *m);
 
-extern int mce_threshold_create_device(unsigned int cpu);
-extern int mce_threshold_remove_device(unsigned int cpu);
-
 void mce_amd_feature_init(struct cpuinfo_x86 *c);
 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
 #else
-
-static inline int mce_threshold_create_device(unsigned int cpu)		{ return 0; };
-static inline int mce_threshold_remove_device(unsigned int cpu)		{ return 0; };
 static inline bool amd_mce_is_memory_error(struct mce *m)		{ return false; };
 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)		{ }
 #endif
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9b980aecb6b3..f429451cafc8 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1296,12 +1296,12 @@ static void __threshold_remove_device(struct threshold_bank **bp)
 	kfree(bp);
 }
 
-int mce_threshold_remove_device(unsigned int cpu)
+void mce_threshold_remove_device(unsigned int cpu)
 {
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
 
 	if (!bp)
-		return 0;
+		return;
 
 	/*
 	 * Clear the pointer before cleaning up, so that the interrupt won't
@@ -1310,7 +1310,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 	this_cpu_write(threshold_banks, NULL);
 
 	__threshold_remove_device(bp);
-	return 0;
+	return;
 }
 
 /**
@@ -1324,36 +1324,34 @@ int mce_threshold_remove_device(unsigned int cpu)
  * thread running on @cpu.  The callback is invoked on all CPUs which are
  * online when the callback is installed or during a real hotplug event.
  */
-int mce_threshold_create_device(unsigned int cpu)
+void mce_threshold_create_device(unsigned int cpu)
 {
 	unsigned int numbanks, bank;
 	struct threshold_bank **bp;
-	int err;
 
 	if (!mce_flags.amd_threshold)
-		return 0;
+		return;
 
 	bp = this_cpu_read(threshold_banks);
 	if (bp)
-		return 0;
+		return;
 
 	numbanks = this_cpu_read(mce_num_banks);
 	bp = kcalloc(numbanks, sizeof(*bp), GFP_KERNEL);
 	if (!bp)
-		return -ENOMEM;
+		return;
 
 	for (bank = 0; bank < numbanks; ++bank) {
 		if (!(this_cpu_read(bank_map) & BIT_ULL(bank)))
 			continue;
-		err = threshold_create_bank(bp, cpu, bank);
-		if (err) {
+		if (threshold_create_bank(bp, cpu, bank)) {
 			__threshold_remove_device(bp);
-			return err;
+			return;
 		}
 	}
 	this_cpu_write(threshold_banks, bp);
 
 	if (thresholding_irq_en)
 		mce_threshold_vector = amd_threshold_interrupt;
-	return 0;
+	return;
 }
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b5ba598e54cb..64ac25b95360 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -265,6 +265,8 @@ void mce_prep_record_common(struct mce *m);
 void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m);
 
 #ifdef CONFIG_X86_MCE_AMD
+void mce_threshold_create_device(unsigned int cpu);
+void mce_threshold_remove_device(unsigned int cpu);
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
 
@@ -293,6 +295,8 @@ static __always_inline void smca_extract_err_addr(struct mce *m)
 }
 
 #else
+static inline void mce_threshold_create_device(unsigned int cpu)	{ }
+static inline void mce_threshold_remove_device(unsigned int cpu)	{ }
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
 static inline void smca_extract_err_addr(struct mce *m) { }

-- 
2.49.0



* [PATCH v4 07/22] x86/mce/amd: Remove smca_banks_map
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (5 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device() Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank Yazen Ghannam
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The MCx_MISC0[BlkPtr] field was used on legacy systems to hold a
register offset for the next MCx_MISC* register. In this way, an
implementation-specific number of registers can be discovered at
runtime.

The MCAX/SMCA register space simplifies this by always including
the MCx_MISC[1-4] registers. The MCx_MISC0[BlkPtr] field is used to
indicate (true/false) whether any MCx_MISC[1-4] registers are present.
But it indicates neither which ones nor how many. Therefore, all the
registers are accessed and their bits are checked.

AMD systems generally enforce a Read-as-Zero/Writes-Ignored policy for
unused registers. Therefore, there is no harm in reading an unused
register. This is already done in practice for most of the MCx_MISC
registers.

Remove the smca_banks_map variable as it is effectively redundant.
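
A minimal sketch of the resulting address lookup, mirroring the
smca_get_block_address() hunk below (no per-bank map consulted):

	/* Block 0 lives in MCx_MISC0; blocks 1-4 in MCx_MISC[1-4]. */
	addr = block ? MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1)
		     : MSR_AMD64_SMCA_MCx_MISC(bank);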

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-3-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Minor edit in commit message.
    * Added tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2.

 arch/x86/kernel/cpu/mce/amd.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f429451cafc8..0ffbee329a8c 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -252,9 +252,6 @@ static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
  */
 static DEFINE_PER_CPU(u64, bank_map);
 
-/* Map of banks that have more than MCA_MISC0 available. */
-static DEFINE_PER_CPU(u64, smca_misc_banks_map);
-
 static void amd_threshold_interrupt(void);
 static void amd_deferred_error_interrupt(void);
 
@@ -264,28 +261,6 @@ static void default_deferred_error_interrupt(void)
 }
 void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
 
-static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
-{
-	u32 low, high;
-
-	/*
-	 * For SMCA enabled processors, BLKPTR field of the first MISC register
-	 * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
-	 */
-	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
-		return;
-
-	if (!(low & MCI_CONFIG_MCAX))
-		return;
-
-	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
-		return;
-
-	if (low & MASK_BLKPTR_LO)
-		per_cpu(smca_misc_banks_map, cpu) |= BIT_ULL(bank);
-
-}
-
 static void smca_configure(unsigned int bank, unsigned int cpu)
 {
 	u8 *bank_counts = this_cpu_ptr(smca_bank_counts);
@@ -326,8 +301,6 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 		wrmsr(smca_config, low, high);
 	}
 
-	smca_set_misc_banks_map(bank, cpu);
-
 	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {
 		pr_warn("Failed to read MCA_IPID for bank %d\n", bank);
 		return;
@@ -531,9 +504,6 @@ static u32 smca_get_block_address(unsigned int bank, unsigned int block,
 	if (!block)
 		return MSR_AMD64_SMCA_MCx_MISC(bank);
 
-	if (!(per_cpu(smca_misc_banks_map, cpu) & BIT_ULL(bank)))
-		return 0;
-
 	return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
 }
 

-- 
2.49.0



* [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (6 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 07/22] x86/mce/amd: Remove smca_banks_map Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-25 16:52   ` Nikolay Borisov
  2025-06-24 14:16 ` [PATCH v4 09/22] x86/mce: Cleanup bank processing on init Yazen Ghannam
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The threshold_bank structure is a container for one or more
threshold_block structures. Currently, the container has a single
pointer to the 'first' threshold_block structure which then has a linked
list of the remaining threshold_block structures.

This results in an extra level of indirection where the 'first' block is
checked before iterating over the remaining blocks.

Remove the indirection by including the head of the block list in the
threshold_bank structure which already acts as a container for all the
bank's thresholding blocks.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-4-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Added tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2.

 arch/x86/kernel/cpu/mce/amd.c | 43 ++++++++++++-------------------------------
 1 file changed, 12 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 0ffbee329a8c..5d351ec863cd 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -241,7 +241,8 @@ struct threshold_block {
 
 struct threshold_bank {
 	struct kobject		*kobj;
-	struct threshold_block	*blocks;
+	/* List of threshold blocks within this MCA bank. */
+	struct list_head	miscj;
 };
 
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
@@ -900,9 +901,9 @@ static void log_and_reset_block(struct threshold_block *block)
  */
 static void amd_threshold_interrupt(void)
 {
-	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
-	struct threshold_bank **bp = this_cpu_read(threshold_banks);
+	struct threshold_bank **bp = this_cpu_read(threshold_banks), *thr_bank;
 	unsigned int bank, cpu = smp_processor_id();
+	struct threshold_block *block, *tmp;
 
 	/*
 	 * Validate that the threshold bank has been initialized already. The
@@ -916,16 +917,11 @@ static void amd_threshold_interrupt(void)
 		if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
 			continue;
 
-		first_block = bp[bank]->blocks;
-		if (!first_block)
+		thr_bank = bp[bank];
+		if (!thr_bank)
 			continue;
 
-		/*
-		 * The first block is also the head of the list. Check it first
-		 * before iterating over the rest.
-		 */
-		log_and_reset_block(first_block);
-		list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
+		list_for_each_entry_safe(block, tmp, &thr_bank->miscj, miscj)
 			log_and_reset_block(block);
 	}
 }
@@ -1151,13 +1147,7 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
 		default_attrs[2] = NULL;
 	}
 
-	INIT_LIST_HEAD(&b->miscj);
-
-	/* This is safe as @tb is not visible yet */
-	if (tb->blocks)
-		list_add(&b->miscj, &tb->blocks->miscj);
-	else
-		tb->blocks = b;
+	list_add(&b->miscj, &tb->miscj);
 
 	err = kobject_init_and_add(&b->kobj, &threshold_ktype, tb->kobj, get_name(cpu, bank, b));
 	if (err)
@@ -1208,6 +1198,8 @@ static int threshold_create_bank(struct threshold_bank **bp, unsigned int cpu,
 		goto out_free;
 	}
 
+	INIT_LIST_HEAD(&b->miscj);
+
 	err = allocate_threshold_blocks(cpu, b, bank, 0, mca_msr_reg(bank, MCA_MISC));
 	if (err)
 		goto out_kobj;
@@ -1228,26 +1220,15 @@ static void threshold_block_release(struct kobject *kobj)
 	kfree(to_block(kobj));
 }
 
-static void deallocate_threshold_blocks(struct threshold_bank *bank)
+static void threshold_remove_bank(struct threshold_bank *bank)
 {
 	struct threshold_block *pos, *tmp;
 
-	list_for_each_entry_safe(pos, tmp, &bank->blocks->miscj, miscj) {
+	list_for_each_entry_safe(pos, tmp, &bank->miscj, miscj) {
 		list_del(&pos->miscj);
 		kobject_put(&pos->kobj);
 	}
 
-	kobject_put(&bank->blocks->kobj);
-}
-
-static void threshold_remove_bank(struct threshold_bank *bank)
-{
-	if (!bank->blocks)
-		goto out_free;
-
-	deallocate_threshold_blocks(bank);
-
-out_free:
 	kobject_put(bank->kobj);
 	kfree(bank);
 }

-- 
2.49.0



* [PATCH v4 09/22] x86/mce: Cleanup bank processing on init
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (7 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

From: Borislav Petkov <bp@suse.de>

Unify the bank preparation into __mcheck_cpu_init_clear_banks() and
rename that function to what it does now - prepare banks. Do this so
that generic and vendor bank init goes first and settings done during
that init can take effect before the first bank polling takes place.

Move __mcheck_cpu_check_banks() into __mcheck_cpu_init_prepare_banks()
as it already loops over the banks.

The MCP_DONTLOG flag is no longer needed, since the MCA polling function
is now called only if boot-time logging should be done.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-5-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Update commit message.
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2, but based on old patch (see link).
    * Kept old tags for reference.

 arch/x86/include/asm/mce.h     |  3 +-
 arch/x86/kernel/cpu/mce/core.c | 63 ++++++++++++------------------------------
 2 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 752802bf966b..3224f3862dc8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -290,8 +290,7 @@ DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
 enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
-	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
-	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
+	MCP_QUEUE_LOG	= BIT(2),	/* only queue to genpool */
 };
 
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ae2e2d8ec99b..486cddefaa7a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -807,9 +807,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		continue;
 
 log_it:
-		if (flags & MCP_DONTLOG)
-			goto clear_it;
-
 		mce_read_aux(&err, i);
 		m->severity = mce_severity(m, NULL, NULL, false);
 		/*
@@ -1812,7 +1809,7 @@ static void __mcheck_cpu_mce_banks_init(void)
 		/*
 		 * Init them all, __mcheck_cpu_apply_quirks() is going to apply
 		 * the required vendor quirks before
-		 * __mcheck_cpu_init_clear_banks() does the final bank setup.
+		 * __mcheck_cpu_init_prepare_banks() does the final bank setup.
 		 */
 		b->ctl = -1ULL;
 		b->init = true;
@@ -1851,21 +1848,8 @@ static void __mcheck_cpu_cap_init(void)
 
 static void __mcheck_cpu_init_generic(void)
 {
-	enum mcp_flags m_fl = 0;
-	mce_banks_t all_banks;
 	u64 cap;
 
-	if (!mca_cfg.bootlog)
-		m_fl = MCP_DONTLOG;
-
-	/*
-	 * Log the machine checks left over from the previous reset. Log them
-	 * only, do not start processing them. That will happen in mcheck_late_init()
-	 * when all consumers have been registered on the notifier chain.
-	 */
-	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks);
-
 	cr4_set_bits(X86_CR4_MCE);
 
 	rdmsrq(MSR_IA32_MCG_CAP, cap);
@@ -1873,36 +1857,23 @@ static void __mcheck_cpu_init_generic(void)
 		wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
 }
 
-static void __mcheck_cpu_init_clear_banks(void)
+static void __mcheck_cpu_init_prepare_banks(void)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+	u64 msrval;
 	int i;
 
-	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
-		struct mce_bank *b = &mce_banks[i];
+	/*
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
+	 */
+	if (mca_cfg.bootlog) {
+		mce_banks_t all_banks;
 
-		if (!b->init)
-			continue;
-		wrmsrq(mca_msr_reg(i, MCA_CTL), b->ctl);
-		wrmsrq(mca_msr_reg(i, MCA_STATUS), 0);
+		bitmap_fill(all_banks, MAX_NR_BANKS);
+		machine_check_poll(MCP_UC | MCP_QUEUE_LOG, &all_banks);
 	}
-}
-
-/*
- * Do a final check to see if there are any unused/RAZ banks.
- *
- * This must be done after the banks have been initialized and any quirks have
- * been applied.
- *
- * Do not call this from any user-initiated flows, e.g. CPU hotplug or sysfs.
- * Otherwise, a user who disables a bank will not be able to re-enable it
- * without a system reboot.
- */
-static void __mcheck_cpu_check_banks(void)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	u64 msrval;
-	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		struct mce_bank *b = &mce_banks[i];
@@ -1910,6 +1881,9 @@ static void __mcheck_cpu_check_banks(void)
 		if (!b->init)
 			continue;
 
+		wrmsrq(mca_msr_reg(i, MCA_CTL), b->ctl);
+		wrmsrq(mca_msr_reg(i, MCA_STATUS), 0);
+
 		rdmsrq(mca_msr_reg(i, MCA_CTL), msrval);
 		b->init = !!msrval;
 	}
@@ -2315,8 +2289,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 	__mcheck_cpu_init_early(c);
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
-	__mcheck_cpu_init_clear_banks();
-	__mcheck_cpu_check_banks();
+	__mcheck_cpu_init_prepare_banks();
 	__mcheck_cpu_setup_timer();
 }
 
@@ -2484,7 +2457,7 @@ static void mce_syscore_resume(void)
 {
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
-	__mcheck_cpu_init_clear_banks();
+	__mcheck_cpu_init_prepare_banks();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2502,7 +2475,7 @@ static void mce_cpu_restart(void *data)
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 	__mcheck_cpu_init_generic();
-	__mcheck_cpu_init_clear_banks();
+	__mcheck_cpu_init_prepare_banks();
 	__mcheck_cpu_init_timer();
 }
 

-- 
2.49.0



* [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early()
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (8 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 09/22] x86/mce: Cleanup bank processing on init Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-26  8:03   ` Nikolay Borisov
  2025-06-24 14:16 ` [PATCH v4 11/22] x86/mce: Define BSP-only init Yazen Ghannam
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The __mcheck_cpu_init_early() function was introduced so that some
vendor-specific features are detected before the first MCA polling event
done in __mcheck_cpu_init_generic().

Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and
additional code will be needed to support various system configurations.

However, the current and future vendor-specific code should be done during
vendor init. This keeps all the vendor code in a common location and
simplifies the generic init flow.

Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().

Also, move __mcheck_cpu_init_generic() after
__mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
MCA polling event.

Additionally, this brings the MCA init flow closer to what is described
in the x86 docs.

The AMD PPRs say
  "The operating system must initialize the MCA_CONFIG registers prior to
  initialization of the MCA_CTL registers.

  The MCA_CTL registers must be initialized prior to enabling the error
  reporting banks in MCG_CTL".

However, the Intel SDM "Machine-Check Initialization Pseudocode" says
MCG_CTL first then MCi_CTL.

But both agree that CR4.MCE should be set last.
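
For clarity, a sketch of the resulting per-CPU init order, as seen in
the mcheck_cpu_init() hunk below:

	__mcheck_cpu_init_vendor(c);		/* vendor setup, e.g. MCA_CONFIG */
	__mcheck_cpu_init_prepare_banks();	/* MCA_CTL and MCA_STATUS */
	__mcheck_cpu_init_generic();		/* CR4.MCE and MCG_CTL, now last */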

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-6-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Update commit message.
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2, but based on old patch (see link).
    * Changed cpu_has() to cpu_feature_enabled().

 arch/x86/kernel/cpu/mce/amd.c  |  4 ++++
 arch/x86/kernel/cpu/mce/core.c | 20 +++-----------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5d351ec863cd..292109e46a94 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -655,6 +655,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
+	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
+	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
+	mce_flags.amd_threshold	 = 1;
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 486cddefaa7a..ebe3e98f7606 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2034,19 +2034,6 @@ static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
 	return false;
 }
 
-/*
- * Init basic CPU features needed for early decoding of MCEs.
- */
-static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
-{
-	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
-		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
-		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
-		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
-		mce_flags.amd_threshold	 = 1;
-	}
-}
-
 static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
 {
 	struct mca_config *cfg = &mca_cfg;
@@ -2286,10 +2273,9 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	mca_cfg.initialized = 1;
 
-	__mcheck_cpu_init_early(c);
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_setup_timer();
 }
 
@@ -2455,9 +2441,9 @@ static void mce_syscore_shutdown(void)
  */
 static void mce_syscore_resume(void)
 {
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2474,8 +2460,8 @@ static void mce_cpu_restart(void *data)
 {
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_timer();
 }
 

-- 
2.49.0



* [PATCH v4 11/22] x86/mce: Define BSP-only init
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (9 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-25 11:04   ` Nikolay Borisov
  2025-06-24 14:16 ` [PATCH v4 12/22] x86/mce: Define BSP-only SMCA init Yazen Ghannam
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Currently, MCA initialization is executed identically on each CPU as
it is brought online. However, a number of MCA initialization tasks
only need to be done once.

Define a function to collect all 'global' init tasks and call this from
the BSP only. Start with CPU features.
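
Note: mca_bsp_init() is hooked into early_identify_cpu() (see the
common.c hunk below), which runs only on the boot CPU; that is what
makes this init BSP-only.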

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-7-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Change cpu_mca_init() to mca_bsp_init().
    * Drop code comment.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2.

 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/common.c   |  1 +
 arch/x86/kernel/cpu/mce/amd.c  |  3 ---
 arch/x86/kernel/cpu/mce/core.c | 28 +++++++++++++++++++++-------
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3224f3862dc8..31e3cb550fb3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -241,12 +241,14 @@ struct cper_ia_proc_ctx;
 
 #ifdef CONFIG_X86_MCE
 int mcheck_init(void);
+void mca_bsp_init(struct cpuinfo_x86 *c);
 void mcheck_cpu_init(struct cpuinfo_x86 *c);
 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 			       u64 lapic_id);
 #else
 static inline int mcheck_init(void) { return 0; }
+static inline void mca_bsp_init(struct cpuinfo_x86 *c) {}
 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
 static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
 static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8feb8fd2957a..8a00faa1042a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1771,6 +1771,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 		setup_clear_cpu_cap(X86_FEATURE_LA57);
 
 	detect_nopl();
+	mca_bsp_init(c);
 }
 
 void __init init_cpu_devs(void)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 292109e46a94..25a24d0b9cf9 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -655,9 +655,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
-	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
-	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
-	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
 	mce_flags.amd_threshold	 = 1;
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ebe3e98f7606..c55462e6af1c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1837,13 +1837,6 @@ static void __mcheck_cpu_cap_init(void)
 	this_cpu_write(mce_num_banks, b);
 
 	__mcheck_cpu_mce_banks_init();
-
-	/* Use accurate RIP reporting if available. */
-	if ((cap & MCG_EXT_P) && MCG_EXT_CNT(cap) >= 9)
-		mca_cfg.rip_msr = MSR_IA32_MCG_EIP;
-
-	if (cap & MCG_SER_P)
-		mca_cfg.ser = 1;
 }
 
 static void __mcheck_cpu_init_generic(void)
@@ -2243,6 +2236,27 @@ DEFINE_IDTENTRY_RAW(exc_machine_check)
 }
 #endif
 
+void mca_bsp_init(struct cpuinfo_x86 *c)
+{
+	u64 cap;
+
+	if (!mce_available(c))
+		return;
+
+	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
+	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
+	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
+
+	rdmsrq(MSR_IA32_MCG_CAP, cap);
+
+	/* Use accurate RIP reporting if available. */
+	if ((cap & MCG_EXT_P) && MCG_EXT_CNT(cap) >= 9)
+		mca_cfg.rip_msr = MSR_IA32_MCG_EIP;
+
+	if (cap & MCG_SER_P)
+		mca_cfg.ser = 1;
+}
+
 /*
  * Called for each booted CPU to set up machine checks.
  * Must be called with preempt off:

-- 
2.49.0



* [PATCH v4 12/22] x86/mce: Define BSP-only SMCA init
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (10 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 11/22] x86/mce: Define BSP-only init Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 13/22] x86/mce: Do 'UNKNOWN' vendor check early Yazen Ghannam
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Currently on AMD systems, MCA interrupt handler functions are set during
CPU init. However, the functions only need to be set once for the whole
system.

Assign the handlers only during BSP init. Do so only for SMCA systems to
maintain the old behavior for legacy systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-8-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Change mce_smca_cpu_init() to smca_bsp_init().
    
    v2->v3:
    * No change.
    
    v1->v2:
    * New in v2.

 arch/x86/kernel/cpu/mce/amd.c      | 6 ++++++
 arch/x86/kernel/cpu/mce/core.c     | 3 +++
 arch/x86/kernel/cpu/mce/internal.h | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 25a24d0b9cf9..d55c4b5f5354 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -686,6 +686,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 		deferred_error_interrupt_enable(c);
 }
 
+void smca_bsp_init(void)
+{
+	mce_threshold_vector	  = amd_threshold_interrupt;
+	deferred_error_int_vector = amd_deferred_error_interrupt;
+}
+
 /*
  * DRAM ECC errors are reported in the Northbridge (bank 4) with
  * Extended Error Code 8.
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index c55462e6af1c..c852d06713ed 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2247,6 +2247,9 @@ void mca_bsp_init(struct cpuinfo_x86 *c)
 	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
 	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
 
+	if (mce_flags.smca)
+		smca_bsp_init();
+
 	rdmsrq(MSR_IA32_MCG_CAP, cap);
 
 	/* Use accurate RIP reporting if available. */
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 64ac25b95360..6cb2995f0ec1 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -294,12 +294,14 @@ static __always_inline void smca_extract_err_addr(struct mce *m)
 	m->addr &= GENMASK_ULL(55, lsb);
 }
 
+void smca_bsp_init(void);
 #else
 static inline void mce_threshold_create_device(unsigned int cpu)	{ }
 static inline void mce_threshold_remove_device(unsigned int cpu)	{ }
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
 static inline void smca_extract_err_addr(struct mce *m) { }
+static inline void smca_bsp_init(void) { }
 #endif
 
 #ifdef CONFIG_X86_ANCIENT_MCE

-- 
2.49.0



* [PATCH v4 13/22] x86/mce: Do 'UNKNOWN' vendor check early
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (11 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 12/22] x86/mce: Define BSP-only SMCA init Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The 'UNKNOWN' vendor check is handled as a quirk that is run on each
online CPU. However, all CPUs are expected to have the same vendor.

Move the 'UNKNOWN' vendor check to the BSP-only init so it is done early
and once. Remove the unnecessary return value from the quirks check.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-9-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2.

 arch/x86/kernel/cpu/mce/core.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index c852d06713ed..a723c7886eae 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1979,14 +1979,11 @@ static void apply_quirks_zhaoxin(struct cpuinfo_x86 *c)
 }
 
 /* Add per CPU specific workarounds here */
-static bool __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
+static void __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 {
 	struct mca_config *cfg = &mca_cfg;
 
 	switch (c->x86_vendor) {
-	case X86_VENDOR_UNKNOWN:
-		pr_info("unknown CPU type - not enabling MCE support\n");
-		return false;
 	case X86_VENDOR_AMD:
 		apply_quirks_amd(c);
 		break;
@@ -2002,8 +1999,6 @@ static bool __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
 		cfg->panic_timeout = 30;
-
-	return true;
 }
 
 static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
@@ -2243,6 +2238,12 @@ void mca_bsp_init(struct cpuinfo_x86 *c)
 	if (!mce_available(c))
 		return;
 
+	if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
+		mca_cfg.disabled = 1;
+		pr_info("unknown CPU type - not enabling MCE support\n");
+		return;
+	}
+
 	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
 	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
 	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
@@ -2277,10 +2278,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	__mcheck_cpu_cap_init();
 
-	if (!__mcheck_cpu_apply_quirks(c)) {
-		mca_cfg.disabled = 1;
-		return;
-	}
+	__mcheck_cpu_apply_quirks(c);
 
 	if (!mce_gen_pool_init()) {
 		mca_cfg.disabled = 1;

-- 
2.49.0



* [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (12 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 13/22] x86/mce: Do 'UNKNOWN' vendor check early Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-27 11:02   ` Nikolay Borisov
  2025-06-24 14:16 ` [PATCH v4 15/22] x86/mce: Move machine_check_poll() status checks to helper functions Yazen Ghannam
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Many quirks are global configuration settings, while a handful apply to
each CPU.

Move the per-CPU quirks to vendor init to execute them on each online
CPU. Set the global quirks during BSP-only init so they're only executed
once and early.
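
For example, clearing the GART TBL walk error bit acts on the per-CPU
mce_banks array and so moves to amd_apply_quirks() in vendor init,
while the bootlog default remains a single global mca_cfg setting.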

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-10-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Add newline in mce_amd_feature_init().
    * Remove __mcheck_cpu_apply_quirks().
    * Update code comment ref. __mcheck_cpu_apply_quirks().
    
    v2->v3:
    * Update code comment.
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * New in v2.

 arch/x86/kernel/cpu/mce/amd.c   | 24 +++++++++++++
 arch/x86/kernel/cpu/mce/core.c  | 79 +++++++++++------------------------------
 arch/x86/kernel/cpu/mce/intel.c | 18 ++++++++++
 3 files changed, 62 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d55c4b5f5354..28dc36cdd137 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -648,6 +648,28 @@ static void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank)
 		wrmsrq(MSR_K7_HWCR, hwcr);
 }
 
+static void amd_apply_quirks(struct cpuinfo_x86 *c)
+{
+	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+
+	/* This should be disabled by the BIOS, but isn't always */
+	if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
+		/*
+		 * disable GART TBL walk error reporting, which
+		 * trips off incorrectly with the IOMMU & 3ware
+		 * & Cerberus:
+		 */
+		clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
+	}
+
+	/*
+	 * Various K7s with broken bank 0 around. Always disable
+	 * by default.
+	 */
+	if (c->x86 == 6 && this_cpu_read(mce_num_banks))
+		mce_banks[0].ctl = 0;
+}
+
 /* cpu init entry point, called from mce.c with preempt off */
 void mce_amd_feature_init(struct cpuinfo_x86 *c)
 {
@@ -655,6 +677,8 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	amd_apply_quirks(c);
+
 	mce_flags.amd_threshold	 = 1;
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index a723c7886eae..4f6af060a395 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1807,8 +1807,9 @@ static void __mcheck_cpu_mce_banks_init(void)
 		struct mce_bank *b = &mce_banks[i];
 
 		/*
-		 * Init them all, __mcheck_cpu_apply_quirks() is going to apply
-		 * the required vendor quirks before
+		 * Init them all by default.
+		 *
+		 * The required vendor quirks will be applied before
 		 * __mcheck_cpu_init_prepare_banks() does the final bank setup.
 		 */
 		b->ctl = -1ULL;
@@ -1884,18 +1885,6 @@ static void __mcheck_cpu_init_prepare_banks(void)
 
 static void apply_quirks_amd(struct cpuinfo_x86 *c)
 {
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-
-	/* This should be disabled by the BIOS, but isn't always */
-	if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
-		/*
-		 * disable GART TBL walk error reporting, which
-		 * trips off incorrectly with the IOMMU & 3ware
-		 * & Cerberus:
-		 */
-		clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
-	}
-
 	if (c->x86 < 0x11 && mca_cfg.bootlog < 0) {
 		/*
 		 * Lots of broken BIOS around that don't clear them
@@ -1904,13 +1893,6 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c)
 		mca_cfg.bootlog = 0;
 	}
 
-	/*
-	 * Various K7s with broken bank 0 around. Always disable
-	 * by default.
-	 */
-	if (c->x86 == 6 && this_cpu_read(mce_num_banks))
-		mce_banks[0].ctl = 0;
-
 	/*
 	 * overflow_recov is supported for F15h Models 00h-0fh
 	 * even though we don't have a CPUID bit for it.
@@ -1924,23 +1906,10 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c)
 
 static void apply_quirks_intel(struct cpuinfo_x86 *c)
 {
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-
 	/* Older CPUs (prior to family 6) don't need quirks. */
 	if (c->x86_vfm < INTEL_PENTIUM_PRO)
 		return;
 
-	/*
-	 * SDM documents that on family 6 bank 0 should not be written
-	 * because it aliases to another special BIOS controlled
-	 * register.
-	 * But it's not aliased anymore on model 0x1a+
-	 * Don't ignore bank 0 completely because there could be a
-	 * valid event later, merely don't write CTL0.
-	 */
-	if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks))
-		mce_banks[0].init = false;
-
 	/*
 	 * All newer Intel systems support MCE broadcasting. Enable
 	 * synchronization with a one second timeout.
@@ -1978,29 +1947,6 @@ static void apply_quirks_zhaoxin(struct cpuinfo_x86 *c)
 	}
 }
 
-/* Add per CPU specific workarounds here */
-static void __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
-{
-	struct mca_config *cfg = &mca_cfg;
-
-	switch (c->x86_vendor) {
-	case X86_VENDOR_AMD:
-		apply_quirks_amd(c);
-		break;
-	case X86_VENDOR_INTEL:
-		apply_quirks_intel(c);
-		break;
-	case X86_VENDOR_ZHAOXIN:
-		apply_quirks_zhaoxin(c);
-		break;
-	}
-
-	if (cfg->monarch_timeout < 0)
-		cfg->monarch_timeout = 0;
-	if (cfg->bootlog != 0)
-		cfg->panic_timeout = 30;
-}
-
 static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
 {
 	if (c->x86 != 5)
@@ -2259,6 +2205,23 @@ void mca_bsp_init(struct cpuinfo_x86 *c)
 
 	if (cap & MCG_SER_P)
 		mca_cfg.ser = 1;
+
+	switch (c->x86_vendor) {
+	case X86_VENDOR_AMD:
+		apply_quirks_amd(c);
+		break;
+	case X86_VENDOR_INTEL:
+		apply_quirks_intel(c);
+		break;
+	case X86_VENDOR_ZHAOXIN:
+		apply_quirks_zhaoxin(c);
+		break;
+	}
+
+	if (mca_cfg.monarch_timeout < 0)
+		mca_cfg.monarch_timeout = 0;
+	if (mca_cfg.bootlog != 0)
+		mca_cfg.panic_timeout = 30;
 }
 
 /*
@@ -2278,8 +2241,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	__mcheck_cpu_cap_init();
 
-	__mcheck_cpu_apply_quirks(c);
-
 	if (!mce_gen_pool_init()) {
 		mca_cfg.disabled = 1;
 		pr_emerg("Couldn't allocate MCE records pool!\n");
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index efcf21e9552e..ae9417d634ac 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -468,8 +468,26 @@ static void intel_imc_init(struct cpuinfo_x86 *c)
 	}
 }
 
+static void intel_apply_quirks(struct cpuinfo_x86 *c)
+{
+	/*
+	 * SDM documents that on family 6 bank 0 should not be written
+	 * because it aliases to another special BIOS controlled
+	 * register.
+	 * But it's not aliased anymore on model 0x1a+
+	 * Don't ignore bank 0 completely because there could be a
+	 * valid event later, merely don't write CTL0.
+	 *
+	 * Older CPUs (prior to family 6) can't reach this point and already
+	 * return early due to the check of __mcheck_cpu_ancient_init().
+	 */
+	if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks))
+		this_cpu_ptr(mce_banks_array)[0].init = false;
+}
+
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
+	intel_apply_quirks(c);
 	intel_init_cmci();
 	intel_init_lmce();
 	intel_imc_init(c);

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 15/22] x86/mce: Move machine_check_poll() status checks to helper functions
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (13 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 16/22] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

There are a number of generic and vendor-specific status checks in
machine_check_poll(). These are used to determine if an error should be
skipped.

Move these into helper functions. Future vendor-specific checks will be
added to the helpers.
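
The combined decision can be checked in isolation with a small
stand-alone model (a sketch; the bit positions follow the architectural
MCi_STATUS layout, and the helper names are simplified versions of
those in the diff below):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define VAL (1ULL << 63)
  #define UC  (1ULL << 61)
  #define EN  (1ULL << 60)
  #define PCC (1ULL << 57)
  #define S   (1ULL << 56)

  static bool ser_should_log(uint64_t status)
  {
          if (!(status & EN))     /* "not enabled" (speculative) errors */
                  return true;

          /* UCNA: UC == 1 && PCC == 0 && S == 0 */
          return !(status & PCC) && !(status & S);
  }

  static bool should_log(bool mcp_uc, bool ser, uint64_t status)
  {
          if (!(status & VAL))
                  return false;
          if (mcp_uc || !(status & UC))
                  return true;
          return ser ? ser_should_log(status) : false;
  }

  int main(void)
  {
          /* A UCNA error is logged on an SER system, skipped otherwise. */
          uint64_t ucna = VAL | UC | EN;

          printf("SER: %d, non-SER: %d\n",
                 should_log(false, true, ucna),
                 should_log(false, false, ucna));
          return 0;
  }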

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-11-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * Change log_poll_error() to should_log_poll_error().
    * Keep code comment.

 arch/x86/kernel/cpu/mce/core.c | 88 +++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4f6af060a395..bed059c6c808 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -714,6 +714,52 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
+/*
+ * Newer Intel systems that support software error
+ * recovery need to make additional checks. Other
+ * CPUs should skip over uncorrected errors, but log
+ * everything else.
+ */
+static bool ser_should_log_poll_error(struct mce *m)
+{
+	/* Log "not enabled" (speculative) errors */
+	if (!(m->status & MCI_STATUS_EN))
+		return true;
+
+	/*
+	 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
+	 * UC == 1 && PCC == 0 && S == 0
+	 */
+	if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
+		return true;
+
+	return false;
+}
+
+static bool should_log_poll_error(enum mcp_flags flags, struct mce_hw_err *err)
+{
+	struct mce *m = &err->m;
+
+	/* If this entry is not valid, ignore it. */
+	if (!(m->status & MCI_STATUS_VAL))
+		return false;
+
+	/*
+	 * If we are logging everything (at CPU online) or this
+	 * is a corrected error, then we must log it.
+	 */
+	if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
+		return true;
+
+	if (mca_cfg.ser)
+		return ser_should_log_poll_error(m);
+
+	if (m->status & MCI_STATUS_UC)
+		return false;
+
+	return true;
+}
+
 /*
  * Poll for corrected events or events that happened before reset.
  * Those are just logged through /dev/mcelog.
@@ -765,48 +811,10 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (!mca_cfg.cmci_disabled)
 			mce_track_storm(m);
 
-		/* If this entry is not valid, ignore it */
-		if (!(m->status & MCI_STATUS_VAL))
+		/* Verify that the error should be logged based on hardware conditions. */
+		if (!should_log_poll_error(flags, &err))
 			continue;
 
-		/*
-		 * If we are logging everything (at CPU online) or this
-		 * is a corrected error, then we must log it.
-		 */
-		if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
-			goto log_it;
-
-		/*
-		 * Newer Intel systems that support software error
-		 * recovery need to make additional checks. Other
-		 * CPUs should skip over uncorrected errors, but log
-		 * everything else.
-		 */
-		if (!mca_cfg.ser) {
-			if (m->status & MCI_STATUS_UC)
-				continue;
-			goto log_it;
-		}
-
-		/* Log "not enabled" (speculative) errors */
-		if (!(m->status & MCI_STATUS_EN))
-			goto log_it;
-
-		/*
-		 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
-		 * UC == 1 && PCC == 0 && S == 0
-		 */
-		if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
-			goto log_it;
-
-		/*
-		 * Skip anything else. Presumption is that our read of this
-		 * bank is racing with a machine check. Leave the log alone
-		 * for do_machine_check() to deal with it.
-		 */
-		continue;
-
-log_it:
 		mce_read_aux(&err, i);
 		m->severity = mce_severity(m, NULL, NULL, false);
 		/*

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 16/22] x86/mce: Unify AMD THR handler with MCA Polling
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (14 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 15/22] x86/mce: Move machine_check_poll() status checks to helper functions Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 17/22] x86/mce: Unify AMD DFR " Yazen Ghannam
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

AMD systems optionally support an MCA thresholding interrupt. The
interrupt should be used as another signal to trigger MCA polling. This
is similar to how the Intel Corrected Machine Check interrupt (CMCI) is
handled.

AMD MCA thresholding is managed using the MCA_MISC registers within an
MCA bank. The OS will need to modify the hardware error count field in
order to reset the threshold limit and rearm the interrupt. Management
of the MCA_MISC register should be done as a follow up to the basic MCA
polling flow. It should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA
thresholding interrupt to the OS even when the OS does not manage the
feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA thresholding
interrupt. This will allow the OS to find any valid errors whether or
not the MCA thresholding feature is OS-managed. Also, this allows the
common MCA polling options and kernel parameters to apply to AMD
systems.

Add a callback to the MCA polling function to check and reset any
threshold blocks that have reached their threshold limit.
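
The resulting control flow can be sketched as a stand-alone model (the
names are simplified stand-ins for machine_check_poll(),
amd_reset_thr_limit() and the per-CPU thr_intr_banks mask in the diff
below):

  #include <stdbool.h>
  #include <stdio.h>

  #define NUM_BANKS 4

  /* Banks with the thresholding interrupt enabled (per-CPU in the kernel). */
  static const bool thr_intr_banks[NUM_BANKS] = { true, false, true, false };

  static void reset_thr_limit(int bank)
  {
          printf("rearm thresholding on bank %d\n", bank);
  }

  static void poll(const bool *banks)
  {
          for (int i = 0; i < NUM_BANKS; i++) {
                  if (!banks[i])
                          continue;
                  /* ...read, log and clear the MCA registers... */
                  reset_thr_limit(i);   /* follow-up step, not the main focus */
          }
  }

  static void threshold_interrupt(void)
  {
          /* The handler is now just another trigger for the common poller. */
          poll(thr_intr_banks);
  }

  int main(void)
  {
          threshold_interrupt();
          return 0;
  }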

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-12-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * Start collecting per-CPU items in a struct.
    * Keep and use mce_flags.amd_threshold.

 arch/x86/kernel/cpu/mce/amd.c      | 49 ++++++++++++++++----------------------
 arch/x86/kernel/cpu/mce/core.c     |  3 +++
 arch/x86/kernel/cpu/mce/internal.h |  2 ++
 3 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 28dc36cdd137..0c789fa1ae4c 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -54,6 +54,12 @@
 
 static bool thresholding_irq_en;
 
+struct mce_amd_cpu_data {
+	mce_banks_t     thr_intr_banks;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_amd_cpu_data, mce_amd_data);
+
 static const char * const th_names[] = {
 	"load_store",
 	"insn_fetch",
@@ -558,6 +564,7 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 	if (!b.interrupt_capable)
 		goto done;
 
+	__set_bit(bank, this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
 	b.interrupt_enable = 1;
 
 	if (!mce_flags.smca) {
@@ -898,12 +905,7 @@ static void amd_deferred_error_interrupt(void)
 		log_error_deferred(bank);
 }
 
-static void log_error_thresholding(unsigned int bank, u64 misc)
-{
-	_log_error_deferred(bank, misc);
-}
-
-static void log_and_reset_block(struct threshold_block *block)
+static void reset_block(struct threshold_block *block)
 {
 	struct thresh_restart tr;
 	u32 low = 0, high = 0;
@@ -917,23 +919,14 @@ static void log_and_reset_block(struct threshold_block *block)
 	if (!(high & MASK_OVERFLOW_HI))
 		return;
 
-	/* Log the MCE which caused the threshold event. */
-	log_error_thresholding(block->bank, ((u64)high << 32) | low);
-
-	/* Reset threshold block after logging error. */
 	memset(&tr, 0, sizeof(tr));
 	tr.b = block;
 	threshold_restart_block(&tr);
 }
 
-/*
- * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
- * goes off when error_count reaches threshold_limit.
- */
-static void amd_threshold_interrupt(void)
+void amd_reset_thr_limit(unsigned int bank)
 {
-	struct threshold_bank **bp = this_cpu_read(threshold_banks), *thr_bank;
-	unsigned int bank, cpu = smp_processor_id();
+	struct threshold_bank **bp = this_cpu_read(threshold_banks);
 	struct threshold_block *block, *tmp;
 
 	/*
@@ -941,20 +934,20 @@ static void amd_threshold_interrupt(void)
 	 * handler is installed at boot time, but on a hotplug event the
 	 * interrupt might fire before the data has been initialized.
 	 */
-	if (!bp)
+	if (!bp || !bp[bank])
 		return;
 
-	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
-		if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
-			continue;
-
-		thr_bank = bp[bank];
-		if (!thr_bank)
-			continue;
+	list_for_each_entry_safe(block, tmp, &bp[bank]->miscj, miscj)
+		reset_block(block);
+}
 
-		list_for_each_entry_safe(block, tmp, &thr_bank->miscj, miscj)
-			log_and_reset_block(block);
-	}
+/*
+ * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
+ * goes off when error_count reaches threshold_limit.
+ */
+static void amd_threshold_interrupt(void)
+{
+	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bed059c6c808..a0aaf0abd798 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -831,6 +831,9 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 			mce_log(&err);
 
 clear_it:
+		if (mce_flags.amd_threshold)
+			amd_reset_thr_limit(i);
+
 		/*
 		 * Clear state for this bank.
 		 */
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 6cb2995f0ec1..e25ad0c005d5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -269,6 +269,7 @@ void mce_threshold_create_device(unsigned int cpu);
 void mce_threshold_remove_device(unsigned int cpu);
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
+void amd_reset_thr_limit(unsigned int bank);
 
 /*
  * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
@@ -300,6 +301,7 @@ static inline void mce_threshold_create_device(unsigned int cpu)	{ }
 static inline void mce_threshold_remove_device(unsigned int cpu)	{ }
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
+static inline void amd_reset_thr_limit(unsigned int bank) { }
 static inline void smca_extract_err_addr(struct mce *m) { }
 static inline void smca_bsp_init(void) { }
 #endif

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 17/22] x86/mce: Unify AMD DFR handler with MCA Polling
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (15 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 16/22] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 18/22] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

AMD systems optionally support a deferred error interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar
to how other MCA interrupts are handled.

Deferred errors do not require any special handling related to the
interrupt, e.g. resetting or rearming the interrupt, etc.

However, Scalable MCA systems include a pair of registers, MCA_DESTAT
and MCA_DEADDR, that should be checked for valid errors. This check
should be done whenever MCA registers are polled. Currently, the
deferred error interrupt does this check, but the MCA polling function
does not.

Call the MCA polling function when handling the deferred error
interrupt. This keeps all "polling" cases in a common function.

Call the polling function only for banks that have the deferred error
interrupt enabled.

Add an SMCA status check helper. It does the same status check and
register clearing that the interrupt handler used to do, and it extends
the common polling flow to find AMD deferred errors.

Remove old code whose functionality is already covered in the common MCA
code.
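
The SMCA helper added below can be modeled in isolation (a sketch; the
MCA_DESTAT access is stubbed with a plain variable, and the bit
positions follow the AMD MCI_STATUS definitions):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define VAL      (1ULL << 63)
  #define DEFERRED (1ULL << 44)

  /* Stub standing in for the MCA_DESTAT MSR (hypothetical value). */
  static uint64_t destat = VAL | DEFERRED;

  static bool smca_check(uint64_t *status, bool *use_dfr_regs)
  {
          if (*status & VAL) {
                  /* Scenario 2: clear the redundant MCA_DESTAT copy. */
                  if (*status & DEFERRED)
                          destat = 0;
                  return true;
          }

          /* Scenario 3: nothing in MCA_STATUS, check MCA_DESTAT. */
          *status = destat;
          if (!(*status & VAL))
                  return false;

          *use_dfr_regs = true;   /* read MCA_DEADDR, clear MCA_DESTAT later */
          return true;
  }

  int main(void)
  {
          uint64_t status = 0;    /* no valid error in MCA_STATUS */
          bool use_dfr_regs = false;

          printf("logged: %d, from DESTAT: %d\n",
                 smca_check(&status, &use_dfr_regs), use_dfr_regs);
          return 0;
  }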

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-13-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Add kflag for checking DFR registers.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * Keep code comment.
    * Log directly from helper function rather than pass values.
    
    Link:
    https://lore.kernel.org/r/20250213-wip-mca-updates-v2-13-3636547fe05f@amd.com
    

 arch/x86/include/asm/mce.h     |   6 +++
 arch/x86/kernel/cpu/mce/amd.c  | 103 ++---------------------------------------
 arch/x86/kernel/cpu/mce/core.c |  50 +++++++++++++++++++-
 3 files changed, 59 insertions(+), 100 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 31e3cb550fb3..7d6588195d56 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -165,6 +165,12 @@
  */
 #define MCE_IN_KERNEL_COPYIN	BIT_ULL(7)
 
+/*
+ * Indicates that handler should check and clear Deferred error registers
+ * rather than common ones.
+ */
+#define MCE_CHECK_DFR_REGS	BIT_ULL(8)
+
 /*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 0c789fa1ae4c..9d4d0169713b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -56,6 +56,7 @@ static bool thresholding_irq_en;
 
 struct mce_amd_cpu_data {
 	mce_banks_t     thr_intr_banks;
+	mce_banks_t     dfr_intr_banks;
 };
 
 static DEFINE_PER_CPU_READ_MOSTLY(struct mce_amd_cpu_data, mce_amd_data);
@@ -300,8 +301,10 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 		 * APIC based interrupt. First, check that no interrupt has been
 		 * set.
 		 */
-		if ((low & BIT(5)) && !((high >> 5) & 0x3))
+		if ((low & BIT(5)) && !((high >> 5) & 0x3)) {
+			__set_bit(bank, this_cpu_ptr(&mce_amd_data)->dfr_intr_banks);
 			high |= BIT(5);
+		}
 
 		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
 
@@ -794,37 +797,6 @@ bool amd_mce_usable_address(struct mce *m)
 	return false;
 }
 
-static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
-{
-	struct mce_hw_err err;
-	struct mce *m = &err.m;
-
-	mce_prep_record(&err);
-
-	m->status = status;
-	m->misc   = misc;
-	m->bank   = bank;
-	m->tsc	 = rdtsc();
-
-	if (m->status & MCI_STATUS_ADDRV) {
-		m->addr = addr;
-
-		smca_extract_err_addr(m);
-	}
-
-	if (mce_flags.smca) {
-		rdmsrq(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
-
-		if (m->status & MCI_STATUS_SYNDV) {
-			rdmsrq(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
-			rdmsrq(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vendor.amd.synd1);
-			rdmsrq(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vendor.amd.synd2);
-		}
-	}
-
-	mce_log(&err);
-}
-
 DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
 {
 	trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
@@ -834,75 +806,10 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
 	apic_eoi();
 }
 
-/*
- * Returns true if the logged error is deferred. False, otherwise.
- */
-static inline bool
-_log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
-{
-	u64 status, addr = 0;
-
-	rdmsrq(msr_stat, status);
-	if (!(status & MCI_STATUS_VAL))
-		return false;
-
-	if (status & MCI_STATUS_ADDRV)
-		rdmsrq(msr_addr, addr);
-
-	__log_error(bank, status, addr, misc);
-
-	wrmsrq(msr_stat, 0);
-
-	return status & MCI_STATUS_DEFERRED;
-}
-
-static bool _log_error_deferred(unsigned int bank, u32 misc)
-{
-	if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
-			     mca_msr_reg(bank, MCA_ADDR), misc))
-		return false;
-
-	/*
-	 * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
-	 * Return true here to avoid accessing these registers.
-	 */
-	if (!mce_flags.smca)
-		return true;
-
-	/* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
-	wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
-	return true;
-}
-
-/*
- * We have three scenarios for checking for Deferred errors:
- *
- * 1) Non-SMCA systems check MCA_STATUS and log error if found.
- * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
- *    clear MCA_DESTAT.
- * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
- *    log it.
- */
-static void log_error_deferred(unsigned int bank)
-{
-	if (_log_error_deferred(bank, 0))
-		return;
-
-	/*
-	 * Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
-	 * for a valid error.
-	 */
-	_log_error_bank(bank, MSR_AMD64_SMCA_MCx_DESTAT(bank),
-			      MSR_AMD64_SMCA_MCx_DEADDR(bank), 0);
-}
-
 /* APIC interrupt handler for deferred errors */
 static void amd_deferred_error_interrupt(void)
 {
-	unsigned int bank;
-
-	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
-		log_error_deferred(bank);
+	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->dfr_intr_banks);
 }
 
 static void reset_block(struct threshold_block *block)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index a0aaf0abd798..dc5ddddceb22 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -687,7 +687,10 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 		m->misc = mce_rdmsrq(mca_msr_reg(i, MCA_MISC));
 
 	if (m->status & MCI_STATUS_ADDRV) {
-		m->addr = mce_rdmsrq(mca_msr_reg(i, MCA_ADDR));
+		if (m->kflags & MCE_CHECK_DFR_REGS)
+			m->addr = mce_rdmsrq(MSR_AMD64_SMCA_MCx_DEADDR(i));
+		else
+			m->addr = mce_rdmsrq(mca_msr_reg(i, MCA_ADDR));
 
 		/*
 		 * Mask the reported address by the reported granularity.
@@ -714,6 +717,43 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
+/*
+ * We have three scenarios for checking for Deferred errors:
+ *
+ * 1) Non-SMCA systems check MCA_STATUS and log error if found.
+ * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
+ *    clear MCA_DESTAT.
+ * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
+ *    log it.
+ */
+static bool smca_should_log_poll_error(enum mcp_flags flags, struct mce_hw_err *err)
+{
+	struct mce *m = &err->m;
+
+	/*
+	 * If this is a deferred error found in MCA_STATUS, then clear
+	 * the redundant data from the MCA_DESTAT register.
+	 */
+	if (m->status & MCI_STATUS_VAL) {
+		if (m->status & MCI_STATUS_DEFERRED)
+			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
+
+		return true;
+	}
+
+	/*
+	 * If the MCA_DESTAT register has valid data, then use
+	 * it as the status register.
+	 */
+	m->status = mce_rdmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank));
+
+	if (!(m->status & MCI_STATUS_VAL))
+		return false;
+
+	m->kflags |= MCE_CHECK_DFR_REGS;
+	return true;
+}
+
 /*
  * Newer Intel systems that support software error
  * recovery need to make additional checks. Other
@@ -740,6 +780,9 @@ static bool should_log_poll_error(enum mcp_flags flags, struct mce_hw_err *err)
 {
 	struct mce *m = &err->m;
 
+	if (mce_flags.smca)
+		return smca_should_log_poll_error(flags, err);
+
 	/* If this entry is not valid, ignore it. */
 	if (!(m->status & MCI_STATUS_VAL))
 		return false;
@@ -837,7 +880,10 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		/*
 		 * Clear state for this bank.
 		 */
-		mce_wrmsrq(mca_msr_reg(i, MCA_STATUS), 0);
+		if (m->kflags & MCE_CHECK_DFR_REGS)
+			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(i), 0);
+		else
+			mce_wrmsrq(mca_msr_reg(i, MCA_STATUS), 0);
 	}
 
 	/*

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 18/22] x86/mce/amd: Support SMCA Corrected Error Interrupt
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (16 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 17/22] x86/mce: Unify AMD DFR " Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 19/22] x86/mce/amd: Remove redundant reset_block() Yazen Ghannam
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

AMD systems optionally support MCA thresholding which provides the
ability for hardware to send an interrupt when a set error threshold is
reached. This feature counts errors of all severities, but it is
commonly used to report correctable errors with an interrupt rather than
polling.

Scalable MCA systems allow the Platform to take control of this feature.
In this case, the OS will not see the feature configuration and control
bits in the MCA_MISC* registers. The OS will not receive the MCA
thresholding interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems.
This will be used in the same configuration where the Platform controls
MCA thresholding. However, the Platform will now be able to send the
MCA thresholding interrupt to the OS.

Check for, and enable, this feature during per-CPU SMCA init.
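
Since MCA_CONFIG is accessed as a low/high 32-bit pair here, the IntEn
bit (MSR bit 40) lands in bit 8 of the high half. A tiny stand-alone
check of that arithmetic (a sketch, nothing kernel-specific):

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
          uint64_t config = 1ULL << 40;   /* MCA_CONFIG[IntEn] */
          uint32_t low  = (uint32_t)config;
          uint32_t high = config >> 32;

          assert(high & (1u << 8));       /* MSR bit 40 == high bit 8 */
          assert(!(low & (1u << 10)));    /* IntPresent (bit 10) unset here */
          return 0;
  }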

Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-15-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Add code comment describing bits.
    
    v2->v3:
    * Add tags from Tony.
    
    v1->v2:
    * Use new per-CPU struct.

 arch/x86/kernel/cpu/mce/amd.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9d4d0169713b..3ddb28f90d70 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -271,6 +271,7 @@ void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
 
 static void smca_configure(unsigned int bank, unsigned int cpu)
 {
+	struct mce_amd_cpu_data *data = this_cpu_ptr(&mce_amd_data);
 	u8 *bank_counts = this_cpu_ptr(smca_bank_counts);
 	const struct smca_hwid *s_hwid;
 	unsigned int i, hwid_mcatype;
@@ -302,10 +303,27 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 		 * set.
 		 */
 		if ((low & BIT(5)) && !((high >> 5) & 0x3)) {
-			__set_bit(bank, this_cpu_ptr(&mce_amd_data)->dfr_intr_banks);
+			__set_bit(bank, data->dfr_intr_banks);
 			high |= BIT(5);
 		}
 
+		/*
+		 * SMCA Corrected Error Interrupt
+		 *
+		 * MCA_CONFIG[IntPresent] is bit 10, and tells us if the bank can
+		 * send an MCA Thresholding interrupt without the OS initializing
+		 * this feature. This can be used if the threshold limit is managed
+		 * by the platform.
+		 *
+		 * MCA_CONFIG[IntEn] is bit 40 (8 in the high portion of the MSR).
+		 * The OS should set this to inform the platform that the OS is ready
+		 * to handle the MCA Thresholding interrupt.
+		 */
+		if (low & BIT(10)) {
+			__set_bit(bank, data->thr_intr_banks);
+			high |= BIT(8);
+		}
+
 		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
 
 		wrmsr(smca_config, low, high);

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 19/22] x86/mce/amd: Remove redundant reset_block()
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (17 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 18/22] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 20/22] x86/mce/amd: Define threshold restart function for banks Yazen Ghannam
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Many of the checks in reset_block() are done again in the block reset
function, threshold_restart_block(). So drop the redundant wrapper and
its checks.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    v3->v4:
    * New in v4.

 arch/x86/kernel/cpu/mce/amd.c | 28 +++++++---------------------
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 3ddb28f90d70..3cb9f6a68316 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -830,29 +830,11 @@ static void amd_deferred_error_interrupt(void)
 	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->dfr_intr_banks);
 }
 
-static void reset_block(struct threshold_block *block)
-{
-	struct thresh_restart tr;
-	u32 low = 0, high = 0;
-
-	if (!block)
-		return;
-
-	if (rdmsr_safe(block->address, &low, &high))
-		return;
-
-	if (!(high & MASK_OVERFLOW_HI))
-		return;
-
-	memset(&tr, 0, sizeof(tr));
-	tr.b = block;
-	threshold_restart_block(&tr);
-}
-
 void amd_reset_thr_limit(unsigned int bank)
 {
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
 	struct threshold_block *block, *tmp;
+	struct thresh_restart tr;
 
 	/*
 	 * Validate that the threshold bank has been initialized already. The
@@ -862,8 +844,12 @@ void amd_reset_thr_limit(unsigned int bank)
 	if (!bp || !bp[bank])
 		return;
 
-	list_for_each_entry_safe(block, tmp, &bp[bank]->miscj, miscj)
-		reset_block(block);
+	memset(&tr, 0, sizeof(tr));
+
+	list_for_each_entry_safe(block, tmp, &bp[bank]->miscj, miscj) {
+		tr.b = block;
+		threshold_restart_block(&tr);
+	}
 }
 
 /*

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 20/22] x86/mce/amd: Define threshold restart function for banks
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (18 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 19/22] x86/mce/amd: Remove redundant reset_block() Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 21/22] x86/mce: Handle AMD threshold interrupt storms Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 22/22] x86/mce: Save and use APEI corrected threshold limit Yazen Ghannam
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

Prepare for CMCI storm support by moving the common bank/block
iterator code to a helper function.

Include a parameter to toggle the interrupt enable bit. This will be used by
the CMCI storm handling function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    v3->v4:
    * New in v4.

 arch/x86/kernel/cpu/mce/amd.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 3cb9f6a68316..56e7d99cef86 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -470,6 +470,24 @@ static void threshold_restart_block(void *_tr)
 	wrmsr(tr->b->address, lo, hi);
 }
 
+static void threshold_restart_bank(unsigned int bank, bool intr_en)
+{
+	struct threshold_bank **thr_banks = this_cpu_read(threshold_banks);
+	struct threshold_block *block, *tmp;
+	struct thresh_restart tr;
+
+	if (!thr_banks || !thr_banks[bank])
+		return;
+
+	memset(&tr, 0, sizeof(tr));
+
+	list_for_each_entry_safe(block, tmp, &thr_banks[bank]->miscj, miscj) {
+		tr.b = block;
+		tr.b->interrupt_enable = intr_en;
+		threshold_restart_block(&tr);
+	}
+}
+
 static void mce_threshold_block_init(struct threshold_block *b, int offset)
 {
 	struct thresh_restart tr = {
@@ -832,24 +850,7 @@ static void amd_deferred_error_interrupt(void)
 
 void amd_reset_thr_limit(unsigned int bank)
 {
-	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	struct threshold_block *block, *tmp;
-	struct thresh_restart tr;
-
-	/*
-	 * Validate that the threshold bank has been initialized already. The
-	 * handler is installed at boot time, but on a hotplug event the
-	 * interrupt might fire before the data has been initialized.
-	 */
-	if (!bp || !bp[bank])
-		return;
-
-	memset(&tr, 0, sizeof(tr));
-
-	list_for_each_entry_safe(block, tmp, &bp[bank]->miscj, miscj) {
-		tr.b = block;
-		threshold_restart_block(&tr);
-	}
+	threshold_restart_bank(bank, true);
 }
 
 /*

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 21/22] x86/mce: Handle AMD threshold interrupt storms
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (19 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 20/22] x86/mce/amd: Define threshold restart function for banks Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  2025-06-24 14:16 ` [PATCH v4 22/22] x86/mce: Save and use APEI corrected threshold limit Yazen Ghannam
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Extend the logic of handling CMCI storms to AMD threshold interrupts.

Rely on an approach similar to Intel's CMCI to mitigate storms per CPU
and per bank. But, unlike CMCI, do not raise thresholds to reduce the
interrupt rate during a storm. Rather, disable the interrupt on the
corresponding CPU and bank. Re-enable the interrupts once enough
consecutive polls of the bank show no corrected errors (30, the same
count used for Intel).

Turning off the threshold interrupt is the better solution on AMD
systems, as other error severities will still be handled even while the
threshold interrupt is disabled.

[Tony: Small tweak because mce_handle_storm() isn't a pointer now]
[Yazen: Rebase and simplify]

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250415-wip-mca-updates-v3-16-8ffd9eb4aa56@amd.com
    
    v3->v4:
    * Simplify based on new patches in this set.
    
    v2->v3:
    * Add tag from Qiuxu.
    
    v1->v2:
    * New in v2, but based on older patch.
    * Rebased on current set and simplified.
    * Kept old tags.

 arch/x86/kernel/cpu/mce/amd.c       | 5 +++++
 arch/x86/kernel/cpu/mce/internal.h  | 2 ++
 arch/x86/kernel/cpu/mce/threshold.c | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 56e7d99cef86..85fd9273e90a 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -848,6 +848,11 @@ static void amd_deferred_error_interrupt(void)
 	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->dfr_intr_banks);
 }
 
+void mce_amd_handle_storm(unsigned int bank, bool on)
+{
+	threshold_restart_bank(bank, on);
+}
+
 void amd_reset_thr_limit(unsigned int bank)
 {
 	threshold_restart_bank(bank, true);
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index e25ad0c005d5..09ebcf82df93 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -267,6 +267,7 @@ void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m);
 #ifdef CONFIG_X86_MCE_AMD
 void mce_threshold_create_device(unsigned int cpu);
 void mce_threshold_remove_device(unsigned int cpu);
+void mce_amd_handle_storm(unsigned int bank, bool on);
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
 void amd_reset_thr_limit(unsigned int bank);
@@ -299,6 +300,7 @@ void smca_bsp_init(void);
 #else
 static inline void mce_threshold_create_device(unsigned int cpu)	{ }
 static inline void mce_threshold_remove_device(unsigned int cpu)	{ }
+static inline void mce_amd_handle_storm(unsigned int bank, bool on)	{ }
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
 static inline void amd_reset_thr_limit(unsigned int bank) { }
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index f4a007616468..45144598ec74 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -63,6 +63,9 @@ static void mce_handle_storm(unsigned int bank, bool on)
 	case X86_VENDOR_INTEL:
 		mce_intel_handle_storm(bank, on);
 		break;
+	case X86_VENDOR_AMD:
+		mce_amd_handle_storm(bank, on);
+		break;
 	}
 }
 

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v4 22/22] x86/mce: Save and use APEI corrected threshold limit
  2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
                   ` (20 preceding siblings ...)
  2025-06-24 14:16 ` [PATCH v4 21/22] x86/mce: Handle AMD threshold interrupt storms Yazen Ghannam
@ 2025-06-24 14:16 ` Yazen Ghannam
  21 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-24 14:16 UTC (permalink / raw)
  To: x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi, Yazen Ghannam

The MCA threshold limit generally is not something that needs to change
during runtime. It is common for a system administrator to decide on a
policy for their managed systems.

If MCA thresholding is OS-managed, then the threshold limit must be set
at every boot. However, many systems allow the user to set a value in
their BIOS. And this is reported through an APEI HEST entry even if
thresholding is not in FW-First mode.

Use this value, if available, to set the OS-managed threshold limit.
Users can still override it through sysfs if desired for testing or
debug.

APEI is parsed after MCE is initialized. So reset the thresholding
blocks later to pick up the threshold limit.
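
The selection logic can be modeled stand-alone (a sketch; THRESHOLD_MAX
mirrors the driver's 0xFFF limit, and the saved APEI value is faked):

  #include <stdint.h>
  #include <stdio.h>

  #define THRESHOLD_MAX 0xFFF

  static uint32_t apei_thr_limit;     /* 0 if HEST did not provide one */

  static uint16_t get_thr_limit(void)
  {
          /* Fall back to the old default if APEI gave us nothing. */
          if (!apei_thr_limit)
                  return THRESHOLD_MAX;

          return apei_thr_limit < THRESHOLD_MAX ? apei_thr_limit
                                                : THRESHOLD_MAX;
  }

  int main(void)
  {
          printf("no HEST: %u\n", get_thr_limit());
          apei_thr_limit = 16;
          printf("HEST=16: %u\n", get_thr_limit());
          apei_thr_limit = 0x10000;
          printf("clamped: %u\n", get_thr_limit());
          return 0;
  }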

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    v3->v4:
    * New in v4.

 arch/x86/include/asm/mce.h          |  6 ++++++
 arch/x86/kernel/acpi/apei.c         |  2 ++
 arch/x86/kernel/cpu/mce/amd.c       | 18 ++++++++++++++++--
 arch/x86/kernel/cpu/mce/internal.h  |  2 ++
 arch/x86/kernel/cpu/mce/threshold.c | 13 +++++++++++++
 5 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 7d6588195d56..1cfbfff0be3f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -308,6 +308,12 @@ DECLARE_PER_CPU(struct mce, injectm);
 /* Disable CMCI/polling for MCA bank claimed by firmware */
 extern void mce_disable_bank(int bank);
 
+#ifdef CONFIG_X86_MCE_THRESHOLD
+void mce_save_apei_thr_limit(u32 thr_limit);
+#else
+static inline void mce_save_apei_thr_limit(u32 thr_limit) { }
+#endif /* CONFIG_X86_MCE_THRESHOLD */
+
 /*
  * Exception handler
  */
diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
index 0916f00a992e..e21419e686eb 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -19,6 +19,8 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
 	if (!cmc->enabled)
 		return 0;
 
+	mce_save_apei_thr_limit(cmc->notify.error_threshold_value);
+
 	/*
 	 * We expect HEST to provide a list of MC banks that report errors
 	 * in firmware first mode. Otherwise, return non-zero value to
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 85fd9273e90a..9e26b0437dc6 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -488,6 +488,18 @@ static void threshold_restart_bank(unsigned int bank, bool intr_en)
 	}
 }
 
+/* Try to use the threshold limit reported through APEI. */
+static u16 get_thr_limit(void)
+{
+	u32 thr_limit = mce_get_apei_thr_limit();
+
+	/* Fallback to old default if APEI limit is not available. */
+	if (!thr_limit)
+		return THRESHOLD_MAX;
+
+	return min(thr_limit, THRESHOLD_MAX);
+}
+
 static void mce_threshold_block_init(struct threshold_block *b, int offset)
 {
 	struct thresh_restart tr = {
@@ -496,7 +508,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
 		.lvt_off		= offset,
 	};
 
-	b->threshold_limit		= THRESHOLD_MAX;
+	b->threshold_limit		= get_thr_limit();
 	threshold_restart_block(&tr);
 };
 
@@ -1079,7 +1091,7 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
 	b->address		= address;
 	b->interrupt_enable	= 0;
 	b->interrupt_capable	= lvt_interrupt_supported(bank, high);
-	b->threshold_limit	= THRESHOLD_MAX;
+	b->threshold_limit	= get_thr_limit();
 
 	if (b->interrupt_capable) {
 		default_attrs[2] = &interrupt_enable.attr;
@@ -1090,6 +1102,8 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
 
 	list_add(&b->miscj, &tb->miscj);
 
+	mce_threshold_block_init(b, (high & MASK_LVTOFF_HI) >> 20);
+
 	err = kobject_init_and_add(&b->kobj, &threshold_ktype, tb->kobj, get_name(cpu, bank, b));
 	if (err)
 		goto out_free;
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 09ebcf82df93..df98930a32a5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -67,6 +67,7 @@ void mce_track_storm(struct mce *mce);
 void mce_inherit_storm(unsigned int bank);
 bool mce_get_storm_mode(void);
 void mce_set_storm_mode(bool storm);
+u32  mce_get_apei_thr_limit(void);
 #else
 static inline void cmci_storm_begin(unsigned int bank) {}
 static inline void cmci_storm_end(unsigned int bank) {}
@@ -74,6 +75,7 @@ static inline void mce_track_storm(struct mce *mce) {}
 static inline void mce_inherit_storm(unsigned int bank) {}
 static inline bool mce_get_storm_mode(void) { return false; }
 static inline void mce_set_storm_mode(bool storm) {}
+static inline u32  mce_get_apei_thr_limit(void) { return 0; }
 #endif
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index 45144598ec74..d00d5bf9959d 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -13,6 +13,19 @@
 
 #include "internal.h"
 
+static u32 mce_apei_thr_limit;
+
+void mce_save_apei_thr_limit(u32 thr_limit)
+{
+	mce_apei_thr_limit = thr_limit;
+	pr_info("HEST: Corrected error threshold limit = %u\n", thr_limit);
+}
+
+u32 mce_get_apei_thr_limit(void)
+{
+	return mce_apei_thr_limit;
+}
+
 static void default_threshold_interrupt(void)
 {
 	pr_err("Unexpected threshold interrupt at vector %x\n",

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 11/22] x86/mce: Define BSP-only init
  2025-06-24 14:16 ` [PATCH v4 11/22] x86/mce: Define BSP-only init Yazen Ghannam
@ 2025-06-25 11:04   ` Nikolay Borisov
  2025-06-25 11:26     ` Borislav Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-25 11:04 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/24/25 17:16, Yazen Ghannam wrote:
> Currently, MCA initialization is executed identically on each CPU as
> they are brought online. However, a number of MCA initialization tasks
> only need to be done once.
> 
> Define a function to collect all 'global' init tasks and call this from
> the BSP only. Start with CPU features.
> 
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> 
> Notes:
>      Link:
>      https://lore.kernel.org/r/20250415-wip-mca-updates-v3-7-8ffd9eb4aa56@amd.com
>      
>      v3->v4:
>      * Change cpu_mca_init() to mca_bsp_init().
>      * Drop code comment.
>      
>      v2->v3:
>      * Add tags from Qiuxu and Tony.
>      
>      v1->v2:
>      * New in v2.
> 
>   arch/x86/include/asm/mce.h     |  2 ++
>   arch/x86/kernel/cpu/common.c   |  1 +
>   arch/x86/kernel/cpu/mce/amd.c  |  3 ---
>   arch/x86/kernel/cpu/mce/core.c | 28 +++++++++++++++++++++-------
>   4 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 3224f3862dc8..31e3cb550fb3 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -241,12 +241,14 @@ struct cper_ia_proc_ctx;
>   
>   #ifdef CONFIG_X86_MCE
>   int mcheck_init(void);
> +void mca_bsp_init(struct cpuinfo_x86 *c);
>   void mcheck_cpu_init(struct cpuinfo_x86 *c);
>   void mcheck_cpu_clear(struct cpuinfo_x86 *c);
>   int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
>   			       u64 lapic_id);
>   #else
>   static inline int mcheck_init(void) { return 0; }
> +static inline void mca_bsp_init(struct cpuinfo_x86 *c) {}
>   static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
>   static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
>   static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 8feb8fd2957a..8a00faa1042a 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1771,6 +1771,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>   		setup_clear_cpu_cap(X86_FEATURE_LA57);
>   
>   	detect_nopl();
> +	mca_bsp_init(c);
>   }
>   
>   void __init init_cpu_devs(void)
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 292109e46a94..25a24d0b9cf9 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -655,9 +655,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
>   	u32 low = 0, high = 0, address = 0;
>   	int offset = -1;
>   
> -	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
> -	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
> -	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
>   	mce_flags.amd_threshold	 = 1;
>   
>   	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index ebe3e98f7606..c55462e6af1c 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1837,13 +1837,6 @@ static void __mcheck_cpu_cap_init(void)
>   	this_cpu_write(mce_num_banks, b);
>   
>   	__mcheck_cpu_mce_banks_init();
> -
> -	/* Use accurate RIP reporting if available. */
> -	if ((cap & MCG_EXT_P) && MCG_EXT_CNT(cap) >= 9)
> -		mca_cfg.rip_msr = MSR_IA32_MCG_EIP;
> -
> -	if (cap & MCG_SER_P)
> -		mca_cfg.ser = 1;
>   }
>   
>   static void __mcheck_cpu_init_generic(void)
> @@ -2243,6 +2236,27 @@ DEFINE_IDTENTRY_RAW(exc_machine_check)
>   }
>   #endif
>   
> +void mca_bsp_init(struct cpuinfo_x86 *c)
> +{
> +	u64 cap;
> +
> +	if (!mce_available(c))
> +		return;
> +
> +	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
> +	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
> +	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);

nit: Why use cpu_feature_enabled() vs, say, boot_cpu_has(), since none
of the 3 features are defined in cpufeaturemasks.h? That means
cpu_feature_enabled() is essentially static_cpu_has(), and this is not
a fast path.

It's not wrong per se, but I think the cpu_feature_enabled() API is
somewhat of a trainwreck, i.e. we ought to have a version that uses
boot_cpu_has() for "ordinary" uses and probably a
cpu_feature_enabled_fast() for fast paths.
> +
> +	rdmsrq(MSR_IA32_MCG_CAP, cap);
> +
> +	/* Use accurate RIP reporting if available. */
> +	if ((cap & MCG_EXT_P) && MCG_EXT_CNT(cap) >= 9)
> +		mca_cfg.rip_msr = MSR_IA32_MCG_EIP;
> +
> +	if (cap & MCG_SER_P)
> +		mca_cfg.ser = 1;
> +}
> +
>   /*
>    * Called for each booted CPU to set up machine checks.
>    * Must be called with preempt off:
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 11/22] x86/mce: Define BSP-only init
  2025-06-25 11:04   ` Nikolay Borisov
@ 2025-06-25 11:26     ` Borislav Petkov
  0 siblings, 0 replies; 39+ messages in thread
From: Borislav Petkov @ 2025-06-25 11:26 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown,
	linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi

On Wed, Jun 25, 2025 at 02:04:24PM +0300, Nikolay Borisov wrote:
> nit: Why use cpu_feature_enabled() vs, say, boot_cpu_has(), since none
> of the 3 features are defined in cpufeaturemasks.h? That means
> cpu_feature_enabled() is essentially static_cpu_has(), and this is not
> a fast path.
> 
> It's not wrong per se, but I think the cpu_feature_enabled() API is
> somewhat of a trainwreck, i.e. we ought to have a version that uses
> boot_cpu_has() for "ordinary" uses and probably a
> cpu_feature_enabled_fast() for fast paths.

Look at the resulting asm of *cpu_has().

And, we need exactly one interface that everyone should use and everyone
should not care about what it does underneath. Not 10 interfaces as it is now.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides
  2025-06-24 14:15 ` [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides Yazen Ghannam
@ 2025-06-25 13:22   ` Nikolay Borisov
  2025-06-28 11:33   ` [tip: ras/urgent] x86/mce: Ensure user polling settings are honored when restarting timer tip-bot2 for Yazen Ghannam
  1 sibling, 0 replies; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-25 13:22 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/24/25 17:15, Yazen Ghannam wrote:
> Users can disable MCA polling by setting the "ignore_ce" parameter or by
> setting "check_interval=0". This tells the kernel to *not* start the MCE
> timer on a CPU.
> 
> If the user did not disable CMCI, then storms can occur. When these
> happen, the MCE timer will be started with a fixed interval. After the
> storm subsides, the timer's next interval is set to check_interval.

I think the subject of the patch doesn't do justice to the patch
content. In fact, what this change does is ensure that the timer
function honors CE handling being disabled, either via ignore_ce or via
check_interval=0, when a CMCI storm subsides. So a subject along the
lines of:

"Ensure user settings are considered when a CMCI storm subsides"

or something like that is more descriptive of what you are doing.

At the very least, you are not restoring anything, because even without
this patch, when the storm subsided you'd start the timer with a value
of 'iv'.

> 
> This disregards the user's input through "ignore_ce" and
> "check_interval". Furthermore, if "check_interval=0", then the new timer
> will run faster than expected.
> 
> Create a new helper to check these conditions and use it when a CMCI
> storm ends.
> 
> Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: stable@vger.kernel.org
> ---
> 
> Notes:
>      Link:
>      https://lore.kernel.org/r/20250415-wip-mca-updates-v3-17-8ffd9eb4aa56@amd.com
>      
>      v3->v4:
>      * Update commit message.
>      * Move to beginning of set.
>      * Note: Polling vs thresholding use case updates not yet addressed.
>      
>      v2->v3:
>      * New in v3.
> 
>   arch/x86/kernel/cpu/mce/core.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 07d61937427f..ae2e2d8ec99b 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1740,6 +1740,11 @@ static void mc_poll_banks_default(void)
>   
>   void (*mc_poll_banks)(void) = mc_poll_banks_default;
>   
> +static bool should_enable_timer(unsigned long iv)
> +{
> +	return !mca_cfg.ignore_ce && iv;
> +}
> +
>   static void mce_timer_fn(struct timer_list *t)
>   {
>   	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
> @@ -1763,7 +1768,7 @@ static void mce_timer_fn(struct timer_list *t)
>   
>   	if (mce_get_storm_mode()) {
>   		__start_timer(t, HZ);
> -	} else {
> +	} else if (should_enable_timer(iv)) {
>   		__this_cpu_write(mce_next_interval, iv);
>   		__start_timer(t, iv);
>   	}
> @@ -2156,7 +2161,7 @@ static void mce_start_timer(struct timer_list *t)
>   {
>   	unsigned long iv = check_interval * HZ;
>   
> -	if (mca_cfg.ignore_ce || !iv)
> +	if (!should_enable_timer(iv))
>   		return;
>   
>   	this_cpu_write(mce_next_interval, iv);
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device()
  2025-06-24 14:16 ` [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device() Yazen Ghannam
@ 2025-06-25 14:57   ` Nikolay Borisov
  0 siblings, 0 replies; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-25 14:57 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/24/25 17:16, Yazen Ghannam wrote:
> The return values are not checked, so set return type to 'void'.
> 
> Also, move function declarations to internal.h, since these functions are
> only used within the MCE subsystem.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>


Realistically, the only case where this code can fail is if the machine is
tight on memory while a CPU is being brought online. So the wider question
is: should such a situation fail the hotplug event? Likely something else
in the chain would have failed earlier, so I guess the answer is no.
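
For context, with the return value ignored, the hotplug callback reduces to
roughly the following (a sketch matching the patch 01 diff merged later in
this thread, not the exact final code of this patch):

	static int mce_cpu_online(unsigned int cpu)
	{
		struct timer_list *t = this_cpu_ptr(&mce_timer);

		mce_device_create(cpu);
		/* Thresholding is optional; a failure here no longer vetoes hotplug. */
		mce_threshold_create_device(cpu);
		mce_reenable_cpu();
		mce_start_timer(t);
		return 0;
	}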

In any case LGTM:

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank
  2025-06-24 14:16 ` [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank Yazen Ghannam
@ 2025-06-25 16:52   ` Nikolay Borisov
  2025-06-27 11:14     ` Nikolay Borisov
  2025-08-25 13:59     ` Borislav Petkov
  0 siblings, 2 replies; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-25 16:52 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/24/25 17:16, Yazen Ghannam wrote:
> The threshold_bank structure is a container for one or more
> threshold_block structures. Currently, the container has a single
> pointer to the 'first' threshold_block structure which then has a linked
> list of the remaining threshold_block structures.
> 
> This results in an extra level of indirection where the 'first' block is
> checked before iterating over the remaining blocks.
> 
> Remove the indirection by including the head of the block list in the
> threshold_bank structure which already acts as a container for all the
> bank's thresholding blocks.
> 
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> 
> Notes:
>      Link:
>      https://lore.kernel.org/r/20250415-wip-mca-updates-v3-4-8ffd9eb4aa56@amd.com
>      
>      v3->v4:
>      * No change.
>      
>      v2->v3:
>      * Added tags from Qiuxu and Tony.
>      
>      v1->v2:
>      * New in v2.
> 
>   arch/x86/kernel/cpu/mce/amd.c | 43 ++++++++++++-------------------------------
>   1 file changed, 12 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 0ffbee329a8c..5d351ec863cd 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -241,7 +241,8 @@ struct threshold_block {
>   
>   struct threshold_bank {
>   	struct kobject		*kobj;
> -	struct threshold_block	*blocks;
> +	/* List of threshold blocks within this MCA bank. */
> +	struct list_head	miscj;
>   };
>   
>   static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
> @@ -900,9 +901,9 @@ static void log_and_reset_block(struct threshold_block *block)
>    */
>   static void amd_threshold_interrupt(void)
>   {
> -	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
> -	struct threshold_bank **bp = this_cpu_read(threshold_banks);
> +	struct threshold_bank **bp = this_cpu_read(threshold_banks), *thr_bank;
>   	unsigned int bank, cpu = smp_processor_id();
> +	struct threshold_block *block, *tmp;
>   
>   	/*
>   	 * Validate that the threshold bank has been initialized already. The
> @@ -916,16 +917,11 @@ static void amd_threshold_interrupt(void)
>   		if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
>   			continue;

<slight off topic>

nit: I wonder whether, instead of using per_cpu() and manual bit testing, a
direct call to x86_this_cpu_test_bit() wouldn't be a better solution. The
assembly looks like:

[OLD]

xorl    %r14d, %r14d    # ivtmp.245
movq    %rax, 8(%rsp)   # cpu, %sfp
# arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
movq    $bank_map, %rax #, __ptr
movq    %rax, (%rsp)    # __ptr, %sfp
.L236:
movq    8(%rsp), %rax   # %sfp, cpu
# arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
movq    (%rsp), %rsi    # %sfp, __ptr
# arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
movq    __per_cpu_offset(,%rax,8), %rax # __per_cpu_offset[cpu_23], __per_cpu_offset[cpu_23]
# arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
movq    (%rax,%rsi), %rax
# arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
btq %r14, %rax

[NEW]

xorl    %r15d, %r15d    # ivtmp.246
.L236:
# 917 "arch/x86/kernel/cpu/mce/amd.c" 1
btl %r15d, %gs:bank_map(%rip)   # ivtmp.246, *_9


That way you end up with a single btl (but I guess a version that uses btq should be added as well)
inside the loop rather than a bunch of instructions moving data around for per_cpu.

Alternatively, since this runs in interrupt context, can't you use this_cpu_read(bank_map) directly and eliminate the smp_processor_id() invocation? (See the sketch below.)

</slight off topic>
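
A minimal sketch of the this_cpu_read() alternative (assuming bank_map stays
a per-CPU u64 and mce_num_banks remains a per-CPU variable, as in the current
tree; not compile-tested against this series):

	static void amd_threshold_interrupt(void)
	{
		u64 banks = this_cpu_read(bank_map);	/* no smp_processor_id() needed */
		unsigned int bank;

		for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
			if (!(banks & BIT_ULL(bank)))
				continue;
			/* ... walk this bank's threshold blocks ... */
		}
	}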

>   

<snip>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early()
  2025-06-24 14:16 ` [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
@ 2025-06-26  8:03   ` Nikolay Borisov
  2025-06-30 12:58     ` Yazen Ghannam
  0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-26  8:03 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/24/25 17:16, Yazen Ghannam wrote:
> The __mcheck_cpu_init_early() function was introduced so that some
> vendor-specific features are detected before the first MCA polling event
> done in __mcheck_cpu_init_generic().
> 
> Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and
> additional code will be needed to support various system configurations.
> 
> However, the current and future vendor-specific code should be done during
> vendor init. This keeps all the vendor code in a common location and
> simplifies the generic init flow.
> 
> Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
> 
> Also, move __mcheck_cpu_init_generic() after
> __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> MCA polling event.
> 
> Additionally, this brings the MCA init flow closer to what is described
> in the x86 docs.
> 
> The AMD PPRs say
>    "The operating system must initialize the MCA_CONFIG registers prior to
>    initialization of the MCA_CTL registers.
> 
>    The MCA_CTL registers must be initialized prior to enabling the error
>    reporting banks in MCG_CTL".
> 
> However, the Intel SDM "Machine-Check Initialization Pseudocode" says
> MCG_CTL first then MCi_CTL.
> 
> But both agree that CR4.MCE should be set last.
> 
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>


IMO the change which moves __mcheck_cpu_init_generic() should be in a
separate patch, so that the changelog makes it abundantly clear that this is
a "world switch" function and that its invocation timing is important.

<snip>


In any case:

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
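
To visualize the ordering question discussed in the commit message, a rough
sketch (the helper names here are made up purely for illustration and are not
the series' actual functions):

	/* Ordering sketch only; AMD PPR order shown, helper names invented. */
	static void mca_init_order(void)
	{
		init_mca_config();		/* AMD: MCA_CONFIG before MCA_CTL */
		init_mca_ctl();			/* per-bank MCA_CTL / MCi_CTL */
		init_mcg_ctl();			/* Intel SDM orders MCG_CTL before MCi_CTL */
		cr4_set_bits(X86_CR4_MCE);	/* both vendors: CR4.MCE last */
	}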

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks
  2025-06-24 14:16 ` [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
@ 2025-06-27 11:02   ` Nikolay Borisov
  2025-06-30 13:00     ` Yazen Ghannam
  0 siblings, 1 reply; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-27 11:02 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/24/25 17:16, Yazen Ghannam wrote:
> Many quirks are global configuration settings and a handful apply to
> each CPU.
> 
> Move the per-CPU quirks to vendor init to execute them on each online
> CPU. Set the global quirks during BSP-only init so they're only executed
> once and early.
> 
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>


After this patch the code ends up with two sets of quirk functions with
very confusing names:

amd_apply_quirks/apply_quirks_amd
intel_apply_quirks/apply_quirks_intel


A better naming convention would be something along the lines of:

amd_apply_global_quirks/amd_apply_cpu_quirks, and the same for Intel.


> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank
  2025-06-25 16:52   ` Nikolay Borisov
@ 2025-06-27 11:14     ` Nikolay Borisov
  2025-06-30 12:57       ` Yazen Ghannam
  2025-08-25 13:59     ` Borislav Petkov
  1 sibling, 1 reply; 39+ messages in thread
From: Nikolay Borisov @ 2025-06-27 11:14 UTC (permalink / raw)
  To: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown
  Cc: linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi



On 6/25/25 19:52, Nikolay Borisov wrote:
> 
> 
> On 6/24/25 17:16, Yazen Ghannam wrote:
>> The threshold_bank structure is a container for one or more
>> threshold_block structures. Currently, the container has a single
>> pointer to the 'first' threshold_block structure which then has a linked
>> list of the remaining threshold_block structures.
>>
>> This results in an extra level of indirection where the 'first' block is
>> checked before iterating over the remaining blocks.
>>
>> Remove the indirection by including the head of the block list in the
>> threshold_bank structure which already acts as a container for all the
>> bank's thresholding blocks.
>>
>> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Tested-by: Tony Luck <tony.luck@intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> ---
>>
>> Notes:
>>      Link:
>>      https://lore.kernel.org/r/20250415-wip-mca-updates-v3-4-8ffd9eb4aa56@amd.com
>>      v3->v4:
>>      * No change.
>>      v2->v3:
>>      * Added tags from Qiuxu and Tony.
>>      v1->v2:
>>      * New in v2.
>>
>>   arch/x86/kernel/cpu/mce/amd.c | 43 ++++++++++++-------------------------------
>>   1 file changed, 12 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 0ffbee329a8c..5d351ec863cd 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -241,7 +241,8 @@ struct threshold_block {
>>   struct threshold_bank {
>>       struct kobject        *kobj;
>> -    struct threshold_block    *blocks;
>> +    /* List of threshold blocks within this MCA bank. */
>> +    struct list_head    miscj;
>>   };
>>   static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
>> @@ -900,9 +901,9 @@ static void log_and_reset_block(struct threshold_block *block)
>>    */
>>   static void amd_threshold_interrupt(void)
>>   {
>> -    struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
>> -    struct threshold_bank **bp = this_cpu_read(threshold_banks);
>> +    struct threshold_bank **bp = this_cpu_read(threshold_banks), *thr_bank;
>>       unsigned int bank, cpu = smp_processor_id();
>> +    struct threshold_block *block, *tmp;
>>       /*
>>        * Validate that the threshold bank has been initialized already. The
>> @@ -916,16 +917,11 @@ static void amd_threshold_interrupt(void)
>>           if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
>>               continue;
> 
> <slight off topic>
> 
> nit: I wonder if instead of using per_cpu and manual bit testing can't a 
> direct
> call to x86_this_cpu_test_bit be a better solution. The assembly looks 
> like:
> 
> [OLD]
> 
> xorl    %r14d, %r14d    # ivtmp.245
> movq    %rax, 8(%rsp)   # cpu, %sfp
> # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> movq    $bank_map, %rax #, __ptr
> movq    %rax, (%rsp)    # __ptr, %sfp
> .L236:
> movq    8(%rsp), %rax   # %sfp, cpu
> # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> movq    (%rsp), %rsi    # %sfp, __ptr
> # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> movq    __per_cpu_offset(,%rax,8), %rax # __per_cpu_offset[cpu_23], __per_cpu_offset[cpu_23]
> # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> movq    (%rax,%rsi), %rax
> # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> btq %r14, %rax
> 
> [NEW]
> 
> xorl    %r15d, %r15d    # ivtmp.246
> .L236:
> # 917 "arch/x86/kernel/cpu/mce/amd.c" 1
> btl %r15d, %gs:bank_map(%rip)   # ivtmp.246, *_9
> 
> 
> That way you end up with a single btl (but I guess a version that uses 
> btq should be added as well)
> inside the loop rather than a bunch of instructions moving data around 
> for per_cpu.
> 
> Alternatively, since this is running in interrupt context can't you use 
> directly this_cpu_read(bank_map) and eliminate the smp_processor_id 
> invocation?


Actually the total number of banks is at most 128 as per the layout of the
MCG_CAP register, so using btl is fine. Also I'm not sure why the original
code uses BIT_ULL vs just BIT, since we can't have a 64-bit value.


> 
> </slight off topic>
> 
> 
> <snip>
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [tip: ras/urgent] x86/mce/amd: Fix threshold limit reset
  2025-06-24 14:15 ` [PATCH v4 04/22] x86/mce/amd: Fix threshold limit reset Yazen Ghannam
@ 2025-06-28 11:33   ` tip-bot2 for Yazen Ghannam
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2025-06-28 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yazen Ghannam, Borislav Petkov (AMD), stable, x86, linux-kernel

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     5f6e3b720694ad771911f637a51930f511427ce1
Gitweb:        https://git.kernel.org/tip/5f6e3b720694ad771911f637a51930f511427ce1
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Tue, 24 Jun 2025 14:15:59 
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Fri, 27 Jun 2025 13:16:23 +02:00

x86/mce/amd: Fix threshold limit reset

The MCA threshold limit must be reset after servicing the interrupt.

Currently, the restart function doesn't have an explicit check for this.  It
makes some assumptions based on the current limit and what's in the registers.
These assumptions don't always hold, so the limit won't be reset in some
cases.

Make the reset condition explicit. Either an interrupt/overflow has occurred
or the bank is being initialized.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-4-236dd74f645f@amd.com
---
 arch/x86/kernel/cpu/mce/amd.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6820ebc..5c4eb28 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -350,7 +350,6 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 
 struct thresh_restart {
 	struct threshold_block	*b;
-	int			reset;
 	int			set_lvt_off;
 	int			lvt_off;
 	u16			old_limit;
@@ -432,13 +431,13 @@ static void threshold_restart_bank(void *_tr)
 
 	rdmsr(tr->b->address, lo, hi);
 
-	if (tr->b->threshold_limit < (hi & THRESHOLD_MAX))
-		tr->reset = 1;	/* limit cannot be lower than err count */
-
-	if (tr->reset) {		/* reset err count and overflow bit */
-		hi =
-		    (hi & ~(MASK_ERR_COUNT_HI | MASK_OVERFLOW_HI)) |
-		    (THRESHOLD_MAX - tr->b->threshold_limit);
+	/*
+	 * Reset error count and overflow bit.
+	 * This is done during init or after handling an interrupt.
+	 */
+	if (hi & MASK_OVERFLOW_HI || tr->set_lvt_off) {
+		hi &= ~(MASK_ERR_COUNT_HI | MASK_OVERFLOW_HI);
+		hi |= THRESHOLD_MAX - tr->b->threshold_limit;
 	} else if (tr->old_limit) {	/* change limit w/o reset */
 		int new_count = (hi & THRESHOLD_MAX) +
 		    (tr->old_limit - tr->b->threshold_limit);

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip: ras/urgent] x86/mce/amd: Add default names for MCA banks and blocks
  2025-06-24 14:15 ` [PATCH v4 03/22] x86/mce/amd: Add default names for MCA banks and blocks Yazen Ghannam
@ 2025-06-28 11:33   ` tip-bot2 for Yazen Ghannam
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2025-06-28 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yazen Ghannam, Borislav Petkov (AMD), stable, x86, linux-kernel

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     d66e1e90b16055d2f0ee76e5384e3f119c3c2773
Gitweb:        https://git.kernel.org/tip/d66e1e90b16055d2f0ee76e5384e3f119c3c2773
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Tue, 24 Jun 2025 14:15:58 
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Fri, 27 Jun 2025 13:13:36 +02:00

x86/mce/amd: Add default names for MCA banks and blocks

Ensure that sysfs init doesn't fail for new/unrecognized bank types or if
a bank has additional blocks available.

Most MCA banks have a single thresholding block, so the block takes the same
name as the bank.

Unified Memory Controllers (UMCs) are a special case where there are two
blocks and each has a unique name.

However, the microarchitecture allows for five blocks. Any new MCA bank types
with more than one block will be missing names for the extra blocks. The MCE
sysfs will fail to initialize in this case.

Fixes: 87a6d4091bd7 ("x86/mce/AMD: Update sysfs bank names for SMCA systems")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-3-236dd74f645f@amd.com
---
 arch/x86/kernel/cpu/mce/amd.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9d852c3..6820ebc 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1113,13 +1113,20 @@ static const char *get_name(unsigned int cpu, unsigned int bank, struct threshol
 	}
 
 	bank_type = smca_get_bank_type(cpu, bank);
-	if (bank_type >= N_SMCA_BANK_TYPES)
-		return NULL;
 
 	if (b && (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2)) {
 		if (b->block < ARRAY_SIZE(smca_umc_block_names))
 			return smca_umc_block_names[b->block];
-		return NULL;
+	}
+
+	if (b && b->block) {
+		snprintf(buf_mcatype, MAX_MCATYPE_NAME_LEN, "th_block_%u", b->block);
+		return buf_mcatype;
+	}
+
+	if (bank_type >= N_SMCA_BANK_TYPES) {
+		snprintf(buf_mcatype, MAX_MCATYPE_NAME_LEN, "th_bank_%u", bank);
+		return buf_mcatype;
 	}
 
 	if (per_cpu(smca_bank_counts, cpu)[bank_type] == 1)

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip: ras/urgent] x86/mce: Ensure user polling settings are honored when restarting timer
  2025-06-24 14:15 ` [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides Yazen Ghannam
  2025-06-25 13:22   ` Nikolay Borisov
@ 2025-06-28 11:33   ` tip-bot2 for Yazen Ghannam
  1 sibling, 0 replies; 39+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2025-06-28 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yazen Ghannam, Borislav Petkov (AMD), stable, x86, linux-kernel

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     00c092de6f28ebd32208aef83b02d61af2229b60
Gitweb:        https://git.kernel.org/tip/00c092de6f28ebd32208aef83b02d61af2229b60
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Tue, 24 Jun 2025 14:15:57 
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Fri, 27 Jun 2025 12:41:44 +02:00

x86/mce: Ensure user polling settings are honored when restarting timer

Users can disable MCA polling by setting the "ignore_ce" parameter or by
setting "check_interval=0". This tells the kernel to *not* start the MCE
timer on a CPU.

If the user did not disable CMCI, then storms can occur. When these
happen, the MCE timer will be started with a fixed interval. After the
storm subsides, the timer's next interval is set to check_interval.

This disregards the user's input through "ignore_ce" and
"check_interval". Furthermore, if "check_interval=0", then the new timer
will run faster than expected.

Create a new helper to check these conditions and use it when a CMCI
storm ends.

  [ bp: Massage. ]

Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-2-236dd74f645f@amd.com
---
 arch/x86/kernel/cpu/mce/core.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 07d6193..4da4eab 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1740,6 +1740,11 @@ static void mc_poll_banks_default(void)
 
 void (*mc_poll_banks)(void) = mc_poll_banks_default;
 
+static bool should_enable_timer(unsigned long iv)
+{
+	return !mca_cfg.ignore_ce && iv;
+}
+
 static void mce_timer_fn(struct timer_list *t)
 {
 	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
@@ -1763,7 +1768,7 @@ static void mce_timer_fn(struct timer_list *t)
 
 	if (mce_get_storm_mode()) {
 		__start_timer(t, HZ);
-	} else {
+	} else if (should_enable_timer(iv)) {
 		__this_cpu_write(mce_next_interval, iv);
 		__start_timer(t, iv);
 	}
@@ -2156,11 +2161,10 @@ static void mce_start_timer(struct timer_list *t)
 {
 	unsigned long iv = check_interval * HZ;
 
-	if (mca_cfg.ignore_ce || !iv)
-		return;
-
-	this_cpu_write(mce_next_interval, iv);
-	__start_timer(t, iv);
+	if (should_enable_timer(iv)) {
+		this_cpu_write(mce_next_interval, iv);
+		__start_timer(t, iv);
+	}
 }
 
 static void __mcheck_cpu_setup_timer(void)

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip: ras/urgent] x86/mce: Don't remove sysfs if thresholding sysfs init fails
  2025-06-24 14:15 ` [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails Yazen Ghannam
@ 2025-06-28 11:33   ` tip-bot2 for Yazen Ghannam
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2025-06-28 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yazen Ghannam, Borislav Petkov (AMD), Qiuxu Zhuo, Tony Luck,
	stable, x86, linux-kernel

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID:     4c113a5b28bfd589e2010b5fc8867578b0135ed7
Gitweb:        https://git.kernel.org/tip/4c113a5b28bfd589e2010b5fc8867578b0135ed7
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Tue, 24 Jun 2025 14:15:56 
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Thu, 26 Jun 2025 17:28:13 +02:00

x86/mce: Don't remove sysfs if thresholding sysfs init fails

Currently, the MCE subsystem sysfs interface will be removed if the
thresholding sysfs interface fails to be created. A common failure is due to
new MCA bank types that are not recognized and don't have a short name set.

The MCA thresholding feature is optional and should not break the common MCE
sysfs interface. Also, new MCA bank types are occasionally introduced, and
updates will be needed to recognize them. But likewise, this should not break
the common sysfs interface.

Keep the MCE sysfs interface regardless of the status of the thresholding
sysfs interface.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-1-236dd74f645f@amd.com
---
 arch/x86/kernel/cpu/mce/core.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e9b3c5d..07d6193 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2801,15 +2801,9 @@ static int mce_cpu_dead(unsigned int cpu)
 static int mce_cpu_online(unsigned int cpu)
 {
 	struct timer_list *t = this_cpu_ptr(&mce_timer);
-	int ret;
 
 	mce_device_create(cpu);
-
-	ret = mce_threshold_create_device(cpu);
-	if (ret) {
-		mce_device_remove(cpu);
-		return ret;
-	}
+	mce_threshold_create_device(cpu);
 	mce_reenable_cpu();
 	mce_start_timer(t);
 	return 0;

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank
  2025-06-27 11:14     ` Nikolay Borisov
@ 2025-06-30 12:57       ` Yazen Ghannam
  0 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-30 12:57 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: x86, Tony Luck, Rafael J. Wysocki, Len Brown, linux-kernel,
	linux-edac, Smita.KoralahalliChannabasappa, Qiuxu Zhuo,
	linux-acpi

On Fri, Jun 27, 2025 at 02:14:40PM +0300, Nikolay Borisov wrote:
> 
> 
> On 6/25/25 19:52, Nikolay Borisov wrote:
> > 
> > 
> > On 6/24/25 17:16, Yazen Ghannam wrote:
> > > The threshold_bank structure is a container for one or more
> > > threshold_block structures. Currently, the container has a single
> > > pointer to the 'first' threshold_block structure which then has a linked
> > > list of the remaining threshold_block structures.
> > > 
> > > This results in an extra level of indirection where the 'first' block is
> > > checked before iterating over the remaining blocks.
> > > 
> > > Remove the indirection by including the head of the block list in the
> > > threshold_bank structure which already acts as a container for all the
> > > bank's thresholding blocks.
> > > 
> > > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > Tested-by: Tony Luck <tony.luck@intel.com>
> > > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > > ---
> > > 
> > > Notes:
> > >      Link:
> > >      https://lore.kernel.org/r/20250415-wip-mca-updates-v3-4-8ffd9eb4aa56@amd.com
> > >      v3->v4:
> > >      * No change.
> > >      v2->v3:
> > >      * Added tags from Qiuxu and Tony.
> > >      v1->v2:
> > >      * New in v2.
> > > 
> > >   arch/x86/kernel/cpu/mce/amd.c | 43 ++++++++++++-------------------------------
> > >   1 file changed, 12 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > > index 0ffbee329a8c..5d351ec863cd 100644
> > > --- a/arch/x86/kernel/cpu/mce/amd.c
> > > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > > @@ -241,7 +241,8 @@ struct threshold_block {
> > >   struct threshold_bank {
> > >       struct kobject        *kobj;
> > > -    struct threshold_block    *blocks;
> > > +    /* List of threshold blocks within this MCA bank. */
> > > +    struct list_head    miscj;
> > >   };
> > >   static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
> > > @@ -900,9 +901,9 @@ static void log_and_reset_block(struct threshold_block *block)
> > >    */
> > >   static void amd_threshold_interrupt(void)
> > >   {
> > > -    struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
> > > -    struct threshold_bank **bp = this_cpu_read(threshold_banks);
> > > +    struct threshold_bank **bp = this_cpu_read(threshold_banks), *thr_bank;
> > >       unsigned int bank, cpu = smp_processor_id();
> > > +    struct threshold_block *block, *tmp;
> > >       /*
> > >        * Validate that the threshold bank has been initialized already. The
> > > @@ -916,16 +917,11 @@ static void amd_threshold_interrupt(void)
> > >           if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> > >               continue;
> > 
> > <slight off topic>
> > 
> > nit: I wonder if instead of using per_cpu and manual bit testing can't a
> > direct
> > call to x86_this_cpu_test_bit be a better solution. The assembly looks
> > like:
> > 
> > [OLD]
> > 
> > xorl    %r14d, %r14d    # ivtmp.245
> > movq    %rax, 8(%rsp)   # cpu, %sfp
> > # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> > movq    $bank_map, %rax #, __ptr
> > movq    %rax, (%rsp)    # __ptr, %sfp
> > .L236:
> > movq    8(%rsp), %rax   # %sfp, cpu
> > # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> > movq    (%rsp), %rsi    # %sfp, __ptr
> > # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> > movq    __per_cpu_offset(,%rax,8), %rax # __per_cpu_offset[cpu_23], __per_cpu_offset[cpu_23]
> > # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> > movq    (%rax,%rsi), %rax
> > # arch/x86/kernel/cpu/mce/amd.c:917:        if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
> > btq %r14, %rax
> > 
> > [NEW]
> > 
> > xorl    %r15d, %r15d    # ivtmp.246
> > .L236:
> > # 917 "arch/x86/kernel/cpu/mce/amd.c" 1
> > btl %r15d, %gs:bank_map(%rip)   # ivtmp.246, *_9
> > 
> > 
> > That way you end up with a single btl (but I guess a version that uses
> > btq should be added as well)
> > inside the loop rather than a bunch of instructions moving data around
> > for per_cpu.
> > 
> > Alternatively, since this is running in interrupt context can't you use
> > directly this_cpu_read(bank_map) and eliminate the smp_processor_id
> > invocation?
> 
> 
> Actually the total number of banks are at most 128 as per the layout of
> MCG_CAP register, so using btl is fine. Also I'm not sure why the original
> code uses BIT_ULL vs just BIT since we can't have a 64bit value.
> 
> 

Hi Nikolay,

MCG_CAP[Count] is an 8-bit field, so we can (potentially) have up to 255
MCA banks.

"bank_map" is a bitmask, and current systems can have up to 64 MCA banks.
That is why BIT_ULL is needed.
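
To illustrate the difference (sketch, per the kernel's bits.h definitions,
where BIT(nr) is (1UL << (nr)) and BIT_ULL(nr) is (1ULL << (nr))):

	u64 mask = BIT_ULL(63);	/* always a 64-bit shift; valid for banks 0..63 */
	/* BIT(63) would shift a 32-bit-wide 1UL out of range on a 32-bit build. */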

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early()
  2025-06-26  8:03   ` Nikolay Borisov
@ 2025-06-30 12:58     ` Yazen Ghannam
  0 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-30 12:58 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: x86, Tony Luck, Rafael J. Wysocki, Len Brown, linux-kernel,
	linux-edac, Smita.KoralahalliChannabasappa, Qiuxu Zhuo,
	linux-acpi

On Thu, Jun 26, 2025 at 11:03:20AM +0300, Nikolay Borisov wrote:
> 
> 
> On 6/24/25 17:16, Yazen Ghannam wrote:
> > The __mcheck_cpu_init_early() function was introduced so that some
> > vendor-specific features are detected before the first MCA polling event
> > done in __mcheck_cpu_init_generic().
> > 
> > Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and
> > additional code will be needed to support various system configurations.
> > 
> > However, the current and future vendor-specific code should be done during
> > vendor init. This keeps all the vendor code in a common location and
> > simplifies the generic init flow.
> > 
> > Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
> > 
> > Also, move __mcheck_cpu_init_generic() after
> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> > MCA polling event.
> > 
> > Additionally, this brings the MCA init flow closer to what is described
> > in the x86 docs.
> > 
> > The AMD PPRs say
> >    "The operating system must initialize the MCA_CONFIG registers prior to
> >    initialization of the MCA_CTL registers.
> > 
> >    The MCA_CTL registers must be initialized prior to enabling the error
> >    reporting banks in MCG_CTL".
> > 
> > However, the Intel SDM "Machine-Check Initialization Pseudocode" says
> > MCG_CTL first then MCi_CTL.
> > 
> > But both agree that CR4.MCE should be set last.
> > 
> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Tested-by: Tony Luck <tony.luck@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> 
> IMO the change which moves __mcheck_cpu_init_generic should be in a separate
> patch so that in the changelog it's abundantly clear that it's a "world
> switch" function and its invocation timing is important.
> 
> <snip>
> 
> 
> In any case:
> 
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

Thanks for the review. I can split this in the next revision if needed.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks
  2025-06-27 11:02   ` Nikolay Borisov
@ 2025-06-30 13:00     ` Yazen Ghannam
  0 siblings, 0 replies; 39+ messages in thread
From: Yazen Ghannam @ 2025-06-30 13:00 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: x86, Tony Luck, Rafael J. Wysocki, Len Brown, linux-kernel,
	linux-edac, Smita.KoralahalliChannabasappa, Qiuxu Zhuo,
	linux-acpi

On Fri, Jun 27, 2025 at 02:02:22PM +0300, Nikolay Borisov wrote:
> 
> 
> On 6/24/25 17:16, Yazen Ghannam wrote:
> > Many quirks are global configuration settings and a handful apply to
> > each CPU.
> > 
> > Move the per-CPU quirks to vendor init to execute them on each online
> > CPU. Set the global quirks during BSP-only init so they're only executed
> > once and early.
> > 
> > Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Tested-by: Tony Luck <tony.luck@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> 
> After this patch the code ends up with 2 sets of quirk functions with very
> confusing names:
> 
> amd_apply_quirks/apply_quirks_amd
> intel_apply_quirks/apply_quirks_intel
> 
> 
> Better naming convention would be something along the lines of:
> 
> amd_apply_global_quirks/amd_apply_cpu_quirks and the same for intel.
> 

Yes, good point. I'll make this change.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank
  2025-06-25 16:52   ` Nikolay Borisov
  2025-06-27 11:14     ` Nikolay Borisov
@ 2025-08-25 13:59     ` Borislav Petkov
  1 sibling, 0 replies; 39+ messages in thread
From: Borislav Petkov @ 2025-08-25 13:59 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Yazen Ghannam, x86, Tony Luck, Rafael J. Wysocki, Len Brown,
	linux-kernel, linux-edac, Smita.KoralahalliChannabasappa,
	Qiuxu Zhuo, linux-acpi

On Wed, Jun 25, 2025 at 07:52:26PM +0300, Nikolay Borisov wrote:
> That way you end up with a single btl (but I guess a version that uses btq
> should be added as well) inside the loop rather than a bunch of instructions
> moving data around for per_cpu.

There's also this_cpu_ptr() etc.

You know how I always take patches, right?

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2025-08-25 13:59 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-24 14:15 [PATCH v4 00/22] AMD MCA interrupts rework Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 01/22] x86/mce: Don't remove sysfs if thresholding sysfs init fails Yazen Ghannam
2025-06-28 11:33   ` [tip: ras/urgent] " tip-bot2 for Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides Yazen Ghannam
2025-06-25 13:22   ` Nikolay Borisov
2025-06-28 11:33   ` [tip: ras/urgent] x86/mce: Ensure user polling settings are honored when restarting timer tip-bot2 for Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 03/22] x86/mce/amd: Add default names for MCA banks and blocks Yazen Ghannam
2025-06-28 11:33   ` [tip: ras/urgent] " tip-bot2 for Yazen Ghannam
2025-06-24 14:15 ` [PATCH v4 04/22] x86/mce/amd: Fix threshold limit reset Yazen Ghannam
2025-06-28 11:33   ` [tip: ras/urgent] " tip-bot2 for Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 05/22] x86/mce/amd: Rename threshold restart function Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 06/22] x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device() Yazen Ghannam
2025-06-25 14:57   ` Nikolay Borisov
2025-06-24 14:16 ` [PATCH v4 07/22] x86/mce/amd: Remove smca_banks_map Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 08/22] x86/mce/amd: Put list_head in threshold_bank Yazen Ghannam
2025-06-25 16:52   ` Nikolay Borisov
2025-06-27 11:14     ` Nikolay Borisov
2025-06-30 12:57       ` Yazen Ghannam
2025-08-25 13:59     ` Borislav Petkov
2025-06-24 14:16 ` [PATCH v4 09/22] x86/mce: Cleanup bank processing on init Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 10/22] x86/mce: Remove __mcheck_cpu_init_early() Yazen Ghannam
2025-06-26  8:03   ` Nikolay Borisov
2025-06-30 12:58     ` Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 11/22] x86/mce: Define BSP-only init Yazen Ghannam
2025-06-25 11:04   ` Nikolay Borisov
2025-06-25 11:26     ` Borislav Petkov
2025-06-24 14:16 ` [PATCH v4 12/22] x86/mce: Define BSP-only SMCA init Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 13/22] x86/mce: Do 'UNKNOWN' vendor check early Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 14/22] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
2025-06-27 11:02   ` Nikolay Borisov
2025-06-30 13:00     ` Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 15/22] x86/mce: Move machine_check_poll() status checks to helper functions Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 16/22] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 17/22] x86/mce: Unify AMD DFR " Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 18/22] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 19/22] x86/mce/amd: Remove redundant reset_block() Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 20/22] x86/mce/amd: Define threshold restart function for banks Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 21/22] x86/mce: Handle AMD threshold interrupt storms Yazen Ghannam
2025-06-24 14:16 ` [PATCH v4 22/22] x86/mce: Save and use APEI corrected threshold limit Yazen Ghannam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).