public inbox for stable@vger.kernel.org
* [PATCH] Handle Ice Lake MONITOR erratum
@ 2025-04-21 19:22 Dave Hansen
  2025-04-21 19:32 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dave Hansen @ 2025-04-21 19:22 UTC
  To: linux-kernel
  Cc: x86, andrew.cooper3, Dave Hansen, Len Brown, Peter Zijlstra,
	Rafael J. Wysocki, Srinivas Pandruvada, stable


From: Dave Hansen <dave.hansen@linux.intel.com>

Andrew Cooper reported boot issues on Ice Lake servers running Xen,
which he tracked down to MWAIT not waking up. Since there is a
published erratum, do the safe thing and treat these CPUs as buggy.
Note: I've seen no reports of this occurring on Linux.

Add Ice Lake servers to the list of shaky MONITOR implementations with
no workaround available. Also, before the if() gets too unwieldy, move
it over to an x86_cpu_id array. Additionally, add a comment to the
X86_BUG_MONITOR consumption site to make it clear how and why affected
CPUs get IPIs to wake them up.
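
For readers unfamiliar with the table-driven style, the move from a
chained if() to a match array can be sketched in plain userspace C.
This is only an illustration, not the kernel code: every demo_*
name, the enum values, and the needs_mwait field are hypothetical
stand-ins for x86_cpu_id, the INTEL_* model constants, and the
X86_FEATURE_MWAIT requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CPU model identifiers; values are made up. */
enum demo_vfm {
	DEMO_VFM_NONE = 0,	/* reserved: marks the end of a table */
	DEMO_ATOM_GOLDMONT,
	DEMO_LUNARLAKE_M,
	DEMO_ICELAKE_X,
	DEMO_ICELAKE_D,
};

/* The CPU being examined (assumed shape, not struct cpuinfo_x86). */
struct demo_cpu {
	enum demo_vfm vfm;
	bool has_mwait;
};

/* One table entry: a model plus an optional feature requirement. */
struct demo_cpu_id {
	enum demo_vfm vfm;
	bool needs_mwait;
};

/* Adding an affected model is a one-line change to this table. */
static const struct demo_cpu_id demo_monitor_bug_list[] = {
	{ DEMO_ATOM_GOLDMONT, true },
	{ DEMO_LUNARLAKE_M,   true },
	{ DEMO_ICELAKE_X,     true },
	{ DEMO_VFM_NONE,      false },	/* terminator */
};

/* Walk the table until the terminator; true on the first match. */
static bool demo_match_cpu(const struct demo_cpu *cpu,
			   const struct demo_cpu_id *table)
{
	assert(cpu != NULL && table != NULL);

	for (; table->vfm != DEMO_VFM_NONE; table++) {
		if (table->vfm != cpu->vfm)
			continue;
		if (table->needs_mwait && !cpu->has_mwait)
			continue;
		return true;
	}
	return false;
}
```

With this layout the call site stays a single
demo_match_cpu(cpu, demo_monitor_bug_list) check no matter how many
models the table grows to hold.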

There is no equivalent erratum for the "Xeon D" Ice Lakes so
INTEL_ICELAKE_D is not affected.

The erratum is called ICX143 in the "3rd Gen Intel Xeon Scalable
Processors, Codename Ice Lake Specification Update". It is Intel
document 637780, currently available here:

	https://cdrdv2.intel.com/v1/dl/getContent/637780

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org

---

 b/arch/x86/include/asm/mwait.h |    3 +++
 b/arch/x86/kernel/cpu/intel.c  |   17 ++++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff -puN arch/x86/kernel/cpu/intel.c~ICX-MONITOR-bug arch/x86/kernel/cpu/intel.c
--- a/arch/x86/kernel/cpu/intel.c~ICX-MONITOR-bug	2025-04-18 13:54:46.022590596 -0700
+++ b/arch/x86/kernel/cpu/intel.c	2025-04-18 15:15:19.374365069 -0700
@@ -513,6 +513,19 @@ static void init_intel_misc_features(str
 }
 
 /*
+ * These CPUs have buggy MWAIT/MONITOR implementations that
+ * usually manifest as hangs or stalls at boot.
+ */
+#define MWAIT_VFM(_vfm)	\
+	X86_MATCH_VFM_FEATURE(_vfm, X86_FEATURE_MWAIT, 0)
+static const struct x86_cpu_id monitor_bug_list[] = {
+	MWAIT_VFM(INTEL_ATOM_GOLDMONT),
+	MWAIT_VFM(INTEL_LUNARLAKE_M),
+	MWAIT_VFM(INTEL_ICELAKE_X),	/* Erratum ICX143 */
+	{},
+};
+
+/*
  * This is a list of Intel CPUs that are known to suffer from downclocking when
  * ZMM registers (512-bit vectors) are used.  On these CPUs, when the kernel
  * executes SIMD-optimized code such as cryptography functions or CRCs, it
@@ -565,9 +578,7 @@ static void init_intel(struct cpuinfo_x8
 	     c->x86_vfm == INTEL_WESTMERE_EX))
 		set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR);
 
-	if (boot_cpu_has(X86_FEATURE_MWAIT) &&
-	    (c->x86_vfm == INTEL_ATOM_GOLDMONT ||
-	     c->x86_vfm == INTEL_LUNARLAKE_M))
+	if (x86_match_cpu(monitor_bug_list))
 		set_cpu_bug(c, X86_BUG_MONITOR);
 
 #ifdef CONFIG_X86_64
diff -puN arch/x86/include/asm/mwait.h~ICX-MONITOR-bug arch/x86/include/asm/mwait.h
--- a/arch/x86/include/asm/mwait.h~ICX-MONITOR-bug	2025-04-18 15:17:18.353749634 -0700
+++ b/arch/x86/include/asm/mwait.h	2025-04-18 15:20:06.037927656 -0700
@@ -110,6 +110,9 @@ static __always_inline void __sti_mwait(
  * through MWAIT. Whenever someone changes need_resched, we would be woken
  * up from MWAIT (without an IPI).
  *
+ * Buggy (X86_BUG_MONITOR) CPUs will never set the polling bit and will
+ * always be sent IPIs.
+ *
  * New with Core Duo processors, MWAIT can take some hints based on CPU
  * capability.
  */
_
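
The comment added to mwait.h above describes why affected CPUs are
always sent IPIs. A rough userspace sketch of that decision follows,
assuming a per-CPU flags word with a "polling" bit and a
"need_resched" bit. All demo_* names are hypothetical, and the IPI is
modeled as a simple counter rather than anything the kernel does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NEED_RESCHED	(1u << 0)
#define DEMO_POLLING		(1u << 1)

/* Assumed per-CPU idle state; not a kernel structure. */
struct demo_idle_cpu {
	atomic_uint flags;	/* the word an idle CPU would MONITOR */
	bool monitor_bug;	/* stand-in for X86_BUG_MONITOR */
	unsigned int ipis_sent;	/* counts IPIs aimed at this CPU */
};

static void demo_cpu_init(struct demo_idle_cpu *cpu, bool monitor_bug)
{
	assert(cpu != NULL);
	atomic_init(&cpu->flags, 0);
	cpu->monitor_bug = monitor_bug;
	cpu->ipis_sent = 0;
}

/* Idle entry: advertise polling only if MONITOR can be trusted. */
static void demo_idle_enter(struct demo_idle_cpu *cpu)
{
	if (!cpu->monitor_bug)
		atomic_fetch_or(&cpu->flags, DEMO_POLLING);
}

/*
 * Wake a CPU. If it advertised polling, the store to the monitored
 * word is assumed to be enough; otherwise fall back to an IPI.
 * Returns true when an IPI was needed.
 */
static bool demo_wake(struct demo_idle_cpu *cpu)
{
	unsigned int old = atomic_fetch_or(&cpu->flags, DEMO_NEED_RESCHED);

	if (old & DEMO_POLLING)
		return false;

	cpu->ipis_sent++;	/* stand-in for sending a reschedule IPI */
	return true;
}
```

A CPU with monitor_bug set never sets DEMO_POLLING in
demo_idle_enter(), so demo_wake() takes the IPI path for it every
time, which is the behavior the new comment spells out.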

* Re: [PATCH] Handle Ice Lake MONITOR erratum
@ 2025-04-25  3:08 Christian Ludloff
  2025-05-01 20:33 ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Ludloff @ 2025-04-25  3:08 UTC
  To: Dave Hansen
  Cc: x86, andrew.cooper3, Len Brown, Peter Zijlstra, Rafael J. Wysocki,
	Srinivas Pandruvada, stable

> [ICX143 in https://cdrdv2.intel.com/v1/dl/getContent/637780]

> There is no equivalent erratum for the "Xeon D" Ice Lakes so
> INTEL_ICELAKE_D is not affected.

There is ICXD80 in...

https://www.intel.com/content/www/us/en/content-details/714069/intel-xeon-d-1700-and-d-1800-processor-family-specification-update.html
https://www.intel.com/content/www/us/en/content-details/714071/intel-xeon-d-2700-and-d-2800-processor-family-specification-update.html

And although the ICL spec update...

https://edc.intel.com/content/www/us/en/design/ipla/software-development-platforms/client/platforms/ice-lake-ultra-mobile-u/10th-generation-core-processor-specification-update/errata-details/

...doesn't seem to have a MONITOR erratum, it might be a good
idea for Intel to double check.

--
Christian


end of thread, other threads:[~2025-05-01 20:34 UTC | newest]

Thread overview: 8+ messages
2025-04-21 19:22 [PATCH] Handle Ice Lake MONITOR erratum Dave Hansen
2025-04-21 19:32 ` Andrew Cooper
2025-04-21 23:02 ` srinivas pandruvada
2025-04-22  6:46 ` Ingo Molnar
2025-04-22 14:18   ` Dave Hansen
2025-04-22 19:35     ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2025-04-25  3:08 Christian Ludloff
2025-05-01 20:33 ` Dave Hansen
