* [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Roger Pau Monne @ 2025-04-23 11:32 UTC (permalink / raw)
To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper
There are several errata on Intel regarding the usage of the MONITOR/MWAIT
instructions, all having in common that stores to the monitored region
might not wake up the CPU.
Fix them by forcing the sending of an IPI for the affected models.
The Ice Lake issue has been reproduced internally on XenServer hardware,
and the fix does seem to prevent it. The symptom was APs getting stuck in
the idle loop immediately after bring-up, which in turn prevented the BSP
from making progress. This would happen before the watchdog was
initialized, and hence the whole system would get stuck.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
The Apollo Lake and Lunar Lake fixes have not been tested due to lack of hardware.
---
Changes since v1:
- Only probe for the errata at boot.
- Use Intel model names instead of raw values.
- Make force_mwait_ipi_wakeup __ro_after_init.
---
xen/arch/x86/acpi/cpu_idle.c | 6 ++++++
xen/arch/x86/cpu/intel.c | 36 +++++++++++++++++++++++++++++++-
xen/arch/x86/include/asm/mwait.h | 3 +++
3 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 420198406def..1dbf15b01ed7 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -441,8 +441,14 @@ void cpuidle_wakeup_mwait(cpumask_t *mask)
     cpumask_andnot(mask, mask, &target);
 }
 
+/* Force sending of a wakeup IPI regardless of mwait usage. */
+bool __ro_after_init force_mwait_ipi_wakeup;
+
 bool arch_skip_send_event_check(unsigned int cpu)
 {
+    if ( force_mwait_ipi_wakeup )
+        return false;
+
     /*
      * This relies on softirq_pending() and mwait_wakeup() to access data
      * on the same cache line.
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 6a680ba38dc9..6c8377d08428 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -8,6 +8,7 @@
 #include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
+#include <asm/mwait.h>
 #include <asm/uaccess.h>
 #include <asm/mpspec.h>
 #include <asm/apic.h>
@@ -368,7 +369,6 @@ static void probe_c3_errata(const struct cpuinfo_x86 *c)
         INTEL_FAM6_MODEL(0x25),
         { }
     };
-#undef INTEL_FAM6_MODEL
 
     /* Serialized by the AP bringup code. */
     if ( max_cstate > 1 && (c->apicid & (c->x86_num_siblings - 1)) &&
@@ -380,6 +380,38 @@ static void probe_c3_errata(const struct cpuinfo_x86 *c)
     }
 }
 
+/*
+ * APL30: One use of the MONITOR/MWAIT instruction pair is to allow a logical
+ * processor to wait in a sleep state until a store to the armed address range
+ * occurs. Due to this erratum, stores to the armed address range may not
+ * trigger MWAIT to resume execution.
+ *
+ * ICX143: Under complex microarchitectural conditions, a monitor that is armed
+ * with the MWAIT instruction may not be triggered, leading to a processor
+ * hang.
+ *
+ * LNL030: Problem P-cores may not exit power state Core C6 on monitor hit.
+ *
+ * Force the sending of an IPI in those cases.
+ */
+static void __init probe_mwait_errata(void)
+{
+    static const struct x86_cpu_id models[] = {
+        INTEL_FAM6_MODEL(INTEL_FAM6_ATOM_GOLDMONT), /* APL30 */
+        INTEL_FAM6_MODEL(INTEL_FAM6_ICELAKE_X), /* ICX143 */
+        INTEL_FAM6_MODEL(INTEL_FAM6_LUNARLAKE_M), /* LNL030 */
+        { }
+    };
+#undef INTEL_FAM6_MODEL
+
+    if ( boot_cpu_has(X86_FEATURE_MONITOR) && x86_match_cpu(models) )
+    {
+        printk(XENLOG_WARNING
+               "Forcing IPI MWAIT wakeup due to CPU erratum\n");
+        force_mwait_ipi_wakeup = true;
+    }
+}
+
 /*
  * P4 Xeon errata 037 workaround.
  * Hardware prefetcher may cause stale data to be loaded into the cache.
@@ -406,6 +438,8 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
         __set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
 
     probe_c3_errata(c);
+    if (system_state < SYS_STATE_active && c == &boot_cpu_data)
+        probe_mwait_errata();
 }
diff --git a/xen/arch/x86/include/asm/mwait.h b/xen/arch/x86/include/asm/mwait.h
index 000a692f6d19..c52cd3f51011 100644
--- a/xen/arch/x86/include/asm/mwait.h
+++ b/xen/arch/x86/include/asm/mwait.h
@@ -13,6 +13,9 @@
 
 #define MWAIT_ECX_INTERRUPT_BREAK 0x1
 
+/* Force sending of a wakeup IPI regardless of mwait usage. */
+extern bool force_mwait_ipi_wakeup;
+
 void mwait_idle_with_hints(unsigned int eax, unsigned int ecx);
 #ifdef CONFIG_INTEL
 bool mwait_pc10_supported(void);
--
2.48.1
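To make the mechanism of this patch concrete, here is a minimal, self-contained C model of it. It is not Xen code: struct cpu_id, match_cpu(), probe_errata() and skip_ipi() are hypothetical simplifications of struct x86_cpu_id, x86_match_cpu(), probe_mwait_errata() and arch_skip_send_event_check(), and the family 6 model numbers (0x5c, 0x6a, 0xbd) are our reading of INTEL_FAM6_ATOM_GOLDMONT, INTEL_FAM6_ICELAKE_X and INTEL_FAM6_LUNARLAKE_M, so check them against intel-family.h. What it shows is that the boot-time probe sets a single flag, and once it is set the wakeup path never skips the IPI, even when the target is MWAITing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct x86_cpu_id. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

struct cpu_id {
    enum cpu_vendor vendor;
    unsigned int family;
    unsigned int model;
};

/*
 * Affected models (family 6); the model numbers are assumed values.
 * An entry with family 0 terminates the table, like the "{ }" sentinel.
 */
static const struct cpu_id mwait_errata_models[] = {
    { VENDOR_INTEL, 6, 0x5c }, /* APL30 */
    { VENDOR_INTEL, 6, 0x6a }, /* ICX143 */
    { VENDOR_INTEL, 6, 0xbd }, /* LNL030 */
    { VENDOR_INTEL, 0, 0 },    /* sentinel */
};

/* Return true if c matches an entry before the sentinel. */
static bool match_cpu(const struct cpu_id *table, const struct cpu_id *c)
{
    assert(table != NULL && c != NULL);

    for ( ; table->family != 0; table++ )
        if ( table->vendor == c->vendor && table->family == c->family &&
             table->model == c->model )
            return true;

    return false;
}

/* Written once during boot, read-only afterwards (cf. __ro_after_init). */
static bool force_ipi_wakeup;

/* Boot-CPU-only probe; has_monitor stands in for boot_cpu_has(MONITOR). */
static void probe_errata(const struct cpu_id *bsp, bool has_monitor)
{
    if ( has_monitor && match_cpu(mwait_errata_models, bsp) )
        force_ipi_wakeup = true;
}

/*
 * May the wakeup IPI be skipped?  target_mwaiting stands in for "the target
 * is MWAITing on the cache line that the wakeup store will hit".
 */
static bool skip_ipi(bool target_mwaiting)
{
    if ( force_ipi_wakeup )
        return false;

    return target_mwaiting;
}
```

For example, probing a boot CPU with model 0x6a that advertises MONITOR sets force_ipi_wakeup, after which skip_ipi(true) returns false. A model outside the table keeps the "the store will wake it" shortcut.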
* Re: [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Jan Beulich @ 2025-04-23 13:05 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel
On 23.04.2025 13:32, Roger Pau Monne wrote:
> There are several errata on Intel regarding the usage of the MONITOR/MWAIT
> instructions, all having in common that stores to the monitored region
> might not wake up the CPU.
>
> Fix them by forcing the sending of an IPI for the affected models.
>
> The Ice Lake issue has been reproduced internally on XenServer hardware,
> and the fix does seem to prevent it. The symptom was APs getting stuck in
> the idle loop immediately after bring up, which in turn prevented the BSP
> from making progress. This would happen before the watchdog was
> initialized, and hence the whole system would get stuck.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
with a nit and an entirely optional suggestion:
> @@ -380,6 +380,38 @@ static void probe_c3_errata(const struct cpuinfo_x86 *c)
> }
> }
>
> +/*
> + * APL30: One use of the MONITOR/MWAIT instruction pair is to allow a logical
> + * processor to wait in a sleep state until a store to the armed address range
> + * occurs. Due to this erratum, stores to the armed address range may not
> + * trigger MWAIT to resume execution.
> + *
> + * ICX143: Under complex microarchitectural conditions, a monitor that is armed
> + * with the MWAIT instruction may not be triggered, leading to a processor
> + * hang.
> + *
> + * LNL030: Problem P-cores may not exit power state Core C6 on monitor hit.
> + *
> + * Force the sending of an IPI in those cases.
> + */
> +static void __init probe_mwait_errata(void)
> +{
> + static const struct x86_cpu_id models[] = {
__initconst
> @@ -406,6 +438,8 @@ static void Intel_errata_workarounds(struct cpuinfo_x86 *c)
> __set_bit(X86_FEATURE_CLFLUSH_MONITOR, c->x86_capability);
>
> probe_c3_errata(c);
> + if (system_state < SYS_STATE_active && c == &boot_cpu_data)
> + probe_mwait_errata();
> }
You could simplify the condition by using just system_state < SYS_STATE_smp_boot
(without any &&), I think.
Jan
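Jan's suggested simplification can be sanity-checked by enumeration. This sketch assumes the ordering of Xen's enum system_state (early_boot, boot, smp_boot, active, suspend, resume) and, more importantly, that before SYS_STATE_smp_boot only the boot CPU goes through identification, while APs are only identified from smp_boot onwards (bring-up, hotplug, resume). The helper names are hypothetical. Under those assumptions, the two conditions agree on every (state, CPU) pair that can actually occur.

```c
#include <assert.h>
#include <stdbool.h>

/* Boot phases, in the order we assume Xen's enum system_state uses. */
enum system_state {
    SYS_STATE_early_boot,
    SYS_STATE_boot,
    SYS_STATE_smp_boot,
    SYS_STATE_active,
    SYS_STATE_suspend,
    SYS_STATE_resume,
};

/* Condition as posted in v2. */
static bool v2_condition(enum system_state s, bool is_boot_cpu)
{
    return s < SYS_STATE_active && is_boot_cpu;
}

/* Simplification suggested in review. */
static bool suggested_condition(enum system_state s)
{
    return s < SYS_STATE_smp_boot;
}

/*
 * Assumed boot flow: the boot CPU is identified only before smp_boot, and
 * APs only from smp_boot onwards.
 */
static bool reachable(enum system_state s, bool is_boot_cpu)
{
    return is_boot_cpu ? s < SYS_STATE_smp_boot : s >= SYS_STATE_smp_boot;
}

/* Assert that both conditions agree on every reachable (state, CPU) pair. */
static void check_equivalence(void)
{
    for ( int s = SYS_STATE_early_boot; s <= SYS_STATE_resume; s++ )
        for ( int bsp = 0; bsp <= 1; bsp++ )
            if ( reachable(s, bsp) )
                assert(v2_condition(s, bsp) == suggested_condition(s));
}
```

The two conditions do differ for the boot CPU in SYS_STATE_smp_boot; the simplification is only safe because that pair is assumed never to occur.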
* Re: [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Andrew Cooper @ 2025-04-23 13:13 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich
On 23/04/2025 12:32 pm, Roger Pau Monne wrote:
> There are several errata on Intel regarding the usage of the MONITOR/MWAIT
> instructions, all having in common that stores to the monitored region
> might not wake up the CPU.
>
> Fix them by forcing the sending of an IPI for the affected models.
>
> The Ice Lake issue has been reproduced internally on XenServer hardware,
> and the fix does seem to prevent it. The symptom was APs getting stuck in
> the idle loop immediately after bring up, which in turn prevented the BSP
> from making progress. This would happen before the watchdog was
> initialized, and hence the whole system would get stuck.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Apollo and Lunar Lake fixes have not been tested, due to lack of hardware.
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> index 420198406def..1dbf15b01ed7 100644
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -441,8 +441,14 @@ void cpuidle_wakeup_mwait(cpumask_t *mask)
> cpumask_andnot(mask, mask, &target);
> }
>
> +/* Force sending of a wakeup IPI regardless of mwait usage. */
> +bool __ro_after_init force_mwait_ipi_wakeup;
> +
> bool arch_skip_send_event_check(unsigned int cpu)
> {
> + if ( force_mwait_ipi_wakeup )
> + return false;
> +
I don't especially like this. The callers loop over all CPUs, and
this check can't be inlined/simplified automatically.
But let's get the fix in place first. Optimising comes later.
~Andrew
* Re: [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Roger Pau Monné @ 2025-04-23 14:08 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel, Jan Beulich
On Wed, Apr 23, 2025 at 02:13:01PM +0100, Andrew Cooper wrote:
> On 23/04/2025 12:32 pm, Roger Pau Monne wrote:
> > There are several errata on Intel regarding the usage of the MONITOR/MWAIT
> > instructions, all having in common that stores to the monitored region
> > might not wake up the CPU.
> >
> > Fix them by forcing the sending of an IPI for the affected models.
> >
> > The Ice Lake issue has been reproduced internally on XenServer hardware,
> > and the fix does seem to prevent it. The symptom was APs getting stuck in
> > the idle loop immediately after bring up, which in turn prevented the BSP
> > from making progress. This would happen before the watchdog was
> > initialized, and hence the whole system would get stuck.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Apollo and Lunar Lake fixes have not been tested, due to lack of hardware.
>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
> > diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> > index 420198406def..1dbf15b01ed7 100644
> > --- a/xen/arch/x86/acpi/cpu_idle.c
> > +++ b/xen/arch/x86/acpi/cpu_idle.c
> > @@ -441,8 +441,14 @@ void cpuidle_wakeup_mwait(cpumask_t *mask)
> > cpumask_andnot(mask, mask, &target);
> > }
> >
> > +/* Force sending of a wakeup IPI regardless of mwait usage. */
> > +bool __ro_after_init force_mwait_ipi_wakeup;
> > +
> > bool arch_skip_send_event_check(unsigned int cpu)
> > {
> > + if ( force_mwait_ipi_wakeup )
> > + return false;
> > +
>
> I don't especially like this. The callers are a loop over all CPUs, and
> this can't be inlined/simplified automatically.
Hm, I can look into this later; maybe I can turn
arch_skip_send_event_check() into an inline. Let me get this
committed first.
Thanks, Roger.
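One possible shape for the optimisation discussed here, sketched under assumptions: CPU masks are modelled as a 64-bit integer, and skip_event_check(), ipi_targets_percpu() and ipi_targets_hoisted() are hypothetical names, not Xen functions. The first form keeps a per-CPU predicate (made static inline, as suggested) and re-tests the flag on every iteration; the second tests the boot-time flag once and then filters the whole mask in one operation. Both yield the same set of IPI targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 64

/* Hypothetical model: one bit per CPU in a 64-bit mask. */
typedef uint64_t cpu_bits_t;

static bool force_ipi_wakeup;   /* set at boot, read-only afterwards */
static cpu_bits_t mwaiting;     /* CPUs currently MWAITing */

/*
 * Predicate made static inline so callers see its body; the flag is still
 * re-read for every CPU the caller iterates over.
 */
static inline bool skip_event_check(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return !force_ipi_wakeup && (mwaiting & (UINT64_C(1) << cpu));
}

/* Per-CPU form, shaped like a loop over the target mask. */
static cpu_bits_t ipi_targets_percpu(cpu_bits_t mask)
{
    cpu_bits_t out = 0;

    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( (mask & (UINT64_C(1) << cpu)) && !skip_event_check(cpu) )
            out |= UINT64_C(1) << cpu;

    return out;
}

/* Hoisted form: test the flag once, then filter the whole mask. */
static cpu_bits_t ipi_targets_hoisted(cpu_bits_t mask)
{
    if ( force_ipi_wakeup )
        return mask;            /* every target needs an IPI */

    return mask & ~mwaiting;    /* MWAITing CPUs are woken by the store */
}
```

With CPUs 1 and 2 MWAITing and targets 0 to 3, both forms return CPUs 0 and 3 when the flag is clear, and all four when it is set.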
* Re: [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Alejandro Vallejo @ 2025-04-25 12:36 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel
Cc: Jan Beulich, Andrew Cooper, Frediano Ziglio, Xen-devel
On Wed Apr 23, 2025 at 12:32 PM BST, Roger Pau Monne wrote:
> There are several errata on Intel regarding the usage of the MONITOR/MWAIT
> instructions, all having in common that stores to the monitored region
> might not wake up the CPU.
>
> Fix them by forcing the sending of an IPI for the affected models.
>
> The Ice Lake issue has been reproduced internally on XenServer hardware,
> and the fix does seem to prevent it. The symptom was APs getting stuck in
> the idle loop immediately after bring up, which in turn prevented the BSP
> from making progress.
Ugh... so this is what it was... Awesome having this madness fixed.
Do you happen to know if Linux has a similar fix in place?
> This would happen before the watchdog was initialized, and hence the
> whole system would get stuck.
That's nasty. It was the mistaken assumption that the watchdog was already
running that had me going in circles thinking it was a lockup rather
than a livelock. Oh, well.
I believe the kudos for finally being able to reproduce this goes to
Frediano?
Cheers,
Alejandro
* Re: [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Andrew Cooper @ 2025-04-25 12:44 UTC (permalink / raw)
To: Alejandro Vallejo, Roger Pau Monne, xen-devel
Cc: Jan Beulich, Frediano Ziglio
On 25/04/2025 1:36 pm, Alejandro Vallejo wrote:
> On Wed Apr 23, 2025 at 12:32 PM BST, Roger Pau Monne wrote:
>> There are several errata on Intel regarding the usage of the MONITOR/MWAIT
>> instructions, all having in common that stores to the monitored region
>> might not wake up the CPU.
>>
>> Fix them by forcing the sending of an IPI for the affected models.
>>
>> The Ice Lake issue has been reproduced internally on XenServer hardware,
>> and the fix does seem to prevent it. The symptom was APs getting stuck in
>> the idle loop immediately after bring up, which in turn prevented the BSP
>> from making progress.
> Ugh... so this is what it was... Awesome having this madness fixed.
>
> Do you happen to know if Linux has a similar fix in place?
https://lore.kernel.org/lkml/20250421192205.7CC1A7D9@davehans-spike.ostc.intel.com/T/#u
>
>> This would happen before the watchdog was initialized, and hence the
>> whole system would get stuck.
> That's nasty. It was the misassumption that the watchdog was already
> running that had me going in circles thinking it was a lockup rather
> than a livelock. Oh, well.
>
> I believe the kudos for finally being able to reproduce this goes to
> Frediano?
Of course.
The bit about the watchdog is a little bit of a red herring. The
rcu_barrier() loop processes softirqs, so the watchdog wouldn't have
fired even if it had been set up.
~Andrew
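The watchdog point can be illustrated with a toy model. It assumes, as a simplification, that the watchdog declares a CPU stuck only when a tick counter advanced from softirq context has not moved between two checks. The structure and function names are made up for the example and do not mirror Xen's NMI watchdog code. A waiting loop that keeps processing softirqs keeps that counter moving, so the watchdog never fires even though the loop never completes: a livelock rather than a lockup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not Xen's watchdog: a timer softirq advances 'ticks', and the
 * watchdog only reports a hang if 'ticks' did not move between two checks.
 */
struct cpu_model {
    unsigned long ticks;      /* advanced from softirq context */
    unsigned long last_seen;  /* watchdog's previous snapshot */
    bool timer_pending;       /* raised by the timer interrupt */
};

/* The interrupt handler only raises the softirq; the work is deferred. */
static void timer_interrupt(struct cpu_model *c)
{
    c->timer_pending = true;
}

static void process_softirqs(struct cpu_model *c)
{
    if ( c->timer_pending )
    {
        c->timer_pending = false;
        c->ticks++;
    }
}

/* Returns true if the watchdog would declare this CPU stuck. */
static bool watchdog_check(struct cpu_model *c)
{
    bool stuck;

    assert(c->ticks >= c->last_seen);   /* the counter never goes backwards */
    stuck = c->ticks == c->last_seen;
    c->last_seen = c->ticks;

    return stuck;
}

/*
 * Run 'periods' timer periods of a wait loop whose exit condition never
 * becomes true (e.g. waiting on an AP that is never woken).  Returns the
 * number of watchdog checks that fired.
 */
static unsigned int simulate_wait(bool loop_processes_softirqs,
                                  unsigned int periods)
{
    struct cpu_model c = { 0 };
    unsigned int fired = 0;

    for ( unsigned int i = 0; i < periods; i++ )
    {
        timer_interrupt(&c);
        if ( loop_processes_softirqs )
            process_softirqs(&c);
        if ( watchdog_check(&c) )
            fired++;
    }

    return fired;
}
```

In this model, a loop that processes softirqs yields zero watchdog hits over any number of periods, while a loop that spins without processing them trips the check on every period.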
* Re: [PATCH v2] x86/intel: workaround several MONITOR/MWAIT errata
From: Roger Pau Monné @ 2025-04-25 12:55 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: xen-devel, Jan Beulich, Andrew Cooper, Frediano Ziglio, Xen-devel
On Fri, Apr 25, 2025 at 01:36:42PM +0100, Alejandro Vallejo wrote:
> On Wed Apr 23, 2025 at 12:32 PM BST, Roger Pau Monne wrote:
> > There are several errata on Intel regarding the usage of the MONITOR/MWAIT
> > instructions, all having in common that stores to the monitored region
> > might not wake up the CPU.
> >
> > Fix them by forcing the sending of an IPI for the affected models.
> >
> > The Ice Lake issue has been reproduced internally on XenServer hardware,
> > and the fix does seem to prevent it. The symptom was APs getting stuck in
> > the idle loop immediately after bring up, which in turn prevented the BSP
> > from making progress.
>
> Ugh... so this is what it was... Awesome having this madness fixed.
>
> Do you happen to know if Linux has a similar fix in place?
It should have now:
https://lore.kernel.org/lkml/20250421192205.7CC1A7D9@davehans-spike.ostc.intel.com/
> > This would happen before the watchdog was initialized, and hence the
> > whole system would get stuck.
>
> That's nasty. It was the misassumption that the watchdog was already
> running that had me going in circles thinking it was a lockup rather
> than a livelock. Oh, well.
>
> I believe the kudos for finally being able to reproduce this goes to
> Frediano?
Yes, Frediano managed to get it to reproduce reliably in the lab
(maybe 1-2 hits per day), and afterwards we pushed the rate up by just
rebooting from within Xen itself after AP bring-up, once we knew the
specific hardware that exhibited the issue.
Regards, Roger.