* [PATCH 0/2] Xen real-time x86
@ 2025-07-08 0:06 Stefano Stabellini
2025-07-08 0:07 ` [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 0:06 UTC (permalink / raw)
To: xen-devel
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
stefano.stabellini, Xenia.Ragiadakou, alejandro.garciavallejo,
Jason.Andryuk
Hi all,
This short patch series improves Xen real-time execution on AMD x86
processors.
The key to real-time performance is deterministic guest execution times
and deterministic guest interrupt latency. In such configurations, the
null scheduler is typically used, and there should be no IPIs or other
sources of vCPU execution interruptions beyond the guest timer interrupt
as configured by the guest, and any passthrough interrupts for
passthrough devices.
This is because, upon receiving a critical interrupt, the guest (such as
FreeRTOS or Zephyr) typically has a very short window of time to
complete the required action. Being interrupted in the middle of this
critical section could prevent the guest from completing the action
within the allotted time, leading to malfunctions.
To address this, the patch series disables IPIs that could potentially
affect the real-time domain.
Cheers,
Stefano
Stefano Stabellini (2):
xen/x86: don't send IPI to sync TSC when it is reliable
xen/x86: introduce AMD_MCE_NONFATAL
xen/arch/x86/Kconfig.cpu | 15 +++++++++++++++
xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 3 ++-
xen/arch/x86/time.c | 4 ++++
3 files changed, 21 insertions(+), 1 deletion(-)
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 0:06 [PATCH 0/2] Xen real-time x86 Stefano Stabellini
@ 2025-07-08 0:07 ` Stefano Stabellini
2025-07-08 9:54 ` Alejandro Vallejo
2025-07-08 13:24 ` Jan Beulich
2025-07-08 0:07 ` [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL Stefano Stabellini
2025-07-08 10:11 ` [PATCH 0/2] Xen real-time x86 Roger Pau Monné
2 siblings, 2 replies; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 0:07 UTC (permalink / raw)
To: xen-devel
Cc: jbeulich, andrew.cooper3, roger.pau, stefano.stabellini,
Xenia.Ragiadakou, alejandro.garciavallejo, Jason.Andryuk
On real time configuration with the null scheduler, we shouldn't
interrupt the guest execution unless strictly necessary: the guest could
be a real time guest (e.g. FreeRTOS) and interrupting its execution
could lead to a missed deadline.
The principal source of interruptions is IPIs. Remove the unnecessary
IPI on all physical CPUs to sync the TSC when the TSC is known to be
reliable.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
---
xen/arch/x86/time.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 59129f419d..bfd022174a 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2303,6 +2303,10 @@ static void cf_check time_calibration(void *unused)
local_irq_enable();
}
+ if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+ boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
+ return;
+
cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
/* @wait=1 because we must wait for all cpus before freeing @r. */
--
2.25.1
* [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL
2025-07-08 0:06 [PATCH 0/2] Xen real-time x86 Stefano Stabellini
2025-07-08 0:07 ` [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
@ 2025-07-08 0:07 ` Stefano Stabellini
2025-07-08 3:23 ` Demi Marie Obenour
2025-07-08 10:25 ` Alejandro Vallejo
2025-07-08 10:11 ` [PATCH 0/2] Xen real-time x86 Roger Pau Monné
2 siblings, 2 replies; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 0:07 UTC (permalink / raw)
To: xen-devel
Cc: jbeulich, andrew.cooper3, roger.pau, stefano.stabellini,
Xenia.Ragiadakou, alejandro.garciavallejo, Jason.Andryuk
Today, checking for non-fatal MCE errors on ARM is very invasive: it
involves a periodic timer interrupting the physical CPU execution at
regular intervals. Moreover, when the timer fires, the handler sends an
IPI to all physical CPUs.
Both these actions are disruptive in terms of latency and deterministic
execution times for real-time workloads. They might miss a deadline due
to one of these IPIs. Make it possible to disable non-fatal MCE errors
checking with a new Kconfig option (AMD_MCE_NONFATAL).
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
---
RFC. I couldn't find a better way to do this.
---
xen/arch/x86/Kconfig.cpu | 15 +++++++++++++++
xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 3 ++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
index 5fb18db1aa..14e20ad19d 100644
--- a/xen/arch/x86/Kconfig.cpu
+++ b/xen/arch/x86/Kconfig.cpu
@@ -10,6 +10,21 @@ config AMD
May be turned off in builds targetting other vendors. Otherwise,
must be enabled for Xen to work suitably on AMD platforms.
+config AMD_MCE_NONFATAL
+ bool "Check for non-fatal MCEs on AMD CPUs"
+ default y
+ depends on AMD
+ help
+ Check for non-fatal MCE errors.
+
+ When this option is on (default), Xen regularly checks for
+ non-fatal MCEs potentially occurring on all physical CPUs. The
+ checking is done via timers and IPI interrupts, which is
+ acceptable in most configurations, but not for real-time.
+
+ Turn this option off if you plan on deploying real-time workloads
+ on Xen.
+
config INTEL
bool "Support Intel CPUs"
default y
diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
index 7d48c9ab5f..812e18f612 100644
--- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
+++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
@@ -191,7 +191,8 @@ static void cf_check mce_amd_work_fn(void *data)
void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
{
- if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
+ if ( !IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
+ (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) )
return;
/* Assume we are on K8 or newer AMD or Hygon CPU here */
--
2.25.1
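The guard added to amd_nonfatal_mcheck_init() relies on IS_ENABLED() turning a Kconfig symbol into a compile-time 0 or 1, so the compiler can drop the dead path. The following is only a minimal sketch of that pattern, not code from the series: every DEMO_* name is hypothetical, the vendor bits are made-up values, and the macro chain is a simplified re-implementation of the common "defined to 1" preprocessor trick rather than a copy of Xen's header.

```c
#include <stdbool.h>

/* Stand-in for a Kconfig-generated header: an enabled option is defined
 * to 1, a disabled one is not defined at all. Remove this line to model
 * CONFIG_AMD_MCE_NONFATAL=n. */
#define CONFIG_AMD_MCE_NONFATAL 1

/* Hypothetical re-implementation of the "is this symbol defined to 1"
 * trick. If the symbol expands to 1, the pasted placeholder expands to
 * "0," and shifts a literal 1 into the second argument slot. Otherwise
 * the junk token stays in the first slot and the second slot holds 0. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_IS_ENABLED(option)       DEMO_ENABLED_1(option)
#define DEMO_ENABLED_1(value)         DEMO_ENABLED_2(DEMO_ARG_PLACEHOLDER_##value)
#define DEMO_ENABLED_2(arg1_or_junk)  DEMO_ENABLED_3(arg1_or_junk 1, 0, 0)
#define DEMO_ENABLED_3(ignored, val, ...) val

/* Made-up vendor bits, only for this sketch. */
enum demo_vendor {
    DEMO_VENDOR_INTEL = 1 << 0,
    DEMO_VENDOR_AMD   = 1 << 1,
    DEMO_VENDOR_HYGON = 1 << 2,
};

/* Returns true when the periodic non-fatal MCE poller would be set up,
 * mirroring the shape of the check in the patch. */
bool demo_nonfatal_poll_wanted(unsigned int vendor)
{
    if ( !DEMO_IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
         !(vendor & (DEMO_VENDOR_AMD | DEMO_VENDOR_HYGON)) )
        return false;

    return true;
}
```

With the option defined, the helper returns true for the AMD and Hygon bits and false for Intel. With the define removed, it returns false for every vendor, yet the timer and work-function code is still built, which is the part the review below suggests compiling out instead.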
* Re: [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL
2025-07-08 0:07 ` [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL Stefano Stabellini
@ 2025-07-08 3:23 ` Demi Marie Obenour
2025-07-08 10:25 ` Alejandro Vallejo
1 sibling, 0 replies; 23+ messages in thread
From: Demi Marie Obenour @ 2025-07-08 3:23 UTC (permalink / raw)
To: xen-devel
On 7/7/25 20:07, Stefano Stabellini wrote:
> Today, checking for non-fatal MCE errors on ARM is very invasive: it
s/ARM/AMD/
--
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 0:07 ` [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
@ 2025-07-08 9:54 ` Alejandro Vallejo
2025-07-08 17:40 ` Stefano Stabellini
2025-07-08 13:24 ` Jan Beulich
1 sibling, 1 reply; 23+ messages in thread
From: Alejandro Vallejo @ 2025-07-08 9:54 UTC (permalink / raw)
To: Stefano Stabellini, xen-devel
Cc: jbeulich, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
Jason.Andryuk
On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
> On real time configuration with the null scheduler, we shouldn't
> interrupt the guest execution unless strictly necessary: the guest could
> be a real time guest (e.g. FreeRTOS) and interrupting its execution
> could lead to a missed deadline.
>
> The principal source of interruptions is IPIs. Remove the unnecessary
> IPI on all physical CPUs to sync the TSC when the TSC is known to be
> reliable.
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> ---
> xen/arch/x86/time.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 59129f419d..bfd022174a 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2303,6 +2303,10 @@ static void cf_check time_calibration(void *unused)
> local_irq_enable();
> }
>
> + if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> + boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
> + return;
> +
This should check "(tsc_flags & TSC_RELIABLE_SOCKET)" as well. The TSCs might
still be unsynchronized across sockets.
I'm still quite confused as to how Xen (mis)handles time, but wouldn't this need
to go inside the branch above? If the clocksource is not the TSC, the TSC can
still drift with respect to the actual clocksource (PIT, HPET or ACPI timer).
If so, we could probably do an early return in the branch above ignoring the
conditions (they are required for picking the TSC clocksource already, including
synchronization across sockets).
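To make the three candidate conditions easier to compare, here is a small stand-alone model. It is a sketch under assumptions, not Xen code: struct demo_tsc_state and the demo_* helpers are hypothetical, and each boolean merely stands in for the real check named in its comment.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the properties discussed above. */
struct demo_tsc_state {
    bool constant_tsc;            /* models X86_FEATURE_CONSTANT_TSC */
    bool tsc_reliable;            /* models X86_FEATURE_TSC_RELIABLE */
    bool reliable_across_sockets; /* models tsc_flags & TSC_RELIABLE_SOCKET */
    bool clocksource_is_tsc;      /* models "the TSC is the platform timer" */
};

/* Condition as posted in the patch. */
bool demo_skip_as_posted(const struct demo_tsc_state *s)
{
    return s->constant_tsc && s->tsc_reliable;
}

/* Posted condition plus the cross-socket check. */
bool demo_skip_with_socket_check(const struct demo_tsc_state *s)
{
    return demo_skip_as_posted(s) && s->reliable_across_sockets;
}

/* Alternative: skip the rendezvous only on the path where the TSC is the
 * clocksource, on the assumption that selecting it already required the
 * other properties. */
bool demo_skip_if_clocksource_tsc(const struct demo_tsc_state *s)
{
    return s->clocksource_is_tsc;
}
```

For a state with constant_tsc and tsc_reliable set but the other two fields clear, only demo_skip_as_posted() returns true. That is the case where the posted check would drop the rendezvous even though the TSC may be unsynchronized across sockets or may drift against a non-TSC clocksource.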
Another matter is whether we could drop the "master_stime" write. Would we
care about it at all?
> cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
>
> /* @wait=1 because we must wait for all cpus before freeing @r. */
Cheers,
Alejandro
* Re: [PATCH 0/2] Xen real-time x86
2025-07-08 0:06 [PATCH 0/2] Xen real-time x86 Stefano Stabellini
2025-07-08 0:07 ` [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
2025-07-08 0:07 ` [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL Stefano Stabellini
@ 2025-07-08 10:11 ` Roger Pau Monné
2025-07-08 13:31 ` Jan Beulich
2 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2025-07-08 10:11 UTC (permalink / raw)
To: Stefano Stabellini
Cc: xen-devel, Jan Beulich, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
> Hi all,
>
> This short patch series improves Xen real-time execution on AMD x86
> processors.
>
> The key to real-time performance is deterministic guest execution times
> and deterministic guest interrupt latency. In such configurations, the
> null scheduler is typically used, and there should be no IPIs or other
> sources of vCPU execution interruptions beyond the guest timer interrupt
> as configured by the guest, and any passthrough interrupts for
> passthrough devices.
>
> This is because, upon receiving a critical interrupt, the guest (such as
> FreeRTOS or Zephyr) typically has a very short window of time to
> complete the required action. Being interrupted in the middle of this
> critical section could prevent the guest from completing the action
> within the allotted time, leading to malfunctions.
There's IMO still one pending issue after this series on x86, maybe
you have addressed this with some local patch. Interrupt forwarding
from Xen into HVM/PVH guests uses a softirq to do the injection, which
means there's a non-deterministic window of latency between when the
interrupt is received by Xen, as to when it's injected to the guest,
because the softirq might not get processed right after being set as
pending (there might be other softirqs to process, or simply Xen might
be busy doing some other operation).
I think you want to look into adding a new command line option or
similar, that allows selecting whether guest IRQs are deferred to a
softirq for injection, or are injected as part of the processing done
in the IRQ handler itself.
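As a rough illustration of why the deferred path adds variable latency, the model below counts how many other softirq handlers would run before the one that performs the injection. It is only a sketch: the demo_* names, the softirq numbering and the inject_directly switch are all hypothetical, and the model assumes pending softirqs are serviced lowest-numbered first.

```c
#include <stdbool.h>

enum {
    DEMO_NR_SOFTIRQS    = 8, /* made-up number of softirq slots */
    DEMO_INJECT_SOFTIRQ = 5, /* made-up slot of the injection softirq */
};

/* Returns how many pending handlers run before the guest interrupt is
 * injected. "pending" is a bitmask of softirqs already raised on this
 * CPU. "inject_directly" models the proposed option to inject from the
 * IRQ handler instead of deferring to a softirq. */
unsigned int demo_handlers_before_injection(unsigned int pending,
                                            bool inject_directly)
{
    unsigned int count = 0;

    if ( inject_directly )
        return 0; /* nothing is queued ahead of the injection */

    for ( unsigned int nr = 0; nr < DEMO_INJECT_SOFTIRQ; nr++ )
        if ( pending & (1u << nr) )
            count++;

    return count;
}
```

In this model, three lower-numbered pending softirqs delay a deferred injection by three handlers, while direct injection is delayed by none. With no other softirq pending on the CPU, both modes give zero.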
Otherwise there will always be a non-deterministic amount of latency
on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
some weird/unexpected variance when doing these passthrough interrupt
latency measurements on x86?
Regards, Roger.
* Re: [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL
2025-07-08 0:07 ` [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL Stefano Stabellini
2025-07-08 3:23 ` Demi Marie Obenour
@ 2025-07-08 10:25 ` Alejandro Vallejo
2025-07-08 13:28 ` Jan Beulich
1 sibling, 1 reply; 23+ messages in thread
From: Alejandro Vallejo @ 2025-07-08 10:25 UTC (permalink / raw)
To: Stefano Stabellini, xen-devel
Cc: jbeulich, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
Jason.Andryuk
On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
> Today, checking for non-fatal MCE errors on ARM is very invasive: it
> involves a periodic timer interrupting the physical CPU execution at
> regular intervals. Moreover, when the timer fires, the handler sends an
> IPI to all physical CPUs.
>
> Both these actions are disruptive in terms of latency and deterministic
> execution times for real-time workloads. They might miss a deadline due
> to one of these IPIs. Make it possible to disable non-fatal MCE errors
> checking with a new Kconfig option (AMD_MCE_NONFATAL).
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> ---
> RFC. I couldn't find a better way to do this.
> ---
> xen/arch/x86/Kconfig.cpu | 15 +++++++++++++++
> xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 3 ++-
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
> index 5fb18db1aa..14e20ad19d 100644
> --- a/xen/arch/x86/Kconfig.cpu
> +++ b/xen/arch/x86/Kconfig.cpu
> @@ -10,6 +10,21 @@ config AMD
> May be turned off in builds targetting other vendors. Otherwise,
> must be enabled for Xen to work suitably on AMD platforms.
>
> +config AMD_MCE_NONFATAL
> + bool "Check for non-fatal MCEs on AMD CPUs"
> + default y
> + depends on AMD
> + help
> + Check for non-fatal MCE errors.
> +
> + When this option is on (default), Xen regularly checks for
> + non-fatal MCEs potentially occurring on all physical CPUs. The
> + checking is done via timers and IPI interrupts, which is
> + acceptable in most configurations, but not for real-time.
> +
> + Turn this option off if you plan on deploying real-time workloads
> + on Xen.
> +
This being in the CPU vendor submenu seems off. I'd expect only a list of
silicon vendors here. I think it ought to be in the regular Kconfig file.
> config INTEL
> bool "Support Intel CPUs"
> default y
> diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> index 7d48c9ab5f..812e18f612 100644
> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> @@ -191,7 +191,8 @@ static void cf_check mce_amd_work_fn(void *data)
>
> void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
> {
> - if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
> + if ( !IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
> + (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) )
> return;
>
> /* Assume we are on K8 or newer AMD or Hygon CPU here */
It can be made more general to remove more code. What do you think of removing
all non-fatals and getting rid of the initcall altogether?
diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
index 5fb18db1aa..a4b892a1aa 100644
--- a/xen/arch/x86/Kconfig.cpu
+++ b/xen/arch/x86/Kconfig.cpu
@@ -10,6 +10,20 @@ config AMD
May be turned off in builds targetting other vendors. Otherwise,
must be enabled for Xen to work suitably on AMD platforms.
+config MCE_NONFATAL
+ bool "Check for non-fatal MCEs"
+ default y
+ help
+ Check for non-fatal MCE errors.
+
+ When this option is on (default), Xen regularly checks for
+ non-fatal MCEs potentially occurring on all physical CPUs. The
+ checking is done via timers and IPI interrupts, which is
+ acceptable in most configurations, but not for real-time.
+
+ Turn this option off if you plan on deploying real-time workloads
+ on Xen.
+
config INTEL
bool "Support Intel CPUs"
default y
diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
index e6cb4dd503..c70b441888 100644
--- a/xen/arch/x86/cpu/mcheck/Makefile
+++ b/xen/arch/x86/cpu/mcheck/Makefile
@@ -1,12 +1,12 @@
-obj-$(CONFIG_AMD) += amd_nonfatal.o
+obj-$(filter $(CONFIG_AMD),$(CONFIG_MCE_NONFATAL)) += amd_nonfatal.o
obj-$(CONFIG_AMD) += mce_amd.o
obj-y += mcaction.o
obj-y += barrier.o
-obj-$(CONFIG_INTEL) += intel-nonfatal.o
+obj-$(filter $(CONFIG_INTEL),$(CONFIG_MCE_NONFATAL)) += intel-nonfatal.o
obj-y += mctelem.o
obj-y += mce.o
obj-y += mce-apei.o
obj-$(CONFIG_INTEL) += mce_intel.o
-obj-y += non-fatal.o
+obj-$(CONFIG_MCE_NONFATAL) += non-fatal.o
obj-y += util.o
obj-y += vmce.o
... with the Kconfig option probably in the regular x86 Kconfig rather than
Kconfig.cpu
Thoughts?
Cheers,
Alejandro
* Re: [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 0:07 ` [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
2025-07-08 9:54 ` Alejandro Vallejo
@ 2025-07-08 13:24 ` Jan Beulich
2025-07-08 17:40 ` Stefano Stabellini
1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2025-07-08 13:24 UTC (permalink / raw)
To: Stefano Stabellini
Cc: andrew.cooper3, roger.pau, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk, xen-devel
On 08.07.2025 02:07, Stefano Stabellini wrote:
> On real time configuration with the null scheduler, we shouldn't
> interrupt the guest execution unless strictly necessary: the guest could
> be a real time guest (e.g. FreeRTOS) and interrupting its execution
> could lead to a missed deadline.
>
> The principal source of interruptions is IPIs. Remove the unnecessary
> IPI on all physical CPUs to sync the TSC when the TSC is known to be
> reliable.
If it had been truly unnecessary for all the time, I'm sure someone would
have suggested to get rid of the overhead. IOW I think there is more to be
said as to this being correct / safe, including in any corner cases.
> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> ---
> xen/arch/x86/time.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 59129f419d..bfd022174a 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2303,6 +2303,10 @@ static void cf_check time_calibration(void *unused)
> local_irq_enable();
> }
>
> + if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> + boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
> + return;
This would render the (first of the two) invocation(s) of the function from
verify_tsc_reliability() (largely) dead; it would then be only r.master_stime
which gets updated (see also Alejandro's reply), which surely wouldn't have
required that call in the first place.
Jan
* Re: [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL
2025-07-08 10:25 ` Alejandro Vallejo
@ 2025-07-08 13:28 ` Jan Beulich
2025-07-08 17:13 ` Stefano Stabellini
0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2025-07-08 13:28 UTC (permalink / raw)
To: Alejandro Vallejo, Stefano Stabellini
Cc: andrew.cooper3, roger.pau, Xenia.Ragiadakou, Jason.Andryuk,
xen-devel
On 08.07.2025 12:25, Alejandro Vallejo wrote:
> On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
>> --- a/xen/arch/x86/Kconfig.cpu
>> +++ b/xen/arch/x86/Kconfig.cpu
>> @@ -10,6 +10,21 @@ config AMD
>> May be turned off in builds targetting other vendors. Otherwise,
>> must be enabled for Xen to work suitably on AMD platforms.
>>
>> +config AMD_MCE_NONFATAL
>> + bool "Check for non-fatal MCEs on AMD CPUs"
>> + default y
>> + depends on AMD
>> + help
>> + Check for non-fatal MCE errors.
>> +
>> + When this option is on (default), Xen regularly checks for
>> + non-fatal MCEs potentially occurring on all physical CPUs. The
>> + checking is done via timers and IPI interrupts, which is
>> + acceptable in most configurations, but not for real-time.
>> +
>> + Turn this option off if you plan on deploying real-time workloads
>> + on Xen.
>> +
>
> This being in the CPU vendor submenu seems off. I'd expect only a list of
> silicon vendors here. I think it ought to be in the regular Kconfig file.
Whether in this file or the regular one is up for discussion, but yes,
definitely not inside the vendor menu.
>> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
>> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
>> @@ -191,7 +191,8 @@ static void cf_check mce_amd_work_fn(void *data)
>>
>> void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
>> {
>> - if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
>> + if ( !IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
>> + (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) )
>> return;
>>
>> /* Assume we are on K8 or newer AMD or Hygon CPU here */
>
> It can be made more general to remove more code. What do you think of removing
> all non-fatals and getting rid of the initcall altogether?
I think such a more general approach would be quite a bit better.
Jan
> diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
> index 5fb18db1aa..a4b892a1aa 100644
> --- a/xen/arch/x86/Kconfig.cpu
> +++ b/xen/arch/x86/Kconfig.cpu
> @@ -10,6 +10,20 @@ config AMD
> May be turned off in builds targetting other vendors. Otherwise,
> must be enabled for Xen to work suitably on AMD platforms.
>
> +config MCE_NONFATAL
> + bool "Check for non-fatal MCEs"
> + default y
> + help
> + Check for non-fatal MCE errors.
> +
> + When this option is on (default), Xen regularly checks for
> + non-fatal MCEs potentially occurring on all physical CPUs. The
> + checking is done via timers and IPI interrupts, which is
> + acceptable in most configurations, but not for real-time.
> +
> + Turn this option off if you plan on deploying real-time workloads
> + on Xen.
> +
> config INTEL
> bool "Support Intel CPUs"
> default y
> diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
> index e6cb4dd503..c70b441888 100644
> --- a/xen/arch/x86/cpu/mcheck/Makefile
> +++ b/xen/arch/x86/cpu/mcheck/Makefile
> @@ -1,12 +1,12 @@
> -obj-$(CONFIG_AMD) += amd_nonfatal.o
> +obj-$(filter $(CONFIG_AMD),$(CONFIG_MCE_NONFATAL)) += amd_nonfatal.o
> obj-$(CONFIG_AMD) += mce_amd.o
> obj-y += mcaction.o
> obj-y += barrier.o
> -obj-$(CONFIG_INTEL) += intel-nonfatal.o
> +obj-$(filter $(CONFIG_INTEL),$(CONFIG_MCE_NONFATAL)) += intel-nonfatal.o
> obj-y += mctelem.o
> obj-y += mce.o
> obj-y += mce-apei.o
> obj-$(CONFIG_INTEL) += mce_intel.o
> -obj-y += non-fatal.o
> +obj-$(CONFIG_MCE_NONFATAL) += non-fatal.o
> obj-y += util.o
> obj-y += vmce.o
>
> ... with the Kconfig option probably in the regular x86 Kconfig rather than
> Kconfig.cpu
>
> Thoughts?
>
> Cheers,
> Alejandro
* Re: [PATCH 0/2] Xen real-time x86
2025-07-08 10:11 ` [PATCH 0/2] Xen real-time x86 Roger Pau Monné
@ 2025-07-08 13:31 ` Jan Beulich
2025-07-08 17:11 ` Stefano Stabellini
0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2025-07-08 13:31 UTC (permalink / raw)
To: Roger Pau Monné, Stefano Stabellini
Cc: xen-devel, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On 08.07.2025 12:11, Roger Pau Monné wrote:
> On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
>> Hi all,
>>
>> This short patch series improves Xen real-time execution on AMD x86
>> processors.
>>
>> The key to real-time performance is deterministic guest execution times
>> and deterministic guest interrupt latency. In such configurations, the
>> null scheduler is typically used, and there should be no IPIs or other
>> sources of vCPU execution interruptions beyond the guest timer interrupt
>> as configured by the guest, and any passthrough interrupts for
>> passthrough devices.
>>
>> This is because, upon receiving a critical interrupt, the guest (such as
>> FreeRTOS or Zephyr) typically has a very short window of time to
>> complete the required action. Being interrupted in the middle of this
>> critical section could prevent the guest from completing the action
>> within the allotted time, leading to malfunctions.
>
> There's IMO still one pending issue after this series on x86, maybe
> you have addressed this with some local patch.
Not just one, I think. We use IPIs for other purposes as well. The way
I read the text above, all of them are a (potential) problem.
Jan
> Interrupt forwarding
> from Xen into HVM/PVH guests uses a softirq to do the injection, which
> means there's a non-deterministic window of latency between when the
> interrupt is received by Xen, as to when it's injected to the guest,
> because the softirq might not get processed right after being set as
> pending (there might be other softirqs to process, or simply Xen might
> be busy doing some other operation).
>
> I think you want to look into adding a new command line option or
> similar, that allows selecting whether guest IRQs are deferred to a
> softirq for injection, or are injected as part of the processing done
> in the IRQ handler itself.
>
> Otherwise there will always be a non-deterministic amount of latency
> on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
> some weird/unexpected variance when doing this passthrough interrupt
> latency measurements on x86?
>
> Regards, Roger.
* Re: [PATCH 0/2] Xen real-time x86
2025-07-08 13:31 ` Jan Beulich
@ 2025-07-08 17:11 ` Stefano Stabellini
2025-07-09 5:37 ` Jan Beulich
2025-07-09 14:10 ` Roger Pau Monné
0 siblings, 2 replies; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 17:11 UTC (permalink / raw)
To: Jan Beulich
Cc: Roger Pau Monné, Stefano Stabellini, xen-devel,
Andrew Cooper, Xenia.Ragiadakou, alejandro.garciavallejo,
Jason.Andryuk
On Tue, 8 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 12:11, Roger Pau Monné wrote:
> > On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
> >> Hi all,
> >>
> >> This short patch series improves Xen real-time execution on AMD x86
> >> processors.
> >>
> >> The key to real-time performance is deterministic guest execution times
> >> and deterministic guest interrupt latency. In such configurations, the
> >> null scheduler is typically used, and there should be no IPIs or other
> >> sources of vCPU execution interruptions beyond the guest timer interrupt
> >> as configured by the guest, and any passthrough interrupts for
> >> passthrough devices.
> >>
> >> This is because, upon receiving a critical interrupt, the guest (such as
> >> FreeRTOS or Zephyr) typically has a very short window of time to
> >> complete the required action. Being interrupted in the middle of this
> >> critical section could prevent the guest from completing the action
> >> within the allotted time, leading to malfunctions.
> >
> > There's IMO still one pending issue after this series on x86, maybe
> > you have addressed this with some local patch.
>
> Not just one, I think. We use IPIs for other purposes as well. The way
> I read the text above, all of them are a (potential) problem.
Yes, all of them are potentially a problem. If you know of any other
IPIs, please let me know and I'll try to remove them. One of my goals in
posting this series was to raise awareness of this issue and to attempt
to fix it with your help. It is not just IPIs, but also Xen timers and
other things that could cause the guest to trap into Xen without the
guest's knowledge. Typically IPIs are the worst offenders in my experience.
On ARM, I have done several experiments where, after the system is
configured correctly, I can see that if the RTOS does nothing, there are
no traps in Xen on the RTOS vCPU/pCPU for seconds.
As I tried to describe in the email, typically the real time
application, which tends to be based on an RTOS like FreeRTOS or Zephyr
(think of them like Unikernels), has a very small window of time from
receiving an interrupt to accomplish a critical task. Nothing should be
disturbing the execution of the RTOS during the critical window. The
operation the RTOS needs to perform is typically on a passthrough device
without Xen interactions.
In general from the hypervisor point of view, the idea is that Xen
should inject the interrupt and then leave the RTOS alone and
undisturbed to do its job.
> > Interrupt forwarding
> > from Xen into HVM/PVH guests uses a softirq to do the injection, which
> > means there's a non-deterministic window of latency between when the
> > interrupt is received by Xen, as to when it's injected to the guest,
> > because the softirq might not get processed right after being set as
> > pending (there might be other softirqs to process, or simply Xen might
> > be busy doing some other operation).
> >
> > I think you want to look into adding a new command line option or
> > similar, that allows selecting whether guest IRQs are deferred to a
> > softirq for injection, or are injected as part of the processing done
> > in the IRQ handler itself.
> >
> > Otherwise there will always be a non-deterministic amount of latency
> > on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
> > some weird/unexpected variance when doing this passthrough interrupt
> > latency measurements on x86?
While this is not great and I agree with Roger that it should be
improved (I'll try to do so), in a well configured system I expect that
there should be no other softirqs on the RTOS vCPU/pCPU so it shouldn't
matter much if it is raised as a softirq or not?
* Re: [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL
2025-07-08 13:28 ` Jan Beulich
@ 2025-07-08 17:13 ` Stefano Stabellini
0 siblings, 0 replies; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 17:13 UTC (permalink / raw)
To: Jan Beulich
Cc: Alejandro Vallejo, Stefano Stabellini, andrew.cooper3, roger.pau,
Xenia.Ragiadakou, Jason.Andryuk, xen-devel
On Tue, 8 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 12:25, Alejandro Vallejo wrote:
> > On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
> >> --- a/xen/arch/x86/Kconfig.cpu
> >> +++ b/xen/arch/x86/Kconfig.cpu
> >> @@ -10,6 +10,21 @@ config AMD
> >> May be turned off in builds targetting other vendors. Otherwise,
> >> must be enabled for Xen to work suitably on AMD platforms.
> >>
> >> +config AMD_MCE_NONFATAL
> >> + bool "Check for non-fatal MCEs on AMD CPUs"
> >> + default y
> >> + depends on AMD
> >> + help
> >> + Check for non-fatal MCE errors.
> >> +
> >> + When this option is on (default), Xen regularly checks for
> >> + non-fatal MCEs potentially occurring on all physical CPUs. The
> >> + checking is done via timers and IPI interrupts, which is
> >> + acceptable in most configurations, but not for real-time.
> >> +
> >> + Turn this option off if you plan on deploying real-time workloads
> >> + on Xen.
> >> +
> >
> > This being in the CPU vendor submenu seems off. I'd expect only a list of
> > silicon vendors here. I think it ought to be in the regular Kconfig file.
>
> Whether in this file or the regular one is up for discussion, but yes,
> definitely not inside the vendor menu.
>
> >> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> >> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> >> @@ -191,7 +191,8 @@ static void cf_check mce_amd_work_fn(void *data)
> >>
> >> void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
> >> {
> >> - if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
> >> + if ( !IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
> >> + (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) )
> >> return;
> >>
> >> /* Assume we are on K8 or newer AMD or Hygon CPU here */
> >
> > It can be made more general to remove more code. What do you think of removing
> > all non-fatals and getting rid of the initcall altogether?
>
> I think such a more general approach would be quite a bit better.
I am fine with that; it is actually better to remove the code than to
leave it around doing nothing.
> > diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
> > index 5fb18db1aa..a4b892a1aa 100644
> > --- a/xen/arch/x86/Kconfig.cpu
> > +++ b/xen/arch/x86/Kconfig.cpu
> > @@ -10,6 +10,20 @@ config AMD
> > May be turned off in builds targetting other vendors. Otherwise,
> > must be enabled for Xen to work suitably on AMD platforms.
> >
> > +config MCE_NONFATAL
> > + bool "Check for non-fatal MCEs"
> > + default y
> > + help
> > + Check for non-fatal MCE errors.
> > +
> > + When this option is on (default), Xen regularly checks for
> > + non-fatal MCEs potentially occurring on all physical CPUs. The
> > + checking is done via timers and IPI interrupts, which is
> > + acceptable in most configurations, but not for real-time.
> > +
> > + Turn this option off if you plan on deploying real-time workloads
> > + on Xen.
> > +
> > config INTEL
> > bool "Support Intel CPUs"
> > default y
> > diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
> > index e6cb4dd503..c70b441888 100644
> > --- a/xen/arch/x86/cpu/mcheck/Makefile
> > +++ b/xen/arch/x86/cpu/mcheck/Makefile
> > @@ -1,12 +1,12 @@
> > -obj-$(CONFIG_AMD) += amd_nonfatal.o
> > +obj-$(filter $(CONFIG_AMD),$(CONFIG_MCE_NONFATAL)) += amd_nonfatal.o
> > obj-$(CONFIG_AMD) += mce_amd.o
> > obj-y += mcaction.o
> > obj-y += barrier.o
> > -obj-$(CONFIG_INTEL) += intel-nonfatal.o
> > +obj-$(filter $(CONFIG_INTEL),$(CONFIG_MCE_NONFATAL)) += intel-nonfatal.o
> > obj-y += mctelem.o
> > obj-y += mce.o
> > obj-y += mce-apei.o
> > obj-$(CONFIG_INTEL) += mce_intel.o
> > -obj-y += non-fatal.o
> > +obj-$(CONFIG_MCE_NONFATAL) += non-fatal.o
> > obj-y += util.o
> > obj-y += vmce.o
> >
> > ... with the Kconfig option probably in the regular x86 Kconfig rather than
> > Kconfig.cpu
> >
> > Thoughts?
> >
> > Cheers,
> > Alejandro
>
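For illustration only (this is not part of either posted patch): the guard
discussed above relies on IS_ENABLED() evaluating to a compile-time 0 or 1,
so the compiler can discard the polling setup when the option is off. The
standalone sketch below shows that idiom under stated assumptions. The macro
bodies are a simplified stand-in, not Xen's actual definition, and
CONFIG_DEMO_MCE_NONFATAL, CONFIG_DEMO_DISABLED, the VENDOR_* bits and both
functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for a Kconfig-style IS_ENABLED(): expands to 1 when
 * the option macro is defined as 1, and to 0 when it is not defined at all.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) IS_DEFINED_2(option)

/* Hypothetical option: "defined as 1" models "=y" in .config. */
#define CONFIG_DEMO_MCE_NONFATAL 1
/* CONFIG_DEMO_DISABLED is deliberately left undefined. */

/* Hypothetical vendor bits, one bit per vendor as in the quoted check. */
#define VENDOR_INTEL (1u << 0)
#define VENDOR_AMD   (1u << 1)
#define VENDOR_HYGON (1u << 2)

/* Mirrors the shape of the guard: option off or wrong vendor means no polling. */
bool should_start_polling(unsigned int vendor)
{
    if ( !IS_ENABLED(CONFIG_DEMO_MCE_NONFATAL) ||
         !(vendor & (VENDOR_AMD | VENDOR_HYGON)) )
        return false;

    return true;
}

/* An option that was never defined evaluates to 0 instead of failing to build. */
bool disabled_option_is_on(void)
{
    return IS_ENABLED(CONFIG_DEMO_DISABLED);
}
```

With the Makefile approach suggested in the quoted reply, the object file is
not built at all when the option is off, so no such run-time guard would be
needed in that variant.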
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 9:54 ` Alejandro Vallejo
@ 2025-07-08 17:40 ` Stefano Stabellini
2025-07-08 17:53 ` Alejandro Vallejo
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 17:40 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: Stefano Stabellini, xen-devel, jbeulich, andrew.cooper3,
roger.pau, Xenia.Ragiadakou, Jason.Andryuk
On Tue, 8 Jul 2025, Alejandro Vallejo wrote:
> On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
> > On real time configuration with the null scheduler, we shouldn't
> > interrupt the guest execution unless strictly necessary: the guest could
> > be a real time guest (e.g. FreeRTOS) and interrupting its execution
> > could lead to a missed deadline.
> >
> > The principal source of interruptions is IPIs. Remove the unnecessary
> > IPI on all physical CPUs to sync the TSC when the TSC is known to be
> > reliable.
> >
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> > ---
> > xen/arch/x86/time.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> > index 59129f419d..bfd022174a 100644
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -2303,6 +2303,10 @@ static void cf_check time_calibration(void *unused)
> > local_irq_enable();
> > }
> >
> > + if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > + boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
> > + return;
> > +
>
> This should check "(tsc_flags & TSC_RELIABLE_SOCKET)" as well. The TSCs might
> still be unsynchronized across sockets.
>
> I'm still quite confused as to how Xen (mis)handles time, but wouldn't this need
> to go inside the branch above? If the clocksource is not the TSC as well the TSC
> can still drift with respect to the actual clocksource (PIT, HPET or ACPI timer).
I can move it inside the previous if
> If so, we could probably do an early return in the branch above ignoring the
> conditions (they are required for picking the TSC clocksource already, including
> synchronization across sockets).
>
> Another matter is whether we could drop the "master_stime" write. Would we
> care about it at all?
I'll drop it.
Is this what you had in mind?
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 59129f419d..d72e640f72 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
};
if ( clocksource_is_tsc() )
- {
- local_irq_disable();
- r.master_stime = read_platform_stime(&r.master_tsc_stamp);
- local_irq_enable();
- }
+ return;
cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
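As a rough illustration of the effect of the early return proposed above
(a toy model, not Xen code): when the platform clocksource is the TSC, the
periodic calibration returns before any CPU is pulled into the rendezvous.
The struct, its field and time_calibration_model() are hypothetical names
introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for the real side effects. */
struct calib_stats {
    unsigned int cpus_interrupted;   /* CPUs pulled into the rendezvous */
};

/*
 * Models one run of the periodic calibration. With the TSC as clocksource
 * the function returns before touching any CPU; otherwise every online CPU
 * takes part in the rendezvous.
 */
void time_calibration_model(bool clocksource_is_tsc,
                            unsigned int online_cpus,
                            struct calib_stats *stats)
{
    if ( clocksource_is_tsc )
        return;

    stats->cpus_interrupted += online_cpus;
}

/* Convenience wrapper: how many CPUs does a single run disturb? */
unsigned int cpus_disturbed_by_one_run(bool clocksource_is_tsc,
                                       unsigned int online_cpus)
{
    struct calib_stats stats = { 0 };

    time_calibration_model(clocksource_is_tsc, online_cpus, &stats);

    return stats.cpus_interrupted;
}
```

The sketch takes "clocksource is TSC" as the only condition because, per the
quoted review comment, the reliability and cross-socket synchronization
checks are already prerequisites for picking the TSC as clocksource.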
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 13:24 ` Jan Beulich
@ 2025-07-08 17:40 ` Stefano Stabellini
2025-07-09 7:04 ` Jan Beulich
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-08 17:40 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk, xen-devel
On Tue, 8 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 02:07, Stefano Stabellini wrote:
> > On real time configuration with the null scheduler, we shouldn't
> > interrupt the guest execution unless strictly necessary: the guest could
> > be a real time guest (e.g. FreeRTOS) and interrupting its execution
> > could lead to a missed deadline.
> >
> > The principal source of interruptions is IPIs. Remove the unnecessary
> > IPI on all physical CPUs to sync the TSC when the TSC is known to be
> > reliable.
>
> If it had been truly unnecessary for all the time, I'm sure someone would
> have suggested to get rid of the overhead.
I am not so sure someone else would have suggested it, given that Xen on
x86 has mostly been used in the datacenter, where real-time is not a
requirement.
> IOW I think there is more to be
> said as to this being correct / safe, including in any corner cases.
>
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> > ---
> > xen/arch/x86/time.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> > index 59129f419d..bfd022174a 100644
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -2303,6 +2303,10 @@ static void cf_check time_calibration(void *unused)
> > local_irq_enable();
> > }
> >
> > + if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > + boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
> > + return;
>
> This would render the (first of the two) invocation(s) of the function from
> verify_tsc_reliability() (largely) dead; it would then be only r.master_stime
> which gets updated (see also Alejandro's reply), which surely wouldn't have
> required that call in the first place.
I'll follow Alejandro's suggestions
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 17:40 ` Stefano Stabellini
@ 2025-07-08 17:53 ` Alejandro Vallejo
0 siblings, 0 replies; 23+ messages in thread
From: Alejandro Vallejo @ 2025-07-08 17:53 UTC (permalink / raw)
To: Stefano Stabellini
Cc: xen-devel, jbeulich, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
Jason.Andryuk
On Tue Jul 8, 2025 at 7:40 PM CEST, Stefano Stabellini wrote:
> On Tue, 8 Jul 2025, Alejandro Vallejo wrote:
>> On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
>> > On real time configuration with the null scheduler, we shouldn't
>> > interrupt the guest execution unless strictly necessary: the guest could
>> > be a real time guest (e.g. FreeRTOS) and interrupting its execution
>> > could lead to a missed deadline.
>> >
>> > The principal source of interruptions is IPIs. Remove the unnecessary
>> > IPI on all physical CPUs to sync the TSC when the TSC is known to be
>> > reliable.
>> >
>> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
>> > ---
>> > xen/arch/x86/time.c | 4 ++++
>> > 1 file changed, 4 insertions(+)
>> >
>> > diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> > index 59129f419d..bfd022174a 100644
>> > --- a/xen/arch/x86/time.c
>> > +++ b/xen/arch/x86/time.c
>> > @@ -2303,6 +2303,10 @@ static void cf_check time_calibration(void *unused)
>> > local_irq_enable();
>> > }
>> >
>> > + if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>> > + boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
>> > + return;
>> > +
>>
>> This should check "(tsc_flags & TSC_RELIABLE_SOCKET)" as well. The TSCs might
>> still be unsynchronized across sockets.
>>
>> I'm still quite confused as to how Xen (mis)handles time, but wouldn't this need
>> to go inside the branch above? If the clocksource is not the TSC as well the TSC
>> can still drift with respect to the actual clocksource (PIT, HPET or ACPI timer).
>
> I can move it inside the previous if
>
>
>> If so, we could probably do an early return in the branch above ignoring the
>> conditions (they are required for picking the TSC clocksource already, including
>> synchronization across sockets).
>>
>> Another matter is whether we could drop the "master_stime" write. Would we
>> care about it at all?
>
> I'll drop it.
>
> Is this what you had in mind?
>
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 59129f419d..d72e640f72 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
> };
>
> if ( clocksource_is_tsc() )
> - {
> - local_irq_disable();
> - r.master_stime = read_platform_stime(&r.master_tsc_stamp);
> - local_irq_enable();
> - }
> + return;
>
> cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
>
Yes, I think that would do.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-08 17:11 ` Stefano Stabellini
@ 2025-07-09 5:37 ` Jan Beulich
2025-07-10 0:44 ` Stefano Stabellini
2025-07-09 14:10 ` Roger Pau Monné
1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2025-07-09 5:37 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Roger Pau Monné, xen-devel, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On 08.07.2025 19:11, Stefano Stabellini wrote:
> On Tue, 8 Jul 2025, Jan Beulich wrote:
>> On 08.07.2025 12:11, Roger Pau Monné wrote:
>>> On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
>>>> Hi all,
>>>>
>>>> This short patch series improves Xen real-time execution on AMD x86
>>>> processors.
>>>>
>>>> The key to real-time performance is deterministic guest execution times
>>>> and deterministic guest interrupt latency. In such configurations, the
>>>> null scheduler is typically used, and there should be no IPIs or other
>>>> sources of vCPU execution interruptions beyond the guest timer interrupt
>>>> as configured by the guest, and any passthrough interrupts for
>>>> passthrough devices.
>>>>
>>>> This is because, upon receiving a critical interrupt, the guest (such as
>>>> FreeRTOS or Zephyr) typically has a very short window of time to
>>>> complete the required action. Being interrupted in the middle of this
>>>> critical section could prevent the guest from completing the action
>>>> within the allotted time, leading to malfunctions.
>>>
>>> There's IMO still one pending issue after this series on x86, maybe
>>> you have addressed this with some local patch.
>>
>> Not just one, I think. We use IPIs for other purposes as well. The way
>> I read the text above, all of them are a (potential) problem.
>
> Yes, all of them are potentially a problem. If you know of any other
> IPI, please let me know and I'll try to remove them.
INVALIDATE_TLB_VECTOR, EVENT_CHECK_VECTOR, and CALL_FUNCTION_VECTOR, maybe
also others in that group of vectors (see irq-vectors.h).
> One of my goals
> posting this series was to raise awareness on this issue and attempting
> to fix it with your help. It is not just IPIs, also Xen timers and other
> things that could cause the guest to trap into Xen without the guest
> knowledge. Typically IPIs are the worst offenders in my experience.
>
> On ARM, I have done several experiments where, after the system is
> configured correctly, I can see that if the RTOS does nothing, there are
> no traps in Xen on the RTOS vCPU/pCPU for seconds.
Being quiescent when the system is idle is only part of the overall
requirement, though?
Jan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
2025-07-08 17:40 ` Stefano Stabellini
@ 2025-07-09 7:04 ` Jan Beulich
0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2025-07-09 7:04 UTC (permalink / raw)
To: Stefano Stabellini
Cc: andrew.cooper3, roger.pau, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk, xen-devel
On 08.07.2025 19:40, Stefano Stabellini wrote:
> On Tue, 8 Jul 2025, Jan Beulich wrote:
>> On 08.07.2025 02:07, Stefano Stabellini wrote:
>>> On real time configuration with the null scheduler, we shouldn't
>>> interrupt the guest execution unless strictly necessary: the guest could
>>> be a real time guest (e.g. FreeRTOS) and interrupting its execution
>>> could lead to a missed deadline.
>>>
>>> The principal source of interruptions is IPIs. Remove the unnecessary
>>> IPI on all physical CPUs to sync the TSC when the TSC is known to be
>>> reliable.
>>
>> If it had been truly unnecessary for all the time, I'm sure someone would
>> have suggested to get rid of the overhead.
>
> I am not so sure someone else would have suggested it given that Xen on
> x86 has been mostly used on the datacenter where real-time is not a
> requirement.
What I mean to indicate is that we're generally always on the hunt for
unnecessary overhead that can be eliminated.
Jan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-08 17:11 ` Stefano Stabellini
2025-07-09 5:37 ` Jan Beulich
@ 2025-07-09 14:10 ` Roger Pau Monné
1 sibling, 0 replies; 23+ messages in thread
From: Roger Pau Monné @ 2025-07-09 14:10 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Jan Beulich, xen-devel, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On Tue, Jul 08, 2025 at 10:11:18AM -0700, Stefano Stabellini wrote:
> On Tue, 8 Jul 2025, Jan Beulich wrote:
> > On 08.07.2025 12:11, Roger Pau Monné wrote:
> > > On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
> > > Interrupt forwarding
> > > from Xen into HVM/PVH guests uses a softirq to do the injection, which
> > > means there's a non-deterministic window of latency between when the
> > > interrupt is received by Xen, as to when it's injected to the guest,
> > > because the softirq might not get processed right after being set as
> > > pending (there might be other softirqs to process, or simply Xen might
> > > be busy doing some other operation).
> > >
> > > I think you want to look into adding a new command line option or
> > > similar, that allows selecting whether guest IRQs are deferred to a
> > > softirq for injection, or are injected as part of the processing done
> > > in the IRQ handler itself.
> > >
> > > Otherwise there will always be a non-deterministic amount of latency
> > > on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
> > > some weird/unexpected variance when doing this passthrough interrupt
> > > latency measurements on x86?
>
> While this is not great and I agree with Roger that it should be
> improved (I'll try to do so), in a well configured system I expect that
> there should be no other softirqs on the RTOS vCPU/pCPU so it shouldn't
> matter much if it is raise as a softirq or not?
Possibly - if the physical CPU where the interrupt is injected is also
the one where the target vCPU is running it won't make much of a
difference whether injection to the guest is deferred to a softirq, as
softirqs must always be processed before returning to guest context.
So I would think that when using the interrupt-follows-vCPU Xen model,
where interrupts are moved around to follow the vCPUs they target,
this extra latency would only be seen when the interrupt is delivered
to a CPU different than the one where the target guest vCPU is
running, which is never in your scenario because you pin vCPUs.
Roger.
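The reasoning above can be sketched as a toy model (purely illustrative, not
Xen code): pending softirqs on a CPU are drained before that CPU returns to
guest context, so deferring the injection to a softirq only adds the cost of
whatever other softirqs happen to be pending there, while an interrupt that
lands on a different CPU than the target vCPU needs an extra cross-CPU kick.
All names below (the DEMO_* softirq numbers, struct delivery,
deliver_guest_irq()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 2

/* Hypothetical softirq numbers; lower numbers are serviced first. */
enum { DEMO_TIMER_SOFTIRQ, DEMO_SCHEDULE_SOFTIRQ, DEMO_INJECT_SOFTIRQ };

unsigned int pending[DEMO_NR_CPUS];   /* per-CPU bitmask of pending softirqs */

void raise_softirq(unsigned int cpu, unsigned int nr)
{
    pending[cpu] |= 1u << nr;
}

/* Drain every pending softirq on this CPU; returns how many were run. */
unsigned int do_softirq(unsigned int cpu)
{
    unsigned int run = 0;

    while ( pending[cpu] )
    {
        unsigned int nr = 0;

        /* Find the lowest pending softirq number. */
        while ( !(pending[cpu] & (1u << nr)) )
            nr++;

        pending[cpu] &= ~(1u << nr);
        run++;
    }

    return run;
}

struct delivery {
    unsigned int softirqs_run;   /* handlers run before the guest resumed */
    bool needs_remote_kick;      /* vCPU runs elsewhere: extra IPI needed */
};

/* A device interrupt arrives on irq_cpu for a vCPU running on vcpu_cpu. */
struct delivery deliver_guest_irq(unsigned int irq_cpu, unsigned int vcpu_cpu)
{
    struct delivery d = { 0, irq_cpu != vcpu_cpu };

    raise_softirq(irq_cpu, DEMO_INJECT_SOFTIRQ);
    /* The CPU may not re-enter the guest until its pending mask is empty. */
    d.softirqs_run = do_softirq(irq_cpu);

    return d;
}
```

In the pinned configuration described in this thread, irq_cpu and vcpu_cpu
are expected to be equal, which is the case where the model reports no extra
kick.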
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-09 5:37 ` Jan Beulich
@ 2025-07-10 0:44 ` Stefano Stabellini
2025-07-10 7:02 ` Roger Pau Monné
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-10 0:44 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Roger Pau Monné, xen-devel,
Andrew Cooper, Xenia.Ragiadakou, alejandro.garciavallejo,
Jason.Andryuk
On Wed, 9 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 19:11, Stefano Stabellini wrote:
> > On Tue, 8 Jul 2025, Jan Beulich wrote:
> >> On 08.07.2025 12:11, Roger Pau Monné wrote:
> >>> On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
> >>>> Hi all,
> >>>>
> >>>> This short patch series improves Xen real-time execution on AMD x86
> >>>> processors.
> >>>>
> >>>> The key to real-time performance is deterministic guest execution times
> >>>> and deterministic guest interrupt latency. In such configurations, the
> >>>> null scheduler is typically used, and there should be no IPIs or other
> >>>> sources of vCPU execution interruptions beyond the guest timer interrupt
> >>>> as configured by the guest, and any passthrough interrupts for
> >>>> passthrough devices.
> >>>>
> >>>> This is because, upon receiving a critical interrupt, the guest (such as
> >>>> FreeRTOS or Zephyr) typically has a very short window of time to
> >>>> complete the required action. Being interrupted in the middle of this
> >>>> critical section could prevent the guest from completing the action
> >>>> within the allotted time, leading to malfunctions.
> >>>
> >>> There's IMO still one pending issue after this series on x86, maybe
> >>> you have addressed this with some local patch.
> >>
> >> Not just one, I think. We use IPIs for other purposes as well. The way
> >> I read the text above, all of them are a (potential) problem.
> >
> > Yes, all of them are potentially a problem. If you know of any other
> > IPI, please let me know and I'll try to remove them.
>
> INVALIDATE_TLB_VECTOR, EVENT_CHECK_VECTOR, and CALL_FUNCTION_VECTOR, maybe
> also others in that group of vectors (see irq-vectors.h).
Thanks Jan, I'll look into those.
> > One of my goals
> > posting this series was to raise awareness on this issue and attempting
> > to fix it with your help. It is not just IPIs, also Xen timers and other
> > things that could cause the guest to trap into Xen without the guest
> > knowledge. Typically IPIs are the worst offenders in my experience.
> >
> > On ARM, I have done several experiments where, after the system is
> > configured correctly, I can see that if the RTOS does nothing, there are
> > no traps in Xen on the RTOS vCPU/pCPU for seconds.
>
> Being quiescent when the system is idle is only part of the overall
> requirement, though?
Actually being quiescent when the system is idle is not a requirement.
The only requirements are:
1) quick interrupt injection into the RTOS
2) the RTOS must be undisturbed while executing the critical region
1) mostly means that the physical interrupt should be delivered to the
same pCPU running the RTOS vCPU. Otherwise the extra IPI causes unwanted
delays.
2) means that the RTOS must be undisturbed when executing the critical
section, which typically starts right after receiving the interrupt and
lasts less than 1 ms. In practice, it means the RTOS should absolutely
not be descheduled and there should be no (unnecessary) traps into Xen
while the RTOS is executing the critical section. It is expected that
the RTOS will run the critical section with interrupts disabled.
That's pretty much it. If we get this right, we have solved 99% of the
problem.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-10 0:44 ` Stefano Stabellini
@ 2025-07-10 7:02 ` Roger Pau Monné
2025-07-10 8:02 ` Jan Beulich
0 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2025-07-10 7:02 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Jan Beulich, xen-devel, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On Wed, Jul 09, 2025 at 05:44:33PM -0700, Stefano Stabellini wrote:
> On Wed, 9 Jul 2025, Jan Beulich wrote:
> > On 08.07.2025 19:11, Stefano Stabellini wrote:
> > > On Tue, 8 Jul 2025, Jan Beulich wrote:
> > >> On 08.07.2025 12:11, Roger Pau Monné wrote:
> > >>> On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> This short patch series improves Xen real-time execution on AMD x86
> > >>>> processors.
> > >>>>
> > >>>> The key to real-time performance is deterministic guest execution times
> > >>>> and deterministic guest interrupt latency. In such configurations, the
> > >>>> null scheduler is typically used, and there should be no IPIs or other
> > >>>> sources of vCPU execution interruptions beyond the guest timer interrupt
> > >>>> as configured by the guest, and any passthrough interrupts for
> > >>>> passthrough devices.
> > >>>>
> > >>>> This is because, upon receiving a critical interrupt, the guest (such as
> > >>>> FreeRTOS or Zephyr) typically has a very short window of time to
> > >>>> complete the required action. Being interrupted in the middle of this
> > >>>> critical section could prevent the guest from completing the action
> > >>>> within the allotted time, leading to malfunctions.
> > >>>
> > >>> There's IMO still one pending issue after this series on x86, maybe
> > >>> you have addressed this with some local patch.
> > >>
> > >> Not just one, I think. We use IPIs for other purposes as well. The way
> > >> I read the text above, all of them are a (potential) problem.
> > >
> > > Yes, all of them are potentially a problem. If you know of any other
> > > IPI, please let me know and I'll try to remove them.
> >
> > INVALIDATE_TLB_VECTOR, EVENT_CHECK_VECTOR, and CALL_FUNCTION_VECTOR, maybe
> > also others in that group of vectors (see irq-vectors.h).
>
> Thanks Jan, I'll look into those.
>
>
> > > One of my goals
> > > posting this series was to raise awareness on this issue and attempting
> > > to fix it with your help. It is not just IPIs, also Xen timers and other
> > > things that could cause the guest to trap into Xen without the guest
> > > knowledge. Typically IPIs are the worst offenders in my experience.
> > >
> > > On ARM, I have done several experiments where, after the system is
> > > configured correctly, I can see that if the RTOS does nothing, there are
> > > no traps in Xen on the RTOS vCPU/pCPU for seconds.
> >
> > Being quiescent when the system is idle is only part of the overall
> > requirement, though?
>
> Actually being quiescent when the system is idle is not a requirement.
>
> The only requirements are:
> 1) quick interrupt injection into the RTOS
> 2) the RTOS must be undisturbed while executing the critical region
>
> 1) mostly means that the physical interrupt should be delivered to the
> same pCPU running the RTOS vCPU. Otherwise the extra IPI causes unwanted
> delays.
This should already be the case, in the Xen model interrupts follow
vCPUs, so if you use pinning the vCPU should always be running
on the pCPU that's the target of the physical interrupt.
> 2) means that the RTOS must be undisturbed when executing the critical
> section, which is typically right after receiving the interrupt and only
> last for less than 1ms. In practice, it means the RTOS should absolutely
> not be descheduled and there should be no (unnecessary) traps into Xen
> while the RTOS is executing the critical section. It is expected that
> the RTOS will run the critical section with interrupts disabled.
What about other external interrupts? While the guest runs the
critical interrupt handling section with interrupts disabled, an
external interrupt from a device targeting the pCPU could cause a
vmexit. I'm not aware of a nice way to solve this however, as for
PVH/HVM Xen doesn't know when the guest has finished interrupt
processing (iret). Maybe this is not an issue in practice if you
isolate interrupts to different vCPUs (you might have to do this
already to ensure deterministic latency).
Roger.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-10 7:02 ` Roger Pau Monné
@ 2025-07-10 8:02 ` Jan Beulich
2025-07-10 21:39 ` Stefano Stabellini
0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2025-07-10 8:02 UTC (permalink / raw)
To: Roger Pau Monné, Stefano Stabellini
Cc: xen-devel, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On 10.07.2025 09:02, Roger Pau Monné wrote:
> On Wed, Jul 09, 2025 at 05:44:33PM -0700, Stefano Stabellini wrote:
>> 2) means that the RTOS must be undisturbed when executing the critical
>> section, which is typically right after receiving the interrupt and only
>> last for less than 1ms. In practice, it means the RTOS should absolutely
>> not be descheduled and there should be no (unnecessary) traps into Xen
>> while the RTOS is executing the critical section. It is expected that
>> the RTOS will run the critical section with interrupts disabled.
>
> What about other external interrupts? While the guest runs the
> critical interrupt handling section with interrupts disabled, an
> external interrupt from a device targeting the pCPU could cause a
> vmexit.
For interrupts to be handled by the guest, we may need to finally gain AVIC
support (albeit I'm not sure how close that is to VMX's posted interrupts).
For interrupts handled in Xen the only way would be to allow the guest to
announce such critical sections to Xen. Which, besides being a security
concern, may of course itself represent unacceptable overhead.
Jan
> I'm not aware of a nice way to solve this however, as for
> PVH/HVM Xen doesn't know when the guest has finished interrupt
> processing (iret). Maybe this is not an issue in practice if you
> isolate interrupts to different vCPUs (you might have to do this
> already to ensure deterministic latency).
>
> Roger.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-10 8:02 ` Jan Beulich
@ 2025-07-10 21:39 ` Stefano Stabellini
2025-07-11 1:23 ` Demi Marie Obenour
0 siblings, 1 reply; 23+ messages in thread
From: Stefano Stabellini @ 2025-07-10 21:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Roger Pau Monné, Stefano Stabellini, xen-devel,
Andrew Cooper, Xenia.Ragiadakou, alejandro.garciavallejo,
Jason.Andryuk
On Thu, 10 Jul 2025, Jan Beulich wrote:
> On 10.07.2025 09:02, Roger Pau Monné wrote:
> > On Wed, Jul 09, 2025 at 05:44:33PM -0700, Stefano Stabellini wrote:
> >> 2) means that the RTOS must be undisturbed when executing the critical
> >> section, which is typically right after receiving the interrupt and only
> >> last for less than 1ms. In practice, it means the RTOS should absolutely
> >> not be descheduled and there should be no (unnecessary) traps into Xen
> >> while the RTOS is executing the critical section. It is expected that
> >> the RTOS will run the critical section with interrupts disabled.
> >
> > What about other external interrupts? While the guest runs the
> > critical interrupt handling section with interrupts disabled, an
> > external interrupt from a device targeting the pCPU could cause a
> > vmexit.
>
> For interrupts to be handled by the guest, we may need to finally gain AVIC
> support (albeit I'm not sure how close that is to VMX-es posted interrupts).
> For interrupts handled in Xen the only way would be to allow the guest to
> announce such critical sections to Xen. Which, besides being a security
> concern, may of course itself represent unacceptable overhead.
In the past, I wrote a patch for an ARM user basically to do what you
suggested: "announce such critical sections to Xen". It is easy for Xen
to know when the critical section starts: upon receiving the critical
interrupt. I added a hypercall so that the RTOS could tell Xen when it
ends. This is the kind of dirty patch that is very effective but
difficult to generalize. As an example, you can pause all other VMs
during the critical section to make sure the RTOS has full bandwidth on
the bus. The critical section is much shorter than a scheduler slot
anyway. I did not try to upstream the patch.
> > I'm not aware of a nice way to solve this however, as for
> > PVH/HVM Xen doesn't know when the guest has finished interrupt
> > processing (iret). Maybe this is not an issue in practice if you
> > isolate interrupts to different vCPUs (you might have to do this
> > already to ensure deterministic latency).
Yeah, that should be solvable by moving around other interrupts to other
pCPUs.
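The critical-section announcement described above can be sketched as a small
state machine. This is only an illustration of the idea and not the
unpublished patch: the section starts when the critical interrupt is
received, and ends when the guest issues a hypercall. What the hypervisor
does in between (here: queueing disturbances until the section ends) is an
assumption, as are all names (struct rt_vcpu, demo_vcpu,
critical_irq_received(), try_disturb(), hypercall_critical_section_end()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state for a real-time guest. */
struct rt_vcpu {
    bool in_critical_section;
    unsigned int deferred_events;   /* work postponed until the section ends */
};

struct rt_vcpu demo_vcpu;           /* zero-initialised: not in a section */

/* Hypervisor side: the critical interrupt itself marks the section start. */
void critical_irq_received(struct rt_vcpu *v)
{
    v->in_critical_section = true;
}

/*
 * Hypervisor side: returns true if an event may disturb the vCPU right now.
 * Inside the critical section the event is queued instead (assumed policy).
 */
bool try_disturb(struct rt_vcpu *v)
{
    if ( v->in_critical_section )
    {
        v->deferred_events++;
        return false;
    }

    return true;
}

/*
 * Guest side: hypothetical hypercall announcing the end of the section.
 * Returns the number of deferred events that are released.
 */
unsigned int hypercall_critical_section_end(struct rt_vcpu *v)
{
    unsigned int released = v->deferred_events;

    v->in_critical_section = false;
    v->deferred_events = 0;

    return released;
}
```

A real design would also have to address the security concern raised earlier
in the thread, for example by bounding how long a guest may keep the section
open; the sketch leaves that out.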
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/2] Xen real-time x86
2025-07-10 21:39 ` Stefano Stabellini
@ 2025-07-11 1:23 ` Demi Marie Obenour
0 siblings, 0 replies; 23+ messages in thread
From: Demi Marie Obenour @ 2025-07-11 1:23 UTC (permalink / raw)
To: Stefano Stabellini, Jan Beulich
Cc: Roger Pau Monné, xen-devel, Andrew Cooper, Xenia.Ragiadakou,
alejandro.garciavallejo, Jason.Andryuk
On 7/10/25 17:39, Stefano Stabellini wrote:
> On Thu, 10 Jul 2025, Jan Beulich wrote:
>> On 10.07.2025 09:02, Roger Pau Monné wrote:
>>> On Wed, Jul 09, 2025 at 05:44:33PM -0700, Stefano Stabellini wrote:
>>>> 2) means that the RTOS must be undisturbed when executing the critical
>>>> section, which is typically right after receiving the interrupt and only
>>>> last for less than 1ms. In practice, it means the RTOS should absolutely
>>>> not be descheduled and there should be no (unnecessary) traps into Xen
>>>> while the RTOS is executing the critical section. It is expected that
>>>> the RTOS will run the critical section with interrupts disabled.
>>>
>>> What about other external interrupts? While the guest runs the
>>> critical interrupt handling section with interrupts disabled, an
>>> external interrupt from a device targeting the pCPU could cause a
>>> vmexit.
>>
>> For interrupts to be handled by the guest, we may need to finally gain AVIC
>> support (albeit I'm not sure how close that is to VMX-es posted interrupts).
>> For interrupts handled in Xen the only way would be to allow the guest to
>> announce such critical sections to Xen. Which, besides being a security
>> concern, may of course itself represent unacceptable overhead.
>
> In the past, I wrote a patch for an ARM user basically to do what you
> suggested: "announce such critical sections to Xen". It is easy for Xen
> to know when the critical section start: upon receiving the critical
> interrupt. I added an hypercall so that the RTOS could tell Xen when it
> ends. This is the kind of dirty patch that is very effective but
> difficult to generalize. As an example, you can pause all other VMs
> during the critical section to make sure the RTOS has full bandwidth on
> the bus. The critical section is much shorter than a scheduler slot
> anyway. I did not try to upstream the patch.
Curious: why is the RTOS running on an x86 core at all, and not on a
microcontroller dedicated exclusively to real-time tasks? The
performance impact of isolating the RTOS from other tasks seems huge
compared to the cost of a tiny microcontroller that just runs the RTOS.
Have you considered upstreaming the patch?
--
Sincerely,
Demi Marie Obenour (she/her/hers)
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2025-07-11 1:24 UTC | newest]
Thread overview: 23+ messages
2025-07-08 0:06 [PATCH 0/2] Xen real-time x86 Stefano Stabellini
2025-07-08 0:07 ` [PATCH 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
2025-07-08 9:54 ` Alejandro Vallejo
2025-07-08 17:40 ` Stefano Stabellini
2025-07-08 17:53 ` Alejandro Vallejo
2025-07-08 13:24 ` Jan Beulich
2025-07-08 17:40 ` Stefano Stabellini
2025-07-09 7:04 ` Jan Beulich
2025-07-08 0:07 ` [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL Stefano Stabellini
2025-07-08 3:23 ` Demi Marie Obenour
2025-07-08 10:25 ` Alejandro Vallejo
2025-07-08 13:28 ` Jan Beulich
2025-07-08 17:13 ` Stefano Stabellini
2025-07-08 10:11 ` [PATCH 0/2] Xen real-time x86 Roger Pau Monné
2025-07-08 13:31 ` Jan Beulich
2025-07-08 17:11 ` Stefano Stabellini
2025-07-09 5:37 ` Jan Beulich
2025-07-10 0:44 ` Stefano Stabellini
2025-07-10 7:02 ` Roger Pau Monné
2025-07-10 8:02 ` Jan Beulich
2025-07-10 21:39 ` Stefano Stabellini
2025-07-11 1:23 ` Demi Marie Obenour
2025-07-09 14:10 ` Roger Pau Monné