All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Xen real-time x86
@ 2025-07-08 18:32 Stefano Stabellini
  2025-07-08 18:32 ` [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
  2025-07-08 18:32 ` [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL Stefano Stabellini
  0 siblings, 2 replies; 9+ messages in thread
From: Stefano Stabellini @ 2025-07-08 18:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
	stefano.stabellini, Xenia.Ragiadakou, alejandro.garciavallejo,
	Jason.Andryuk

Hi all,

This short patch series improves Xen real-time execution on AMD x86
processors.

The key to real-time performance is deterministic guest execution times
and deterministic guest interrupt latency. In such configurations, the
null scheduler is typically used, and there should be no IPIs or other
sources of vCPU execution interruptions beyond the guest timer interrupt
as configured by the guest, and any passthrough interrupts for
passthrough devices.

This is because, upon receiving a critical interrupt, the guest (such as
FreeRTOS or Zephyr) typically has a very short window of time to
complete the required action. Being interrupted in the middle of this
critical section could prevent the guest from completing the action
within the allotted time, leading to malfunctions.

To address this, the patch series disables IPIs that could potentially
affect the real-time domain.

Cheers,
Stefano


Stefano Stabellini (2):
      xen/x86: don't send IPI to sync TSC when it is reliable
      xen/x86: introduce MCE_NONFATAL

 xen/arch/x86/Kconfig             | 14 ++++++++++++++
 xen/arch/x86/cpu/mcheck/Makefile |  6 +++---
 xen/arch/x86/time.c              |  6 +-----
 3 files changed, 18 insertions(+), 8 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
  2025-07-08 18:32 [PATCH v2 0/2] Xen real-time x86 Stefano Stabellini
@ 2025-07-08 18:32 ` Stefano Stabellini
  2025-07-09  9:44   ` Alejandro Vallejo
  2025-07-10  6:53   ` Jan Beulich
  2025-07-08 18:32 ` [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL Stefano Stabellini
  1 sibling, 2 replies; 9+ messages in thread
From: Stefano Stabellini @ 2025-07-08 18:32 UTC (permalink / raw)
  To: xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, stefano.stabellini,
	Xenia.Ragiadakou, alejandro.garciavallejo, Jason.Andryuk

On real time configuration with the null scheduler, we shouldn't
interrupt the guest execution unless strictly necessary: the guest could
be a real time guest (e.g. FreeRTOS) and interrupting its execution
could lead to a missed deadline. The principal source of interruptions
is IPIs.

When TSC is the chosen clocksource, we know it is reliable and
synchronized across cpus and clusters. Thus, we can return early
time_calibration because the calibration is not needed, removing the
related Xen timer and IPIs.

Also remove the master_stime write as it is unnecessary.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
---
Changes in v2:
- simplify the patch simply by returning early if clocksource_is_tsc()
- also remove setting r.master_stime as it is not needed
---
 xen/arch/x86/time.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 59129f419d..d72e640f72 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
     };
 
     if ( clocksource_is_tsc() )
-    {
-        local_irq_disable();
-        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
-        local_irq_enable();
-    }
+        return;
 
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL
  2025-07-08 18:32 [PATCH v2 0/2] Xen real-time x86 Stefano Stabellini
  2025-07-08 18:32 ` [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
@ 2025-07-08 18:32 ` Stefano Stabellini
  2025-07-09  9:46   ` Alejandro Vallejo
  1 sibling, 1 reply; 9+ messages in thread
From: Stefano Stabellini @ 2025-07-08 18:32 UTC (permalink / raw)
  To: xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, stefano.stabellini,
	Xenia.Ragiadakou, alejandro.garciavallejo, Jason.Andryuk

Today, checking for non-fatal MCE errors on AMD is very invasive: it
involves a periodic timer interrupting the physical CPU execution at
regular intervals. Moreover, when the timer fires, the handler sends an
IPI to all physical CPUs.

Both these actions are disruptive in terms of latency and deterministic
execution times for real-time workloads. They might miss a deadline due
to one of these IPIs. Make it possible to disable non-fatal MCE errors
checking with a new Kconfig option (MCE_NONFATAL).

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
---
Changes in v2:
- generalize the appraoch and remove the code when MCE_NONFATAL is not
  set
- move the new kconfig option to xen/arch/x86/Kconfig
---
 xen/arch/x86/Kconfig             | 14 ++++++++++++++
 xen/arch/x86/cpu/mcheck/Makefile |  6 +++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 752d5141bb..9ec0fb0bed 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -248,6 +248,20 @@ config X2APIC_MIXED
 
 endchoice
 
+config MCE_NONFATAL
+	bool "Check for non-fatal MCEs" if EXPERT
+	default y
+	help
+	  Check for non-fatal MCE errors.
+	
+	  When this option is on (default), Xen regularly checks for
+	  non-fatal MCEs potentially occurring on all physical CPUs. The
+	  checking is done via timers and IPI interrupts, which is
+	  acceptable in most configurations, but not for real-time.
+	
+	  Turn this option off if you plan on deploying real-time workloads
+	  on Xen.
+
 config GUEST
 	bool
 
diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
index e6cb4dd503..c70b441888 100644
--- a/xen/arch/x86/cpu/mcheck/Makefile
+++ b/xen/arch/x86/cpu/mcheck/Makefile
@@ -1,12 +1,12 @@
-obj-$(CONFIG_AMD) += amd_nonfatal.o
+obj-$(filter $(CONFIG_AMD),$(CONFIG_MCE_NONFATAL)) += amd_nonfatal.o
 obj-$(CONFIG_AMD) += mce_amd.o
 obj-y += mcaction.o
 obj-y += barrier.o
-obj-$(CONFIG_INTEL) += intel-nonfatal.o
+obj-$(filter $(CONFIG_INTEL),$(CONFIG_MCE_NONFATAL)) += intel-nonfatal.o
 obj-y += mctelem.o
 obj-y += mce.o
 obj-y += mce-apei.o
 obj-$(CONFIG_INTEL) += mce_intel.o
-obj-y += non-fatal.o
+obj-$(CONFIG_MCE_NONFATAL) += non-fatal.o
 obj-y += util.o
 obj-y += vmce.o
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
  2025-07-08 18:32 ` [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
@ 2025-07-09  9:44   ` Alejandro Vallejo
  2025-07-10  6:53   ` Jan Beulich
  1 sibling, 0 replies; 9+ messages in thread
From: Alejandro Vallejo @ 2025-07-09  9:44 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
	Jason.Andryuk

On Tue Jul 8, 2025 at 8:32 PM CEST, Stefano Stabellini wrote:
> On real time configuration with the null scheduler, we shouldn't
> interrupt the guest execution unless strictly necessary: the guest could
> be a real time guest (e.g. FreeRTOS) and interrupting its execution
> could lead to a missed deadline. The principal source of interruptions
> is IPIs.
>
> When TSC is the chosen clocksource, we know it is reliable and
> synchronized across cpus and clusters. Thus, we can return early
> time_calibration because the calibration is not needed, removing the
> related Xen timer and IPIs.
>
> Also remove the master_stime write as it is unnecessary.
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> ---
> Changes in v2:
> - simplify the patch simply by returning early if clocksource_is_tsc()
> - also remove setting r.master_stime as it is not needed
> ---
>  xen/arch/x86/time.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 59129f419d..d72e640f72 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
>      };
>  
>      if ( clocksource_is_tsc() )
> -    {
> -        local_irq_disable();
> -        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
> -        local_irq_enable();
> -    }
> +        return;
>  
>      cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
>  

As far as I can tell, this shouldn't cause problems. But I'd prefer if someone
knowledgeable in the calibration code (Jan?) pitches in as to the effects of
master_stime in the absence of calibration. Otherwise:

  Reviewed-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL
  2025-07-08 18:32 ` [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL Stefano Stabellini
@ 2025-07-09  9:46   ` Alejandro Vallejo
  2025-07-09 15:00     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Alejandro Vallejo @ 2025-07-09  9:46 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
	Jason.Andryuk

On Tue Jul 8, 2025 at 8:32 PM CEST, Stefano Stabellini wrote:
> Today, checking for non-fatal MCE errors on AMD is very invasive: it
> involves a periodic timer interrupting the physical CPU execution at
> regular intervals. Moreover, when the timer fires, the handler sends an
> IPI to all physical CPUs.
>
> Both these actions are disruptive in terms of latency and deterministic
> execution times for real-time workloads. They might miss a deadline due
> to one of these IPIs. Make it possible to disable non-fatal MCE errors
> checking with a new Kconfig option (MCE_NONFATAL).
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>

LGTM.

    Reviewed-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL
  2025-07-09  9:46   ` Alejandro Vallejo
@ 2025-07-09 15:00     ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2025-07-09 15:00 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: andrew.cooper3, roger.pau, Xenia.Ragiadakou, Jason.Andryuk,
	Alejandro Vallejo, xen-devel

On 09.07.2025 11:46, Alejandro Vallejo wrote:
> On Tue Jul 8, 2025 at 8:32 PM CEST, Stefano Stabellini wrote:
>> Today, checking for non-fatal MCE errors on AMD is very invasive: it
>> involves a periodic timer interrupting the physical CPU execution at
>> regular intervals. Moreover, when the timer fires, the handler sends an
>> IPI to all physical CPUs.
>>
>> Both these actions are disruptive in terms of latency and deterministic
>> execution times for real-time workloads. They might miss a deadline due
>> to one of these IPIs. Make it possible to disable non-fatal MCE errors
>> checking with a new Kconfig option (MCE_NONFATAL).
>>
>> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> 
> LGTM.
> 
>     Reviewed-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
  2025-07-08 18:32 ` [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
  2025-07-09  9:44   ` Alejandro Vallejo
@ 2025-07-10  6:53   ` Jan Beulich
  2025-07-11  1:34     ` Stefano Stabellini
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2025-07-10  6:53 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: andrew.cooper3, roger.pau, Xenia.Ragiadakou,
	alejandro.garciavallejo, Jason.Andryuk, xen-devel

On 08.07.2025 20:32, Stefano Stabellini wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
>      };
>  
>      if ( clocksource_is_tsc() )
> -    {
> -        local_irq_disable();
> -        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
> -        local_irq_enable();
> -    }
> +        return;

Assuming the rendezvous can indeed be entirely skipped, I agree that there's
no point calling read_platform_stime() here. Yet to yield a consistent
result, more changes are then necessary imo:
- as indicated before, the invocation of this function from
  verify_tsc_reliability() when plt_tsc was chosen is then entirely
  pointless,
- time_calibration_nop_rendezvous() would then apparently want purging, not
  the least to make clear that TIME_CALIBRATE_SOFTIRQ is never raised in
  this mode (one of your goals after all, aiui),
- the function being a timer handler, it would be preferable if the timer
  wasn't ever activated in this mode (at which point rather than returning
  early, the code above could simply be purged, maybe replaced by e.g. an
  assertion),
- the above in particular requires dealing with cpu_frequency_change() (the
  other of the two places where the timer is actually activated).
Some care may be needed in all of this taking into consideration that the
platform timer change to TSC happens late. Albeit commit f954a1bf5f74
("x86/time: change initiation of the calibration timer") has imo eliminated
the main concern here.

As to skipping the rendezvous: Besides invoking the calibration softirq,
time_calibration_nop_rendezvous() also updates the per-CPU cpu_calibration
fields. There would thus need to be a pretty formal proof that calculations
involving ->local_stime or ->local_tsc can't possibly degrade or even
degenerate when they remain at their boot-time values. (As to
->master_stime, afaict the field simply isn't used at all in that mode,
which is a fair part of the reason why the code change above is okay _if_
the rendezvous itself can be eliminated. The justification for that could
also do with extending some, considering that much of the involved code is
pretty subtle.) Alternatively, if such a proof turned out impossible,
another way of updating the fields every once in a while would need adding.

Finally, what you do here isn't entirely reliable as to your apparent end
goal: "clocksource=tsc" is respected only when tsc_check_reliability()
completes with an acceptable outcome. There's certainly some variability in
this across multiple runs, i.e. if things went extremely bad, once in blue
moon you may end up with the TSC being rejected for use as platform timer.

Jan


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
  2025-07-10  6:53   ` Jan Beulich
@ 2025-07-11  1:34     ` Stefano Stabellini
  2025-07-11  6:10       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Stabellini @ 2025-07-11  1:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
	alejandro.garciavallejo, Jason.Andryuk, xen-devel, sstabellini

On Thu, 10 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 20:32, Stefano Stabellini wrote:
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -2297,11 +2297,7 @@ static void cf_check time_calibration(void *unused)
> >      };
> >  
> >      if ( clocksource_is_tsc() )
> > -    {
> > -        local_irq_disable();
> > -        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
> > -        local_irq_enable();
> > -    }
> > +        return;
> 
> Assuming the rendezvous can indeed be entirely skipped, I agree that there's
> no point calling read_platform_stime() here. Yet to yield a consistent
> result, more changes are then necessary imo:
> - as indicated before, the invocation of this function from
>   verify_tsc_reliability() when plt_tsc was chosen is then entirely
>   pointless,
> - time_calibration_nop_rendezvous() would then apparently want purging, not
>   the least to make clear that TIME_CALIBRATE_SOFTIRQ is never raised in
>   this mode (one of your goals after all, aiui),

Good suggestions.


> - the function being a timer handler, it would be preferable if the timer
>   wasn't ever activated in this mode (at which point rather than returning
>   early, the code above could simply be purged, maybe replaced by e.g. an
>   assertion),

I see your point about the timer not being activated in the first place.

But if we want to make the code more reliable we should keep the if
(clocksource_is_tsc()) return; in time_calibration. That way, in case of
mistakes elsewhere, still the desired behavior is obtained.

I'll add the changes to cpu_frequency_change and local_time_calibration.
I'll append an incremental patch to clarify my intent.


> - the above in particular requires dealing with cpu_frequency_change() (the
>   other of the two places where the timer is actually activated).
>
> Some care may be needed in all of this taking into consideration that the
> platform timer change to TSC happens late. Albeit commit f954a1bf5f74
> ("x86/time: change initiation of the calibration timer") has imo eliminated
> the main concern here.
> 
> As to skipping the rendezvous: Besides invoking the calibration softirq,
> time_calibration_nop_rendezvous() also updates the per-CPU cpu_calibration
> fields. There would thus need to be a pretty formal proof that calculations
> involving ->local_stime or ->local_tsc can't possibly degrade or even
> degenerate when they remain at their boot-time values. (As to
> ->master_stime, afaict the field simply isn't used at all in that mode,
> which is a fair part of the reason why the code change above is okay _if_
> the rendezvous itself can be eliminated. The justification for that could
> also do with extending some, considering that much of the involved code is
> pretty subtle.) Alternatively, if such a proof turned out impossible,
> another way of updating the fields every once in a while would need adding.

Do you mean a formal proof that the TSC is actually stable from a
hardware perspective? The software algorithm is the same no matter the
number of updates.


> Finally, what you do here isn't entirely reliable as to your apparent end
> goal: "clocksource=tsc" is respected only when tsc_check_reliability()
> completes with an acceptable outcome. There's certainly some variability in
> this across multiple runs, i.e. if things went extremely bad, once in blue
> moon you may end up with the TSC being rejected for use as platform timer.
 
That is interesting! One option is to change the code so that
clocksource=tsc is always respected. I have appended the change on top
of this patch. Please let me know if you have other suggestions.


diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index d72e640f72..d29266086d 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1877,7 +1877,7 @@ int cpu_frequency_change(u64 freq)
     update_vcpu_system_time(current);
 
     /* A full epoch should pass before we check for deviation. */
-    if ( smp_processor_id() == 0 )
+    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
     {
         set_timer(&calibration_timer, NOW() + EPOCH);
         platform_time_calibration();
@@ -2024,7 +2024,7 @@ static void cf_check local_time_calibration(void)
     update_vcpu_system_time(current);
 
  out:
-    if ( smp_processor_id() == 0 )
+    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
     {
         set_timer(&calibration_timer, NOW() + EPOCH);
         platform_time_calibration();
@@ -2271,22 +2271,6 @@ static void cf_check time_calibration_std_rendezvous(void *_r)
     time_calibration_rendezvous_tail(r, 0, rdtsc_ordered());
 }
 
-/*
- * Rendezvous function used when clocksource is TSC and
- * no CPU hotplug will be performed.
- */
-static void cf_check time_calibration_nop_rendezvous(void *rv)
-{
-    const struct calibration_rendezvous *r = rv;
-    struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
-
-    c->local_tsc    = r->master_tsc_stamp;
-    c->local_stime  = r->master_stime;
-    c->master_stime = r->master_stime;
-
-    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
-}
-
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -2488,7 +2472,7 @@ static int __init cf_check verify_tsc_reliability(void)
          * CPUs are booted.
          */
         tsc_check_reliability();
-        if ( tsc_max_warp )
+        if ( tsc_max_warp && strcmp(opt_clocksource, "tsc") )
         {
             printk("TSC warp detected, disabling TSC_RELIABLE\n");
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
@@ -2506,21 +2490,12 @@ static int __init cf_check verify_tsc_reliability(void)
              */
             on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
 
-            /*
-             * We won't do CPU Hotplug and TSC clocksource is being used which
-             * means we have a reliable TSC, plus we don't sync with any other
-             * clocksource so no need for rendezvous.
-             */
-            time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
-
             /* Finish platform timer switch. */
             try_platform_timer_tail();
 
             printk("Switched to Platform timer %s TSC\n",
                    freq_string(plt_src.frequency));
 
-            time_calibration(NULL);
-
             return 0;
         }
     }


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable
  2025-07-11  1:34     ` Stefano Stabellini
@ 2025-07-11  6:10       ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2025-07-11  6:10 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Stefano Stabellini, andrew.cooper3, roger.pau, Xenia.Ragiadakou,
	alejandro.garciavallejo, Jason.Andryuk, xen-devel

On 11.07.2025 03:34, Stefano Stabellini wrote:
> On Thu, 10 Jul 2025, Jan Beulich wrote:
>> - the function being a timer handler, it would be preferable if the timer
>>   wasn't ever activated in this mode (at which point rather than returning
>>   early, the code above could simply be purged, maybe replaced by e.g. an
>>   assertion),
> 
> I see your point about the timer not being activated in the first place.
> 
> But if we want to make the code more reliable we should keep the if
> (clocksource_is_tsc()) return; in time_calibration. That way, in case of
> mistakes elsewhere, still the desired behavior is obtained.
> 
> I'll add the changes to cpu_frequency_change and local_time_calibration.
> I'll append an incremental patch to clarify my intent.
> 
> 
>> - the above in particular requires dealing with cpu_frequency_change() (the
>>   other of the two places where the timer is actually activated).
>>
>> Some care may be needed in all of this taking into consideration that the
>> platform timer change to TSC happens late. Albeit commit f954a1bf5f74
>> ("x86/time: change initiation of the calibration timer") has imo eliminated
>> the main concern here.
>>
>> As to skipping the rendezvous: Besides invoking the calibration softirq,
>> time_calibration_nop_rendezvous() also updates the per-CPU cpu_calibration
>> fields. There would thus need to be a pretty formal proof that calculations
>> involving ->local_stime or ->local_tsc can't possibly degrade or even
>> degenerate when they remain at their boot-time values. (As to
>> ->master_stime, afaict the field simply isn't used at all in that mode,
>> which is a fair part of the reason why the code change above is okay _if_
>> the rendezvous itself can be eliminated. The justification for that could
>> also do with extending some, considering that much of the involved code is
>> pretty subtle.) Alternatively, if such a proof turned out impossible,
>> another way of updating the fields every once in a while would need adding.
> 
> Do you mean a formal proof that the TSC is actually stable from a
> hardware perspective? The software algorithm is the same no matter the
> number of updates.

No, I really mean what I said - as the deltas are going to get larger that
are used as inputs to the calculations, it is (at least to me) not entirely
obvious that the calculations using those deltas can't degrade.

>> Finally, what you do here isn't entirely reliable as to your apparent end
>> goal: "clocksource=tsc" is respected only when tsc_check_reliability()
>> completes with an acceptable outcome. There's certainly some variability in
>> this across multiple runs, i.e. if things went extremely bad, once in blue
>> moon you may end up with the TSC being rejected for use as platform timer.
>  
> That is interesting! One option is to change the code so that
> clocksource=tsc is always respected. I have appended the change on top
> of this patch. Please let me know if you have other suggestions.
> 
> 
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index d72e640f72..d29266086d 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1877,7 +1877,7 @@ int cpu_frequency_change(u64 freq)
>      update_vcpu_system_time(current);
>  
>      /* A full epoch should pass before we check for deviation. */
> -    if ( smp_processor_id() == 0 )
> +    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
>      {
>          set_timer(&calibration_timer, NOW() + EPOCH);
>          platform_time_calibration();
> @@ -2024,7 +2024,7 @@ static void cf_check local_time_calibration(void)
>      update_vcpu_system_time(current);
>  
>   out:
> -    if ( smp_processor_id() == 0 )
> +    if ( smp_processor_id() == 0 && !clocksource_is_tsc() )
>      {
>          set_timer(&calibration_timer, NOW() + EPOCH);
>          platform_time_calibration();

Is this necessary? In this mode we won't make it into this function anymore,
will we? Hence if anything an early-out would be applicable.

> @@ -2271,22 +2271,6 @@ static void cf_check time_calibration_std_rendezvous(void *_r)
>      time_calibration_rendezvous_tail(r, 0, rdtsc_ordered());
>  }
>  
> -/*
> - * Rendezvous function used when clocksource is TSC and
> - * no CPU hotplug will be performed.
> - */
> -static void cf_check time_calibration_nop_rendezvous(void *rv)
> -{
> -    const struct calibration_rendezvous *r = rv;
> -    struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
> -
> -    c->local_tsc    = r->master_tsc_stamp;
> -    c->local_stime  = r->master_stime;
> -    c->master_stime = r->master_stime;
> -
> -    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
> -}
> -
>  static void (*time_calibration_rendezvous_fn)(void *) =
>      time_calibration_std_rendezvous;
>  
> @@ -2488,7 +2472,7 @@ static int __init cf_check verify_tsc_reliability(void)
>           * CPUs are booted.
>           */
>          tsc_check_reliability();
> -        if ( tsc_max_warp )
> +        if ( tsc_max_warp && strcmp(opt_clocksource, "tsc") )
>          {
>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> @@ -2506,21 +2490,12 @@ static int __init cf_check verify_tsc_reliability(void)
>               */
>              on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>  
> -            /*
> -             * We won't do CPU Hotplug and TSC clocksource is being used which
> -             * means we have a reliable TSC, plus we don't sync with any other
> -             * clocksource so no need for rendezvous.
> -             */
> -            time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;

Much like you prefer to leave a safeguard in time_calibration(), I think you
want to either leave a safeguard in the rendezvous handler now "used" instead,
or you want to poison this pointer. Any of such safeguards then imo want to
include ASSERT_UNREACHABLE().

Plus of course I hope it goes without saying that much also depends on the
(to be extended) patch description.

Jan


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-11  6:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-08 18:32 [PATCH v2 0/2] Xen real-time x86 Stefano Stabellini
2025-07-08 18:32 ` [PATCH v2 1/2] xen/x86: don't send IPI to sync TSC when it is reliable Stefano Stabellini
2025-07-09  9:44   ` Alejandro Vallejo
2025-07-10  6:53   ` Jan Beulich
2025-07-11  1:34     ` Stefano Stabellini
2025-07-11  6:10       ` Jan Beulich
2025-07-08 18:32 ` [PATCH v2 2/2] xen/x86: introduce MCE_NONFATAL Stefano Stabellini
2025-07-09  9:46   ` Alejandro Vallejo
2025-07-09 15:00     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.