* [PATCH 0/2] time: fix time accounting for x86 HVM guests
@ 2026-04-14 10:33 Roger Pau Monne
2026-04-14 10:33 ` [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled Roger Pau Monne
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Roger Pau Monne @ 2026-04-14 10:33 UTC (permalink / raw)
To: xen-devel
Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Teddy Astie,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Alistair Francis, Connor Davis,
Oleksii Kurochko
Hello,
When not emulating the TSC the guest time value calculated by using the
vCPU time info page in HVM mode would drift between time synchronization
intervals. First patch fixes the drift, second patch makes the
calculation of cpu_khz round up the value for better accuracy.
Thanks, Roger.
Roger Pau Monne (2):
x86/time: use native TSC scaling factors when TSC is not scaled
xen/cpu: round up cpu_khz calculations
xen/arch/arm/time.c | 4 ++--
xen/arch/riscv/time.c | 2 +-
xen/arch/x86/time.c | 20 ++++++++++++++------
3 files changed, 17 insertions(+), 9 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled
2026-04-14 10:33 [PATCH 0/2] time: fix time accounting for x86 HVM guests Roger Pau Monne
@ 2026-04-14 10:33 ` Roger Pau Monne
2026-04-16 11:28 ` Jan Beulich
2026-04-14 10:33 ` [PATCH 2/2] xen/cpu: round up cpu_khz calculations Roger Pau Monne
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Roger Pau Monne @ 2026-04-14 10:33 UTC (permalink / raw)
To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Teddy Astie
When running HVM guest in native TSC mode avoid using the recalculated vTSC
scaling factors based on the cpu_khz value. Using the kHz based frequency
leads to the TSC scaling values possibly not being the same as the ones
used by the per CPU cpu_time->tsc_scale field, which introduces skew
between the guest and Xen's calculations of the system time.
On a 2gHz system, where the frequency is possibly detected as 1999999999Hz
(note this is a worse-case scenario), the cpu_khz variable will be set to
1999999kHz, and hence 999Hz cycles will be not accounted for per second.
Over a second (the time synchronization period), this leads to a skew of:
cycles * 1 / (Hz freq) = 999 / 1999999999 = 499,5ns
So far this has gone unnoticed because the time synchronization rendezvous
forces the update of the tsc_timestamp and system_time fields in the vCPU
time info area, and hence the skew only accumulates up to the rendezvous
period. Attempting to remove the rendezvous causes the skew to grow
unbounded.
Fix by using the native TSC scaling values (as used by Xen) when the guest
TSC is not scaled.
Fixes: eab8a90be723 ("x86/time: scale host TSC in pvclock properly")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I'm worried about the usage of cpu_khz beyond simple printing it for
informational purposes. Overall I think it would be safer to store the
frequency in Hz, as to avoid losing the least significant digits.
In any case, that's a different change.
---
xen/arch/x86/time.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 4233ea507d40..244277c0a921 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1710,17 +1710,25 @@ static void collect_time_info(const struct vcpu *v,
else
{
if ( is_hvm_domain(d) && hvm_tsc_scaling_supported )
- {
tsc_stamp = hvm_scale_tsc(d, t->stamp.local_tsc);
- u->tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
- u->tsc_shift = d->arch.vtsc_to_ns.shift;
- }
else
- {
tsc_stamp = t->stamp.local_tsc;
+
+ /*
+ * HVM guests using the native TSC ratio should use the same per-CPU
+ * scaling factors as Xen. This ensures time keeping is always in sync
+ * between Xen and the guest.
+ */
+ if ( tsc_stamp == t->stamp.local_tsc )
+ {
u->tsc_to_system_mul = t->tsc_scale.mul_frac;
u->tsc_shift = t->tsc_scale.shift;
}
+ else
+ {
+ u->tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
+ u->tsc_shift = d->arch.vtsc_to_ns.shift;
+ }
}
u->tsc_timestamp = tsc_stamp;
--
2.53.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] xen/cpu: round up cpu_khz calculations
2026-04-14 10:33 [PATCH 0/2] time: fix time accounting for x86 HVM guests Roger Pau Monne
2026-04-14 10:33 ` [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled Roger Pau Monne
@ 2026-04-14 10:33 ` Roger Pau Monne
2026-04-14 11:36 ` Tu Dinh
2026-04-16 7:55 ` Jan Beulich
2026-04-14 20:08 ` [PATCH 0/2] time: fix time accounting for x86 HVM guests Stefano Stabellini
2026-04-15 8:44 ` Marek Marczykowski-Górecki
3 siblings, 2 replies; 13+ messages in thread
From: Roger Pau Monne @ 2026-04-14 10:33 UTC (permalink / raw)
To: xen-devel
Cc: Roger Pau Monne, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Alistair Francis, Connor Davis, Oleksii Kurochko, Jan Beulich,
Andrew Cooper, Teddy Astie
All arches truncate the cpu_khz without taking into account the less
significant digits. Instead use DIV_ROUND_UP() when scaling from Hz to kHz
to get as more accurate kHz value.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
While the possibly more accurate value is nice, I'm not sure it's actually
fixing any functional bug, and hence the lack of "Fixes:" tag.
---
xen/arch/arm/time.c | 4 ++--
xen/arch/riscv/time.c | 2 +-
xen/arch/x86/time.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index a12912a106a0..9e0c485c77db 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -118,7 +118,7 @@ static void __init preinit_dt_xen_time(void)
res = dt_property_read_u32(timer, "clock-frequency", &rate);
if ( res )
{
- cpu_khz = rate / 1000;
+ cpu_khz = DIV_ROUND_UP(rate, 1000);
validate_timer_frequency();
timer_dt_clock_frequency = rate;
}
@@ -136,7 +136,7 @@ void __init preinit_xen_time(void)
if ( !cpu_khz )
{
- cpu_khz = (READ_SYSREG(CNTFRQ_EL0) & CNTFRQ_MASK) / 1000;
+ cpu_khz = DIV_ROUND_UP(READ_SYSREG(CNTFRQ_EL0) & CNTFRQ_MASK, 1000);
validate_timer_frequency();
}
diff --git a/xen/arch/riscv/time.c b/xen/arch/riscv/time.c
index 7efa76fdbcb1..faca7b70e13a 100644
--- a/xen/arch/riscv/time.c
+++ b/xen/arch/riscv/time.c
@@ -40,7 +40,7 @@ static void __init preinit_dt_xen_time(void)
if ( !dt_property_read_u32(timer, "timebase-frequency", &rate) )
panic("Unable to find clock frequency\n");
- cpu_khz = rate / 1000;
+ cpu_khz = DIV_ROUND_UP(rate, 1000);
}
int reprogram_timer(s_time_t timeout)
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 244277c0a921..b84414f00d05 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2642,7 +2642,7 @@ void __init early_time_init(void)
set_time_scale(&t->tsc_scale, tmp);
t->stamp.local_tsc = boot_tsc_stamp;
- cpu_khz = tmp / 1000;
+ cpu_khz = DIV_ROUND_UP(tmp, 1000);
printk("Detected %lu.%03lu MHz processor.\n",
cpu_khz / 1000, cpu_khz % 1000);
--
2.53.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xen/cpu: round up cpu_khz calculations
2026-04-14 10:33 ` [PATCH 2/2] xen/cpu: round up cpu_khz calculations Roger Pau Monne
@ 2026-04-14 11:36 ` Tu Dinh
2026-04-14 11:56 ` Roger Pau Monné
2026-04-16 7:55 ` Jan Beulich
1 sibling, 1 reply; 13+ messages in thread
From: Tu Dinh @ 2026-04-14 11:36 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Alistair Francis, Connor Davis,
Oleksii Kurochko, Jan Beulich, Andrew Cooper, Teddy Astie
On 14/04/2026 12:36, Roger Pau Monne wrote:
> All arches truncate the cpu_khz without taking into account the less
> significant digits. Instead use DIV_ROUND_UP() when scaling from Hz to kHz
> to get as more accurate kHz value.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Couldn't DIV_ROUND be used here instead for a round-to-closest?
> ---
> While the possibly more accurate value is nice, I'm not sure it's actually
> fixing any functional bug, and hence the lack of "Fixes:" tag.
> ---
> xen/arch/arm/time.c | 4 ++--
> xen/arch/riscv/time.c | 2 +-
> xen/arch/x86/time.c | 2 +-
> 3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index a12912a106a0..9e0c485c77db 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -118,7 +118,7 @@ static void __init preinit_dt_xen_time(void)
> res = dt_property_read_u32(timer, "clock-frequency", &rate);
> if ( res )
> {
> - cpu_khz = rate / 1000;
> + cpu_khz = DIV_ROUND_UP(rate, 1000);
> validate_timer_frequency();
> timer_dt_clock_frequency = rate;
> }
> @@ -136,7 +136,7 @@ void __init preinit_xen_time(void)
>
> if ( !cpu_khz )
> {
> - cpu_khz = (READ_SYSREG(CNTFRQ_EL0) & CNTFRQ_MASK) / 1000;
> + cpu_khz = DIV_ROUND_UP(READ_SYSREG(CNTFRQ_EL0) & CNTFRQ_MASK, 1000);
> validate_timer_frequency();
> }
>
> diff --git a/xen/arch/riscv/time.c b/xen/arch/riscv/time.c
> index 7efa76fdbcb1..faca7b70e13a 100644
> --- a/xen/arch/riscv/time.c
> +++ b/xen/arch/riscv/time.c
> @@ -40,7 +40,7 @@ static void __init preinit_dt_xen_time(void)
> if ( !dt_property_read_u32(timer, "timebase-frequency", &rate) )
> panic("Unable to find clock frequency\n");
>
> - cpu_khz = rate / 1000;
> + cpu_khz = DIV_ROUND_UP(rate, 1000);
> }
>
> int reprogram_timer(s_time_t timeout)
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 244277c0a921..b84414f00d05 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2642,7 +2642,7 @@ void __init early_time_init(void)
> set_time_scale(&t->tsc_scale, tmp);
> t->stamp.local_tsc = boot_tsc_stamp;
>
> - cpu_khz = tmp / 1000;
> + cpu_khz = DIV_ROUND_UP(tmp, 1000);
> printk("Detected %lu.%03lu MHz processor.\n",
> cpu_khz / 1000, cpu_khz % 1000);
>
--
Ngoc Tu Dinh | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xen/cpu: round up cpu_khz calculations
2026-04-14 11:36 ` Tu Dinh
@ 2026-04-14 11:56 ` Roger Pau Monné
0 siblings, 0 replies; 13+ messages in thread
From: Roger Pau Monné @ 2026-04-14 11:56 UTC (permalink / raw)
To: Tu Dinh
Cc: xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis,
Michal Orzel, Volodymyr Babchuk, Alistair Francis, Connor Davis,
Oleksii Kurochko, Jan Beulich, Andrew Cooper, Teddy Astie
On Tue, Apr 14, 2026 at 11:36:22AM +0000, Tu Dinh wrote:
> On 14/04/2026 12:36, Roger Pau Monne wrote:
> > All arches truncate the cpu_khz without taking into account the less
> > significant digits. Instead use DIV_ROUND_UP() when scaling from Hz to kHz
> > to get as more accurate kHz value.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Couldn't DIV_ROUND be used here instead for a round-to-closest?
My bad, I got confused with the macro names.
Roger.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] time: fix time accounting for x86 HVM guests
2026-04-14 10:33 [PATCH 0/2] time: fix time accounting for x86 HVM guests Roger Pau Monne
2026-04-14 10:33 ` [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled Roger Pau Monne
2026-04-14 10:33 ` [PATCH 2/2] xen/cpu: round up cpu_khz calculations Roger Pau Monne
@ 2026-04-14 20:08 ` Stefano Stabellini
2026-04-15 8:44 ` Marek Marczykowski-Górecki
3 siblings, 0 replies; 13+ messages in thread
From: Stefano Stabellini @ 2026-04-14 20:08 UTC (permalink / raw)
To: Roger Pau Monne
Cc: xen-devel, Jan Beulich, Andrew Cooper, Teddy Astie,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Alistair Francis, Connor Davis,
Oleksii Kurochko
On Tue, 14 Apr 2026, Roger Pau Monne wrote:
> Hello,
>
> When not emulating the TSC the guest time value calculated by using the
> vCPU time info page in HVM mode would drift between time synchronization
> intervals. First patch fixes the drift, second patch makes the
> calculation of cpu_khz round up the value for better accuracy.
>
> Thanks, Roger.
>
> Roger Pau Monne (2):
> x86/time: use native TSC scaling factors when TSC is not scaled
> xen/cpu: round up cpu_khz calculations
Thanks Roger, this fixed an outstanding bug I was seeing!
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] time: fix time accounting for x86 HVM guests
2026-04-14 10:33 [PATCH 0/2] time: fix time accounting for x86 HVM guests Roger Pau Monne
` (2 preceding siblings ...)
2026-04-14 20:08 ` [PATCH 0/2] time: fix time accounting for x86 HVM guests Stefano Stabellini
@ 2026-04-15 8:44 ` Marek Marczykowski-Górecki
3 siblings, 0 replies; 13+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-04-15 8:44 UTC (permalink / raw)
To: Roger Pau Monne
Cc: xen-devel, Jan Beulich, Andrew Cooper, Teddy Astie,
Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Alistair Francis, Connor Davis,
Oleksii Kurochko
[-- Attachment #1: Type: text/plain, Size: 666 bytes --]
On Tue, Apr 14, 2026 at 12:33:25PM +0200, Roger Pau Monne wrote:
> Hello,
>
> When not emulating the TSC the guest time value calculated by using the
> vCPU time info page in HVM mode would drift between time synchronization
> intervals. First patch fixes the drift, second patch makes the
> calculation of cpu_khz round up the value for better accuracy.
I confirm with with those patches, and without the other fix ("x86/time:
do not kill calibration timer on suspend") the post-S3 issue is also
gone!
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xen/cpu: round up cpu_khz calculations
2026-04-14 10:33 ` [PATCH 2/2] xen/cpu: round up cpu_khz calculations Roger Pau Monne
2026-04-14 11:36 ` Tu Dinh
@ 2026-04-16 7:55 ` Jan Beulich
1 sibling, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2026-04-16 7:55 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
Volodymyr Babchuk, Alistair Francis, Connor Davis,
Oleksii Kurochko, Andrew Cooper, Teddy Astie, xen-devel
On 14.04.2026 12:33, Roger Pau Monne wrote:
> All arches truncate the cpu_khz without taking into account the less
> significant digits. Instead use DIV_ROUND_UP() when scaling from Hz to kHz
> to get as more accurate kHz value.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> While the possibly more accurate value is nice, I'm not sure it's actually
> fixing any functional bug, and hence the lack of "Fixes:" tag.
With, as suggested by Tu, DIV_ROUND() used instead (which then statistically
indeed is an improvement in accuracy)
Reviewed-by: Jan Beulich <jbeulich@suse.com>
As it's only statistically, nevertheless there is a risk of (perceived)
regressions, just to mention it.
Jan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled
2026-04-14 10:33 ` [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled Roger Pau Monne
@ 2026-04-16 11:28 ` Jan Beulich
2026-04-16 12:51 ` Roger Pau Monné
2026-04-16 13:13 ` Roger Pau Monné
0 siblings, 2 replies; 13+ messages in thread
From: Jan Beulich @ 2026-04-16 11:28 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: Andrew Cooper, Teddy Astie, xen-devel
On 14.04.2026 12:33, Roger Pau Monne wrote:
> When running HVM guest in native TSC mode avoid using the recalculated vTSC
> scaling factors based on the cpu_khz value. Using the kHz based frequency
> leads to the TSC scaling values possibly not being the same as the ones
> used by the per CPU cpu_time->tsc_scale field, which introduces skew
> between the guest and Xen's calculations of the system time.
>
> On a 2gHz system, where the frequency is possibly detected as 1999999999Hz
> (note this is a worse-case scenario), the cpu_khz variable will be set to
> 1999999kHz, and hence 999Hz cycles will be not accounted for per second.
> Over a second (the time synchronization period), this leads to a skew of:
>
> cycles * 1 / (Hz freq) = 999 / 1999999999 = 499,5ns
>
> So far this has gone unnoticed because the time synchronization rendezvous
> forces the update of the tsc_timestamp and system_time fields in the vCPU
> time info area, and hence the skew only accumulates up to the rendezvous
> period. Attempting to remove the rendezvous causes the skew to grow
> unbounded.
>
> Fix by using the native TSC scaling values (as used by Xen) when the guest
> TSC is not scaled.
>
> Fixes: eab8a90be723 ("x86/time: scale host TSC in pvclock properly")
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> I'm worried about the usage of cpu_khz beyond simple printing it for
> informational purposes. Overall I think it would be safer to store the
> frequency in Hz, as to avoid losing the least significant digits.
>
> In any case, that's a different change.
I'm not quite sure - improving accuracy is of course a good thing, but will
we ever be able to do any such calculations error free, when already the
detected frequency isn't exactly precise?
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1710,17 +1710,25 @@ static void collect_time_info(const struct vcpu *v,
> else
> {
> if ( is_hvm_domain(d) && hvm_tsc_scaling_supported )
> - {
> tsc_stamp = hvm_scale_tsc(d, t->stamp.local_tsc);
This is a potentially imprecise calculation. How likely is it that its result
will indeed ...
> - u->tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> - u->tsc_shift = d->arch.vtsc_to_ns.shift;
> - }
> else
> - {
> tsc_stamp = t->stamp.local_tsc;
> +
> + /*
> + * HVM guests using the native TSC ratio should use the same per-CPU
> + * scaling factors as Xen. This ensures time keeping is always in sync
> + * between Xen and the guest.
> + */
> + if ( tsc_stamp == t->stamp.local_tsc )
... exactly match t->stamp.local_tsc? Don't we possibly need a (small) error
margin? (In which case of course the next question would be: How to establish
such a margin?)
Jan
> + {
> u->tsc_to_system_mul = t->tsc_scale.mul_frac;
> u->tsc_shift = t->tsc_scale.shift;
> }
> + else
> + {
> + u->tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> + u->tsc_shift = d->arch.vtsc_to_ns.shift;
> + }
> }
>
> u->tsc_timestamp = tsc_stamp;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled
2026-04-16 11:28 ` Jan Beulich
@ 2026-04-16 12:51 ` Roger Pau Monné
2026-04-16 12:57 ` Jan Beulich
2026-04-16 13:13 ` Roger Pau Monné
1 sibling, 1 reply; 13+ messages in thread
From: Roger Pau Monné @ 2026-04-16 12:51 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Teddy Astie, xen-devel
On Thu, Apr 16, 2026 at 01:28:11PM +0200, Jan Beulich wrote:
> On 14.04.2026 12:33, Roger Pau Monne wrote:
> > When running HVM guest in native TSC mode avoid using the recalculated vTSC
> > scaling factors based on the cpu_khz value. Using the kHz based frequency
> > leads to the TSC scaling values possibly not being the same as the ones
> > used by the per CPU cpu_time->tsc_scale field, which introduces skew
> > between the guest and Xen's calculations of the system time.
> >
> > On a 2gHz system, where the frequency is possibly detected as 1999999999Hz
> > (note this is a worse-case scenario), the cpu_khz variable will be set to
> > 1999999kHz, and hence 999Hz cycles will be not accounted for per second.
> > Over a second (the time synchronization period), this leads to a skew of:
> >
> > cycles * 1 / (Hz freq) = 999 / 1999999999 = 499,5ns
> >
> > So far this has gone unnoticed because the time synchronization rendezvous
> > forces the update of the tsc_timestamp and system_time fields in the vCPU
> > time info area, and hence the skew only accumulates up to the rendezvous
> > period. Attempting to remove the rendezvous causes the skew to grow
> > unbounded.
> >
> > Fix by using the native TSC scaling values (as used by Xen) when the guest
> > TSC is not scaled.
> >
> > Fixes: eab8a90be723 ("x86/time: scale host TSC in pvclock properly")
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > I'm worried about the usage of cpu_khz beyond simple printing it for
> > informational purposes. Overall I think it would be safer to store the
> > frequency in Hz, as to avoid losing the least significant digits.
> >
> > In any case, that's a different change.
>
> I'm not quite sure - improving accuracy is of course a good thing, but will
> we ever be able to do any such calculations error free, when already the
> detected frequency isn't exactly precise?
>
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1710,17 +1710,25 @@ static void collect_time_info(const struct vcpu *v,
> > else
> > {
> > if ( is_hvm_domain(d) && hvm_tsc_scaling_supported )
> > - {
> > tsc_stamp = hvm_scale_tsc(d, t->stamp.local_tsc);
>
> This is a potentially imprecise calculation. How likely is it that its result
> will indeed ...
>
> > - u->tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> > - u->tsc_shift = d->arch.vtsc_to_ns.shift;
> > - }
> > else
> > - {
> > tsc_stamp = t->stamp.local_tsc;
> > +
> > + /*
> > + * HVM guests using the native TSC ratio should use the same per-CPU
> > + * scaling factors as Xen. This ensures time keeping is always in sync
> > + * between Xen and the guest.
> > + */
> > + if ( tsc_stamp == t->stamp.local_tsc )
>
> ... exactly match t->stamp.local_tsc? Don't we possibly need a (small) error
> margin? (In which case of course the next question would be: How to establish
> such a margin?)
hvm_scale_tsc() has:
if ( ratio == hvm_default_tsc_scaling_ratio )
return tsc;
So when using no scaling the input value is the output value, and
hence tsc_stamp will match exactly t->stamp.local_tsc.
Thanks, Roger.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled
2026-04-16 12:51 ` Roger Pau Monné
@ 2026-04-16 12:57 ` Jan Beulich
0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2026-04-16 12:57 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Andrew Cooper, Teddy Astie, xen-devel
On 16.04.2026 14:51, Roger Pau Monné wrote:
> On Thu, Apr 16, 2026 at 01:28:11PM +0200, Jan Beulich wrote:
>> On 14.04.2026 12:33, Roger Pau Monne wrote:
>>> When running HVM guest in native TSC mode avoid using the recalculated vTSC
>>> scaling factors based on the cpu_khz value. Using the kHz based frequency
>>> leads to the TSC scaling values possibly not being the same as the ones
>>> used by the per CPU cpu_time->tsc_scale field, which introduces skew
>>> between the guest and Xen's calculations of the system time.
>>>
>>> On a 2gHz system, where the frequency is possibly detected as 1999999999Hz
>>> (note this is a worse-case scenario), the cpu_khz variable will be set to
>>> 1999999kHz, and hence 999Hz cycles will be not accounted for per second.
>>> Over a second (the time synchronization period), this leads to a skew of:
>>>
>>> cycles * 1 / (Hz freq) = 999 / 1999999999 = 499,5ns
>>>
>>> So far this has gone unnoticed because the time synchronization rendezvous
>>> forces the update of the tsc_timestamp and system_time fields in the vCPU
>>> time info area, and hence the skew only accumulates up to the rendezvous
>>> period. Attempting to remove the rendezvous causes the skew to grow
>>> unbounded.
>>>
>>> Fix by using the native TSC scaling values (as used by Xen) when the guest
>>> TSC is not scaled.
>>>
>>> Fixes: eab8a90be723 ("x86/time: scale host TSC in pvclock properly")
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> I'm worried about the usage of cpu_khz beyond simple printing it for
>>> informational purposes. Overall I think it would be safer to store the
>>> frequency in Hz, as to avoid losing the least significant digits.
>>>
>>> In any case, that's a different change.
>>
>> I'm not quite sure - improving accuracy is of course a good thing, but will
>> we ever be able to do any such calculations error free, when already the
>> detected frequency isn't exactly precise?
>>
>>> --- a/xen/arch/x86/time.c
>>> +++ b/xen/arch/x86/time.c
>>> @@ -1710,17 +1710,25 @@ static void collect_time_info(const struct vcpu *v,
>>> else
>>> {
>>> if ( is_hvm_domain(d) && hvm_tsc_scaling_supported )
>>> - {
>>> tsc_stamp = hvm_scale_tsc(d, t->stamp.local_tsc);
>>
>> This is a potentially imprecise calculation. How likely is it that its result
>> will indeed ...
>>
>>> - u->tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
>>> - u->tsc_shift = d->arch.vtsc_to_ns.shift;
>>> - }
>>> else
>>> - {
>>> tsc_stamp = t->stamp.local_tsc;
>>> +
>>> + /*
>>> + * HVM guests using the native TSC ratio should use the same per-CPU
>>> + * scaling factors as Xen. This ensures time keeping is always in sync
>>> + * between Xen and the guest.
>>> + */
>>> + if ( tsc_stamp == t->stamp.local_tsc )
>>
>> ... exactly match t->stamp.local_tsc? Don't we possibly need a (small) error
>> margin? (In which case of course the next question would be: How to establish
>> such a margin?)
>
> hvm_scale_tsc() has:
>
> if ( ratio == hvm_default_tsc_scaling_ratio )
> return tsc;
>
> So when using no scaling the input value is the output value, and
> hence tsc_stamp will match exactly t->stamp.local_tsc.
Ouch. I did look at the function, but managed to have all my attention drawn
to the asm() there. I'm sorry for the noise. As it's strictly an improvement:
Reviewed-by: Jan Beulich <jbeulich@suse.com>
The other, earlier remark remains applicable, though.
Jan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled
2026-04-16 11:28 ` Jan Beulich
2026-04-16 12:51 ` Roger Pau Monné
@ 2026-04-16 13:13 ` Roger Pau Monné
2026-04-16 13:21 ` Jan Beulich
1 sibling, 1 reply; 13+ messages in thread
From: Roger Pau Monné @ 2026-04-16 13:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Teddy Astie, xen-devel
On Thu, Apr 16, 2026 at 01:28:11PM +0200, Jan Beulich wrote:
> On 14.04.2026 12:33, Roger Pau Monne wrote:
> > When running HVM guest in native TSC mode avoid using the recalculated vTSC
> > scaling factors based on the cpu_khz value. Using the kHz based frequency
> > leads to the TSC scaling values possibly not being the same as the ones
> > used by the per CPU cpu_time->tsc_scale field, which introduces skew
> > between the guest and Xen's calculations of the system time.
> >
> > On a 2gHz system, where the frequency is possibly detected as 1999999999Hz
> > (note this is a worse-case scenario), the cpu_khz variable will be set to
> > 1999999kHz, and hence 999Hz cycles will be not accounted for per second.
> > Over a second (the time synchronization period), this leads to a skew of:
> >
> > cycles * 1 / (Hz freq) = 999 / 1999999999 = 499,5ns
> >
> > So far this has gone unnoticed because the time synchronization rendezvous
> > forces the update of the tsc_timestamp and system_time fields in the vCPU
> > time info area, and hence the skew only accumulates up to the rendezvous
> > period. Attempting to remove the rendezvous causes the skew to grow
> > unbounded.
> >
> > Fix by using the native TSC scaling values (as used by Xen) when the guest
> > TSC is not scaled.
> >
> > Fixes: eab8a90be723 ("x86/time: scale host TSC in pvclock properly")
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > I'm worried about the usage of cpu_khz beyond simple printing it for
> > informational purposes. Overall I think it would be safer to store the
> > frequency in Hz, as to avoid losing the least significant digits.
> >
> > In any case, that's a different change.
>
> I'm not quite sure - improving accuracy is of course a good thing, but will
> we ever be able to do any such calculations error free, when already the
> detected frequency isn't exactly precise?
I think getting them fully accurate is not strictly required. The
specific issue here was that the guest was supposedly running with the
native TSC frequency, but the vCPU time info scaling factors where
(slightly) different from the ones using natively by Xen, hence resulting in a
time skew.
When the guest runs with a different TSC frequency Xen already
accounts for it properly, and hence there's no skew.
However, as noted in the next patch, I don't really see the benefit of
storing the frequency in kHz instead of using plain Hz.
Thanks, Roger.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled
2026-04-16 13:13 ` Roger Pau Monné
@ 2026-04-16 13:21 ` Jan Beulich
0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2026-04-16 13:21 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Andrew Cooper, Teddy Astie, xen-devel
On 16.04.2026 15:13, Roger Pau Monné wrote:
> On Thu, Apr 16, 2026 at 01:28:11PM +0200, Jan Beulich wrote:
>> On 14.04.2026 12:33, Roger Pau Monne wrote:
>>> When running HVM guest in native TSC mode avoid using the recalculated vTSC
>>> scaling factors based on the cpu_khz value. Using the kHz based frequency
>>> leads to the TSC scaling values possibly not being the same as the ones
>>> used by the per CPU cpu_time->tsc_scale field, which introduces skew
>>> between the guest and Xen's calculations of the system time.
>>>
>>> On a 2gHz system, where the frequency is possibly detected as 1999999999Hz
>>> (note this is a worse-case scenario), the cpu_khz variable will be set to
>>> 1999999kHz, and hence 999Hz cycles will be not accounted for per second.
>>> Over a second (the time synchronization period), this leads to a skew of:
>>>
>>> cycles * 1 / (Hz freq) = 999 / 1999999999 = 499,5ns
>>>
>>> So far this has gone unnoticed because the time synchronization rendezvous
>>> forces the update of the tsc_timestamp and system_time fields in the vCPU
>>> time info area, and hence the skew only accumulates up to the rendezvous
>>> period. Attempting to remove the rendezvous causes the skew to grow
>>> unbounded.
>>>
>>> Fix by using the native TSC scaling values (as used by Xen) when the guest
>>> TSC is not scaled.
>>>
>>> Fixes: eab8a90be723 ("x86/time: scale host TSC in pvclock properly")
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> I'm worried about the usage of cpu_khz beyond simple printing it for
>>> informational purposes. Overall I think it would be safer to store the
>>> frequency in Hz, as to avoid losing the least significant digits.
>>>
>>> In any case, that's a different change.
>>
>> I'm not quite sure - improving accuracy is of course a good thing, but will
>> we ever be able to do any such calculations error free, when already the
>> detected frequency isn't exactly precise?
>
> I think getting them fully accurate is not strictly required. The
> specific issue here was that the guest was supposedly running with the
> native TSC frequency, but the vCPU time info scaling factors where
> (slightly) different from the ones using natively by Xen, hence resulting in a
> time skew.
>
> When the guest runs with a different TSC frequency Xen already
> accounts for it properly, and hence there's no skew.
As "properly" isn't "accurate", I expect there'll still be some skew.
> However, as noted in the next patch, I don't really see the benefit of
> storing the frequency in kHz instead of using plain Hz.
That would reduce the error, yes.
Jan
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-04-16 13:21 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-14 10:33 [PATCH 0/2] time: fix time accounting for x86 HVM guests Roger Pau Monne
2026-04-14 10:33 ` [PATCH 1/2] x86/time: use native TSC scaling factors when TSC is not scaled Roger Pau Monne
2026-04-16 11:28 ` Jan Beulich
2026-04-16 12:51 ` Roger Pau Monné
2026-04-16 12:57 ` Jan Beulich
2026-04-16 13:13 ` Roger Pau Monné
2026-04-16 13:21 ` Jan Beulich
2026-04-14 10:33 ` [PATCH 2/2] xen/cpu: round up cpu_khz calculations Roger Pau Monne
2026-04-14 11:36 ` Tu Dinh
2026-04-14 11:56 ` Roger Pau Monné
2026-04-16 7:55 ` Jan Beulich
2026-04-14 20:08 ` [PATCH 0/2] time: fix time accounting for x86 HVM guests Stefano Stabellini
2026-04-15 8:44 ` Marek Marczykowski-Górecki
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.