[BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency

* [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
@ 2026-04-05 22:10 Thomas Lefebvre
  2026-04-06 14:11 ` Sean Christopherson
  2026-04-07  8:17 ` Vitaly Kuznetsov
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Lefebvre @ 2026-04-05 22:10 UTC (permalink / raw)
  To: seanjc, pbonzini; +Cc: kvm, linux-kernel, linux-hyperv, vkuznets

Hi,

I'm seeing KVM_GET_CLOCK return values ~253 years in the future when
running KVM inside a Hyper-V VM (nested virtualization).  I tracked
it down to an unsigned wraparound in __get_kvmclock() and have
bpftrace data showing the exact failure.

Setup:
  - Intel i7-11800H laptop running Windows with Hyper-V
  - L1 guest: Ubuntu 24.04, kernel 6.8.0, 4 vCPUs
  - Clocksource: hyperv_clocksource_tsc_page (VDSO_CLOCKMODE_HVCLOCK)
  - KVM running inside L1, hosting L2 guests

Root cause:

__get_kvmclock() does:

    hv_clock.tsc_timestamp = ka->master_cycle_now;
    hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
    ...
    data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);

and __pvclock_read_cycles() does:

    delta = tsc - src->tsc_timestamp;    /* unsigned */

master_cycle_now is a raw RDTSC captured by
pvclock_update_vm_gtod_copy().  host_tsc is a raw RDTSC read by
__get_kvmclock() on the current CPU.  Both go through the vgettsc()
HVCLOCK path which calls hv_read_tsc_page_tsc() -- this computes a
cross-CPU-consistent reference counter via scale/offset, but stores
the *raw* RDTSC in tsc_timestamp as a side effect.

Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
The hypervisor corrects them only through the TSC page scale/offset.
If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
later runs on CPU 1 where the raw TSC is lower, the unsigned
subtraction wraps.

I wrote a bpftrace tracer (included below) to instrument both
functions and captured two corruption events:

  Event 1:

    [GTOD_COPY] pid=2117649 cpu=0->0 use_master=1
                mcn=598992030530137 mkn=259977082393200

    [GET_CLOCK] pid=2117649 entry_cpu=1 exit_cpu=1 use_master=1
      clock=8006399342167092479 host_tsc=598991848289183
      master_cycle_now=598992030530137
      system_time(mkn+off)=5175860260
      TSC DEFICIT: 182240954 cycles

    master_cycle_now captured on CPU 0, host_tsc read on CPU 1.
    CPU 1's raw RDTSC was 182M cycles lower.

      598991848289183 - 598992030530137 = 18446744073527310662 (u64)

    Returned clock: 8,006,399,342,167,092,479 ns (~253.7 years)
    Correct system_time: 5,175,860,260 ns (~5.2 seconds)

  Event 2:

    [GTOD_COPY] pid=2117953 cpu=0->0 use_master=1
                mcn=599040238416510

    [GET_CLOCK] pid=2117953 entry_cpu=3 exit_cpu=3 use_master=1
      clock=8006399342464295526 host_tsc=599040211994220
      master_cycle_now=599040238416510
      TSC DEFICIT: 26422290 cycles

    Same pattern, CPU 0 vs CPU 3, 26M cycle deficit.
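
The arithmetic in the two events above can be reproduced in user space.
This is a minimal sketch, not kernel code: delta_unsigned(),
delta_signed() and check_event1() are hypothetical names chosen for
illustration, and only the subtraction itself mirrors
__pvclock_read_cycles().

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the delta in __pvclock_read_cycles(): wraps modulo
 * 2^64 whenever tsc < tsc_timestamp. */
static uint64_t delta_unsigned(uint64_t tsc, uint64_t tsc_timestamp)
{
    return tsc - tsc_timestamp;
}

/* Hypothetical helper: reinterpret the wrapped delta as signed to recover
 * the deficit (two's-complement conversion, as on GCC and Clang). */
static int64_t delta_signed(uint64_t tsc, uint64_t tsc_timestamp)
{
    return (int64_t)(tsc - tsc_timestamp);
}

/* Values copied from the Event 1 trace. */
static void check_event1(void)
{
    const uint64_t host_tsc = 598991848289183ULL;         /* read on CPU 1 */
    const uint64_t master_cycle_now = 598992030530137ULL; /* captured on CPU 0 */

    assert(delta_unsigned(host_tsc, master_cycle_now) ==
           18446744073527310662ULL);
    assert(delta_signed(host_tsc, master_cycle_now) == -182240954LL);
}
```

Viewed as signed, the same difference is simply the negative deficit
reported in the trace; it only becomes a huge forward jump because the
delta is scaled as an unsigned quantity.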

kvm_get_wall_clock_epoch() has the same pattern -- fresh host_tsc
vs stale master_cycle_now passed to __pvclock_read_cycles().

The simplest fix I can think of is guarding the __pvclock_read_cycles
call in __get_kvmclock():

    if (data->host_tsc >= hv_clock.tsc_timestamp)
        data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
    else
        data->clock = hv_clock.system_time;

system_time (= master_kernel_ns + kvmclock_offset) was computed from
the TSC page's corrected reference counter and is accurate regardless
of CPU.  The fallback loses sub-us interpolation but avoids a 253-year
jump.  On systems with consistent cross-CPU TSC, the branch is never
taken.
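
To make the effect of the guard concrete, here is a user-space model of
the read path.  It is a sketch under stated assumptions: struct
pvclock_model is a simplified stand-in rather than the kernel structure,
scale_delta() follows the shift-then-multiply shape of
pvclock_scale_delta() but relies on the GCC/Clang unsigned __int128
extension, and the multiplier used with it is a made-up value, not one
taken from the traces.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the pvclock fields involved (not the kernel
 * structure). */
struct pvclock_model {
    uint64_t tsc_timestamp;     /* raw TSC at the last master clock update */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point cycles-to-ns factor */
    int8_t   tsc_shift;         /* pre-shift applied to the delta */
};

/* Shift the delta, then multiply by a 32.32 fixed-point fraction. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    assert(shift > -64 && shift < 64);
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* Unguarded read: the unsigned delta wraps when tsc < tsc_timestamp. */
static uint64_t read_cycles(const struct pvclock_model *src, uint64_t tsc)
{
    uint64_t delta = tsc - src->tsc_timestamp;

    return src->system_time +
           scale_delta(delta, src->tsc_to_system_mul, src->tsc_shift);
}

/* Guarded read, as proposed above: on a TSC deficit, return system_time
 * without interpolation. */
static uint64_t read_cycles_guarded(const struct pvclock_model *src,
                                    uint64_t tsc)
{
    if (tsc >= src->tsc_timestamp)
        return read_cycles(src, tsc);
    return src->system_time;
}
```

With the Event 1 timestamps and an assumed multiplier of 0x80000000
(0.5 ns per cycle), read_cycles() returns a value in the 10^18 ns range,
while read_cycles_guarded() returns system_time unchanged.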

One thing I wasn't sure about: when the fallback triggers,
KVM_CLOCK_TSC_STABLE is still set in data->flags.  I left it alone
since the returned value is still correct (just less precise), but
I could see an argument for clearing it.

Disabling master clock entirely for HVCLOCK would also work but
seemed heavy -- it sacrifices PVCLOCK_TSC_STABLE_BIT, forces the
guest pvclock read into the atomic64_cmpxchg monotonicity guard,
and triggers KVM_REQ_GLOBAL_CLOCK_UPDATE on vCPU migration.

Reproducer bpftrace script (run while exercising KVM on a Hyper-V
host):

  #!/usr/bin/env bpftrace
  /*
   * Detect host_tsc < master_cycle_now in __get_kvmclock.
   *
   * struct kvm_clock_data layout (for raw offset reads):
   *   offset 0:  u64 clock
   *   offset 24: u64 host_tsc
   */

  kprobe:__get_kvmclock
  {
      $kvm = (struct kvm *)arg0;
      @get_data[tid] = (uint64)arg1;
      @get_use_master[tid] = (uint64)$kvm->arch.use_master_clock;
      @get_mcn[tid] = (uint64)$kvm->arch.master_cycle_now;
      @get_cpu[tid] = cpu;
  }

  kretprobe:__get_kvmclock
  {
      $data_ptr = @get_data[tid];
      if ($data_ptr != 0) {
          $clock = *(uint64 *)($data_ptr);
          $host_tsc = *(uint64 *)($data_ptr + 24);
          $use_master = @get_use_master[tid];
          $mcn = @get_mcn[tid];

          if ($use_master && $host_tsc != 0 && $host_tsc < $mcn) {
              printf("BUG: pid=%d cpu=%d->%d host_tsc=%lu mcn=%lu "
                     "deficit=%lu clock=%lu\n",
                     pid, @get_cpu[tid], cpu, $host_tsc,
                     $mcn, $mcn - $host_tsc, $clock);
          }
      }
      delete(@get_data[tid]);
      delete(@get_use_master[tid]);
      delete(@get_mcn[tid]);
      delete(@get_cpu[tid]);
  }

  kprobe:pvclock_update_vm_gtod_copy {
      @gtod_kvm[tid] = (uint64)arg0;
      @gtod_cpu[tid] = cpu;
  }
  kretprobe:pvclock_update_vm_gtod_copy
  {
      $kvm = (struct kvm *)@gtod_kvm[tid];
      if ($kvm != 0) {
          printf("GTOD: pid=%d cpu=%d->%d mcn=%lu use_master=%d\n",
                 pid, @gtod_cpu[tid], cpu,
                 $kvm->arch.master_cycle_now,
                 $kvm->arch.use_master_clock);
      }
      delete(@gtod_kvm[tid]);
      delete(@gtod_cpu[tid]);
  }

Thanks,
Thomas

* Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-05 22:10 [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency Thomas Lefebvre
@ 2026-04-06 14:11 ` Sean Christopherson
  2026-04-07  8:23   ` Vitaly Kuznetsov
  2026-04-07  8:17 ` Vitaly Kuznetsov
  1 sibling, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2026-04-06 14:11 UTC (permalink / raw)
  To: Thomas Lefebvre; +Cc: pbonzini, kvm, linux-kernel, linux-hyperv, vkuznets

On Sun, Apr 05, 2026, Thomas Lefebvre wrote:
> Hi,
> 
> I'm seeing KVM_GET_CLOCK return values ~253 years in the future when
> running KVM inside a Hyper-V VM (nested virtualization).  I tracked
> it down to an unsigned wraparound in __get_kvmclock() and have
> bpftrace data showing the exact failure.
> 
> Setup:
>   - Intel i7-11800H laptop running Windows with Hyper-V
>   - L1 guest: Ubuntu 24.04, kernel 6.8.0, 4 vCPUs
>   - Clocksource: hyperv_clocksource_tsc_page (VDSO_CLOCKMODE_HVCLOCK)
>   - KVM running inside L1, hosting L2 guests
> 
> Root cause:
> 
> __get_kvmclock() does:
> 
>     hv_clock.tsc_timestamp = ka->master_cycle_now;
>     hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
>     ...
>     data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> 
> and __pvclock_read_cycles() does:
> 
>     delta = tsc - src->tsc_timestamp;    /* unsigned */
> 
> master_cycle_now is a raw RDTSC captured by
> pvclock_update_vm_gtod_copy().  host_tsc is a raw RDTSC read by
> __get_kvmclock() on the current CPU.  Both go through the vgettsc()
> HVCLOCK path which calls hv_read_tsc_page_tsc() -- this computes a
> cross-CPU-consistent reference counter via scale/offset, but stores
> the *raw* RDTSC in tsc_timestamp as a side effect.
> 
> Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> The hypervisor corrects them only through the TSC page scale/offset.
> If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> later runs on CPU 1 where the raw TSC is lower, the unsigned
> subtraction wraps.
> 
> I wrote a bpftrace tracer (included below) to instrument both
> functions and captured two corruption events:
> 
>   Event 1:
> 
>     [GTOD_COPY] pid=2117649 cpu=0->0 use_master=1
>                 mcn=598992030530137 mkn=259977082393200
> 
>     [GET_CLOCK] pid=2117649 entry_cpu=1 exit_cpu=1 use_master=1
>       clock=8006399342167092479 host_tsc=598991848289183
>       master_cycle_now=598992030530137
>       system_time(mkn+off)=5175860260
>       TSC DEFICIT: 182240954 cycles
> 
>     master_cycle_now captured on CPU 0, host_tsc read on CPU 1.
>     CPU 1's raw RDTSC was 182M cycles lower.
> 
>       598991848289183 - 598992030530137 = 18446744073527310662 (u64)
> 
>     Returned clock: 8,006,399,342,167,092,479 ns (~253.7 years)
>     Correct system_time: 5,175,860,260 ns (~5.2 seconds)
> 
>   Event 2:
> 
>     [GTOD_COPY] pid=2117953 cpu=0->0 use_master=1
>                 mcn=599040238416510
> 
>     [GET_CLOCK] pid=2117953 entry_cpu=3 exit_cpu=3 use_master=1
>       clock=8006399342464295526 host_tsc=599040211994220
>       master_cycle_now=599040238416510
>       TSC DEFICIT: 26422290 cycles
> 
>     Same pattern, CPU 0 vs CPU 3, 26M cycle deficit.
> 
> kvm_get_wall_clock_epoch() has the same pattern -- fresh host_tsc
> vs stale master_cycle_now passed to __pvclock_read_cycles().
> 
> The simplest fix I can think of is guarding the __pvclock_read_cycles
> call in __get_kvmclock():
> 
>     if (data->host_tsc >= hv_clock.tsc_timestamp)
>         data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
>     else
>         data->clock = hv_clock.system_time;

That might kinda sorta work for one KVM-as-the-host path, but it's not a proper
fix.  The actual guest-side (L2) reads in __pvclock_clocksource_read() will also
be broken, because PVCLOCK_TSC_STABLE_BIT will be set.

I don't see how this scenario can possibly work, KVM is effectively mixing two
time domains.  The stable timestamp from the TSC page is (obviously) *derived*
from the raw, *unstable* TSC, but they are two distinct domains.

What really confuses me is why we thought this would work for Hyper-V but not for
kvmclock (i.e. KVM-on-KVM).  Hyper-V's TSC page and kvmclock are the exact same
concept, but vgettsc() only special cases VDSO_CLOCKMODE_HVCLOCK, not
VDSO_CLOCKMODE_PVCLOCK.

Shouldn't we just revert b0c39dc68e3b ("x86/kvm: Pass stable clocksource to guests
when running nested on Hyper-V")?

Vitaly, what am I missing?

> system_time (= master_kernel_ns + kvmclock_offset) was computed from
> the TSC page's corrected reference counter and is accurate regardless
> of CPU.  The fallback loses sub-us interpolation but avoids a 253-year
> jump.  On systems with consistent cross-CPU TSC, the branch is never
> taken.
> 
> One thing I wasn't sure about: when the fallback triggers,
> KVM_CLOCK_TSC_STABLE is still set in data->flags.  I left it alone
> since the returned value is still correct (just less precise), but
> I could see an argument for clearing it.
> 
> Disabling master clock entirely for HVCLOCK would also work but
> seemed heavy -- it sacrifices PVCLOCK_TSC_STABLE_BIT, forces the
> guest pvclock read into the atomic64_cmpxchg monotonicity guard,
> and triggers KVM_REQ_GLOBAL_CLOCK_UPDATE on vCPU migration.
> 
> Reproducer bpftrace script (run while exercising KVM on a Hyper-V
> host):
> 
>   #!/usr/bin/env bpftrace
>   /*
>    * Detect host_tsc < master_cycle_now in __get_kvmclock.
>    *
>    * struct kvm_clock_data layout (for raw offset reads):
>    *   offset 0:  u64 clock
>    *   offset 24: u64 host_tsc
>    */
> 
>   kprobe:__get_kvmclock
>   {
>       $kvm = (struct kvm *)arg0;
>       @get_data[tid] = (uint64)arg1;
>       @get_use_master[tid] = (uint64)$kvm->arch.use_master_clock;
>       @get_mcn[tid] = (uint64)$kvm->arch.master_cycle_now;
>       @get_cpu[tid] = cpu;
>   }
> 
>   kretprobe:__get_kvmclock
>   {
>       $data_ptr = @get_data[tid];
>       if ($data_ptr != 0) {
>           $clock = *(uint64 *)($data_ptr);
>           $host_tsc = *(uint64 *)($data_ptr + 24);
>           $use_master = @get_use_master[tid];
>           $mcn = @get_mcn[tid];
> 
>           if ($use_master && $host_tsc != 0 && $host_tsc < $mcn) {
>               printf("BUG: pid=%d cpu=%d->%d host_tsc=%lu mcn=%lu "
>                      "deficit=%lu clock=%lu\n",
>                      pid, @get_cpu[tid], cpu, $host_tsc,
>                      $mcn, $mcn - $host_tsc, $clock);
>           }
>       }
>       delete(@get_data[tid]);
>       delete(@get_use_master[tid]);
>       delete(@get_mcn[tid]);
>       delete(@get_cpu[tid]);
>   }
> 
>   kprobe:pvclock_update_vm_gtod_copy {
>       @gtod_kvm[tid] = (uint64)arg0;
>       @gtod_cpu[tid] = cpu;
>   }
>   kretprobe:pvclock_update_vm_gtod_copy
>   {
>       $kvm = (struct kvm *)@gtod_kvm[tid];
>       if ($kvm != 0) {
>           printf("GTOD: pid=%d cpu=%d->%d mcn=%lu use_master=%d\n",
>                  pid, @gtod_cpu[tid], cpu,
>                  $kvm->arch.master_cycle_now,
>                  $kvm->arch.use_master_clock);
>       }
>       delete(@gtod_kvm[tid]);
>       delete(@gtod_cpu[tid]);
>   }
> 
> Thanks,
> Thomas

* Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-05 22:10 [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency Thomas Lefebvre
  2026-04-06 14:11 ` Sean Christopherson
@ 2026-04-07  8:17 ` Vitaly Kuznetsov
  2026-04-07 16:43   ` Sean Christopherson
  1 sibling, 1 reply; 9+ messages in thread
From: Vitaly Kuznetsov @ 2026-04-07  8:17 UTC (permalink / raw)
  To: Thomas Lefebvre, seanjc, pbonzini; +Cc: kvm, linux-kernel, linux-hyperv

Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:

...

>
> Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> The hypervisor corrects them only through the TSC page scale/offset.
> If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> later runs on CPU 1 where the raw TSC is lower, the unsigned
> subtraction wraps.
>

According to the TLFS, reference TSC page is partition wide:

"The hypervisor provides a partition-wide virtual reference TSC page
which is overlaid on the partition’s GPA space. A partition’s reference
time stamp counter page is accessed through the Reference TSC MSR."

So if, as you say, the raw RDTSC value is inconsistent across vCPUs, I
can hardly see how we can use this time source at all, even without
KVM: scale/offset are the same for all vCPUs.

I think the fix here is to avoid setting up the Hyper-V TSC page
clocksource in L1. Unfortunately, with unsynchronized TSCs this leaves
us only one choice for a sane clocksource: raw
HV_X64_MSR_TIME_REF_COUNT MSR reads.
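
To illustrate the point about shared scale/offset, here is a minimal
user-space sketch of the TSC page computation, which the TLFS gives as
ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset.  The
struct, the function names and the numbers in the self-check are
hypothetical, and unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Partition-wide parameters: every vCPU reads the same two values. */
struct tsc_page_model {
    uint64_t scale;  /* 64.64 fixed-point multiplier */
    int64_t  offset; /* added after scaling, in 100 ns units */
};

/* Reference time in 100 ns units for a given raw TSC reading. */
static uint64_t ref_time_100ns(const struct tsc_page_model *p,
                               uint64_t raw_tsc)
{
    uint64_t scaled =
        (uint64_t)(((unsigned __int128)raw_tsc * p->scale) >> 64);

    return scaled + (uint64_t)p->offset;
}

/* Gap between the reference times two vCPUs would compute from their
 * own raw TSC readings. */
static int64_t ref_time_gap(const struct tsc_page_model *p,
                            uint64_t tsc_a, uint64_t tsc_b)
{
    return (int64_t)(ref_time_100ns(p, tsc_a) - ref_time_100ns(p, tsc_b));
}

/* Made-up numbers: a scale of 0.5 and a 400-cycle raw TSC skew. */
static void check_shared_scale(void)
{
    const struct tsc_page_model p = { .scale = 1ULL << 63, .offset = 100 };

    assert(ref_time_100ns(&p, 1000) == 600);
    assert(ref_time_gap(&p, 1000, 600) == 200);
}
```

Because nothing in the formula is per-vCPU, any skew between raw TSC
readings shows up, scaled, in the computed reference time.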

-- 
Vitaly

* Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-06 14:11 ` Sean Christopherson
@ 2026-04-07  8:23   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 9+ messages in thread
From: Vitaly Kuznetsov @ 2026-04-07  8:23 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Lefebvre
  Cc: pbonzini, kvm, linux-kernel, linux-hyperv

Sean Christopherson <seanjc@google.com> writes:

> On Sun, Apr 05, 2026, Thomas Lefebvre wrote:
>> Hi,
>> 
>> I'm seeing KVM_GET_CLOCK return values ~253 years in the future when
>> running KVM inside a Hyper-V VM (nested virtualization).  I tracked
>> it down to an unsigned wraparound in __get_kvmclock() and have
>> bpftrace data showing the exact failure.
>> 
>> Setup:
>>   - Intel i7-11800H laptop running Windows with Hyper-V
>>   - L1 guest: Ubuntu 24.04, kernel 6.8.0, 4 vCPUs
>>   - Clocksource: hyperv_clocksource_tsc_page (VDSO_CLOCKMODE_HVCLOCK)
>>   - KVM running inside L1, hosting L2 guests
>> 
>> Root cause:
>> 
>> __get_kvmclock() does:
>> 
>>     hv_clock.tsc_timestamp = ka->master_cycle_now;
>>     hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
>>     ...
>>     data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
>> 
>> and __pvclock_read_cycles() does:
>> 
>>     delta = tsc - src->tsc_timestamp;    /* unsigned */
>> 
>> master_cycle_now is a raw RDTSC captured by
>> pvclock_update_vm_gtod_copy().  host_tsc is a raw RDTSC read by
>> __get_kvmclock() on the current CPU.  Both go through the vgettsc()
>> HVCLOCK path which calls hv_read_tsc_page_tsc() -- this computes a
>> cross-CPU-consistent reference counter via scale/offset, but stores
>> the *raw* RDTSC in tsc_timestamp as a side effect.
>> 
>> Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
>> The hypervisor corrects them only through the TSC page scale/offset.
>> If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
>> later runs on CPU 1 where the raw TSC is lower, the unsigned
>> subtraction wraps.
>> 
>> I wrote a bpftrace tracer (included below) to instrument both
>> functions and captured two corruption events:
>> 
>>   Event 1:
>> 
>>     [GTOD_COPY] pid=2117649 cpu=0->0 use_master=1
>>                 mcn=598992030530137 mkn=259977082393200
>> 
>>     [GET_CLOCK] pid=2117649 entry_cpu=1 exit_cpu=1 use_master=1
>>       clock=8006399342167092479 host_tsc=598991848289183
>>       master_cycle_now=598992030530137
>>       system_time(mkn+off)=5175860260
>>       TSC DEFICIT: 182240954 cycles
>> 
>>     master_cycle_now captured on CPU 0, host_tsc read on CPU 1.
>>     CPU 1's raw RDTSC was 182M cycles lower.
>> 
>>       598991848289183 - 598992030530137 = 18446744073527310662 (u64)
>> 
>>     Returned clock: 8,006,399,342,167,092,479 ns (~253.7 years)
>>     Correct system_time: 5,175,860,260 ns (~5.2 seconds)
>> 
>>   Event 2:
>> 
>>     [GTOD_COPY] pid=2117953 cpu=0->0 use_master=1
>>                 mcn=599040238416510
>> 
>>     [GET_CLOCK] pid=2117953 entry_cpu=3 exit_cpu=3 use_master=1
>>       clock=8006399342464295526 host_tsc=599040211994220
>>       master_cycle_now=599040238416510
>>       TSC DEFICIT: 26422290 cycles
>> 
>>     Same pattern, CPU 0 vs CPU 3, 26M cycle deficit.
>> 
>> kvm_get_wall_clock_epoch() has the same pattern -- fresh host_tsc
>> vs stale master_cycle_now passed to __pvclock_read_cycles().
>> 
>> The simplest fix I can think of is guarding the __pvclock_read_cycles
>> call in __get_kvmclock():
>> 
>>     if (data->host_tsc >= hv_clock.tsc_timestamp)
>>         data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
>>     else
>>         data->clock = hv_clock.system_time;
>
> That might kinda sorta work for one KVM-as-the-host path, but it's not a proper
> fix.  The actual guest-side (L2) reads in __pvclock_clocksource_read() will also
> be broken, because PVCLOCK_TSC_STABLE_BIT will be set.
>
> I don't see how this scenario can possibly work, KVM is effectively mixing two
> time domains.  The stable timestamp from the TSC page is (obviously) *derived*
> from the raw, *unstable* TSC, but they are two distinct domains.
>
> What really confuses me is why we thought this would work for Hyper-V but not for
> kvmclock (i.e. KVM-on-KVM).  Hyper-V's TSC page and kvmclock are the exact same
> concept, but vgettsc() only special cases VDSO_CLOCKMODE_HVCLOCK, not
> VDSO_CLOCKMODE_PVCLOCK.
>
> Shouldn't we just revert b0c39dc68e3b ("x86/kvm: Pass stable clocksource to guests
> when running nested on Hyper-V")?
>
> Vitaly, what am I missing?
>

It's probably me who's missing something :-) but my understanding is
that we can't use the TSC page clocksource with unsynchronized TSCs in
L1 at all, as the TSC page (unlike kvmclock) is always partition-wide
and thus can't lead to a sane result when raw TSC readings diverge. The
idea of b0c39dc68e3b was that in Hyper-V guests *with a stable,
synchronized TSC* we may still be using the Hyper-V TSC page
clocksource, and thus we can pass it to L2.

-- 
Vitaly


* Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-07  8:17 ` Vitaly Kuznetsov
@ 2026-04-07 16:43   ` Sean Christopherson
  2026-04-07 16:44     ` Sean Christopherson
  2026-04-07 18:37     ` Michael Kelley
  0 siblings, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2026-04-07 16:43 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Thomas Lefebvre, pbonzini, kvm, linux-kernel, linux-hyperv,
	Michael Kelley

+Michael

On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:
> > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > The hypervisor corrects them only through the TSC page scale/offset.
> > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > subtraction wraps.
> >
> 
> According to the TLFS, reference TSC page is partition wide:
> 
> "The hypervisor provides a partition-wide virtual reference TSC page
> which is overlaid on the partition’s GPA space. A partition’s reference
> time stamp counter page is accessed through the Reference TSC MSR."
> 
> so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> hardly see how we can use this time source at all, even without
> KVM. scale/offset are the same for all vCPUs.
> 
> I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> reads.

This feels like either a Hyper-V bug or a Linux-as-a-guest bug.  For "Reference
Counter"[1]:

  The hypervisor maintains a per-partition reference time counter. It has the
  characteristic that successive accesses to it return strictly monotonically
  increasing (time) values as seen by any and all virtual processors of a
  partition. Furthermore, the reference counter is rate constant and unaffected
  by processor or bus speed transitions or deep processor power savings states. A
  partition’s reference time counter is initialized to zero when the partition is
  created. The reference counter for all partitions count at the same rate, but
  at any time, their absolute values will typically differ because partitions
  will have different creation times.

  The reference counter continues to count up as long as at least one virtual
  processor is not explicitly suspended.

And then "Partition Reference Time Enlightenment"[2]:

  The partition reference time enlightenment presents a reference time source to
  a partition which does not require an intercept into the hypervisor. This
  enlightenment is available only when the underlying platform provides support
  of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms,
  the processor TSC frequency remains constant irrespective of changes in the
  processor’s clock frequency due to the use of power management states such as
  ACPI processor performance states, processor idle sleep states (ACPI C-states),
  etc.

  The partition reference time enlightenment uses a virtual TSC value, an offset
  and a multiplier to enable a guest partition to compute the normalized
  reference time since partition creation, in 100nS units. The mechanism also
  allows a guest partition to atomically compute the reference time when the
  guest partition is migrated to a platform with a different TSC rate, and
  provides a fallback mechanism to support migration to platforms without the
  constant rate TSC feature.

My read of "Partition Reference Time Enlightenment" is that it should only be
advertised if the TSC is synchronized and constant.  I can't figure out where
that feature is actually advertised though, because IIUC it's not the same as
HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is guaranteed to be
invariant even across live migration.  And it's not HV_MSR_REFERENCE_TSC_AVAILABLE,
because I'm pretty sure that just says HV_MSR_REFERENCE_TSC is available.
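
For reference, a small sketch that decodes the three time-related
feature bits in play here from the EAX value of the Hyper-V features
CPUID leaf (0x40000003).  The bit positions are written from memory of
the kernel's Hyper-V TLFS header and should be treated as assumptions to
verify; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in CPUID 0x40000003 EAX; verify against the
 * kernel's Hyper-V TLFS header before relying on them. */
#define FEAT_BIT(n)                     (UINT32_C(1) << (n))
#define HV_MSR_TIME_REF_COUNT_AVAILABLE FEAT_BIT(1)
#define HV_MSR_REFERENCE_TSC_AVAILABLE  FEAT_BIT(9)
#define HV_ACCESS_TSC_INVARIANT         FEAT_BIT(15)

struct hv_time_features {
    bool ref_counter_msr; /* HV_X64_MSR_TIME_REF_COUNT is readable */
    bool ref_tsc_page;    /* the reference TSC page MSR is available */
    bool tsc_invariant;   /* invariant-TSC control is accessible */
};

static struct hv_time_features decode_hv_time_features(uint32_t eax)
{
    struct hv_time_features f = {
        .ref_counter_msr = (eax & HV_MSR_TIME_REF_COUNT_AVAILABLE) != 0,
        .ref_tsc_page    = (eax & HV_MSR_REFERENCE_TSC_AVAILABLE) != 0,
        .tsc_invariant   = (eax & HV_ACCESS_TSC_INVARIANT) != 0,
    };

    return f;
}

/* Example input: reference counter and TSC page set, invariant bit clear. */
static void check_decode(void)
{
    struct hv_time_features f = decode_hv_time_features(0x202u);

    assert(f.ref_counter_msr && f.ref_tsc_page && !f.tsc_invariant);
}
```

None of these three bits, taken alone, states that raw TSC values are
synchronized across vCPUs, which is the gap described above.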

Michael, help?

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
[2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment

* Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-07 16:43   ` Sean Christopherson
@ 2026-04-07 16:44     ` Sean Christopherson
  2026-04-07 18:37     ` Michael Kelley
  1 sibling, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2026-04-07 16:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Thomas Lefebvre, pbonzini, kvm, linux-kernel, linux-hyperv,
	Michael Kelley

On Tue, Apr 07, 2026, Sean Christopherson wrote:
> +Michael

Let's try that again.  Email address #1 bounced.

> On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> > Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:
> > > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > > The hypervisor corrects them only through the TSC page scale/offset.
> > > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > > subtraction wraps.
> > >
> > 
> > According to the TLFS, reference TSC page is partition wide:
> > 
> > "The hypervisor provides a partition-wide virtual reference TSC page
> > which is overlaid on the partition’s GPA space. A partition’s reference
> > time stamp counter page is accessed through the Reference TSC MSR."
> > 
> > so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> > hardly see how we can use this time source at all, even without
> > KVM. scale/offset are the same for all vCPUs.
> > 
> > I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> > in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> > only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> > reads.
> 
> This feels like either a Hyper-V bug or a Linux-as-a-guest bug.  For "Reference
> Counter"[1]:
> 
>   The hypervisor maintains a per-partition reference time counter. It has the
>   characteristic that successive accesses to it return strictly monotonically
>   increasing (time) values as seen by any and all virtual processors of a
>   partition. Furthermore, the reference counter is rate constant and unaffected
>   by processor or bus speed transitions or deep processor power savings states. A
>   partition’s reference time counter is initialized to zero when the partition is
>   created. The reference counter for all partitions count at the same rate, but
>   at any time, their absolute values will typically differ because partitions
>   will have different creation times.
>   
>   The reference counter continues to count up as long as at least one virtual
>   processor is not explicitly suspended.
> 
> 
> And then "Partition Reference Time Enlightenment"[2]:
> 
>   The partition reference time enlightenment presents a reference time source to
>   a partition which does not require an intercept into the hypervisor. This
>   enlightenment is available only when the underlying platform provides support
>   of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms,
>   the processor TSC frequency remains constant irrespective of changes in the
>   processor’s clock frequency due to the use of power management states such as
>   ACPI processor performance states, processor idle sleep states (ACPI C-states),
>   etc.
> 
>   The partition reference time enlightenment uses a virtual TSC value, an offset
>   and a multiplier to enable a guest partition to compute the normalized
>   reference time since partition creation, in 100nS units. The mechanism also
>   allows a guest partition to atomically compute the reference time when the
>   guest partition is migrated to a platform with a different TSC rate, and
>   provides a fallback mechanism to support migration to platforms without the
>   constant rate TSC feature.
> 
> My read of "Partition Reference Time Enlightenment" is that it should only be
> advertised if the TSC is synchronized and constant.  I can't figure out where
> that feature is actually advertised though, because IIUC it's not the same as
> HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is guaranteed to be
> invariant even across live migration.  And it's not HV_MSR_REFERENCE_TSC_AVAILABLE,
> because I'm pretty sure that just says HV_MSR_REFERENCE_TSC is available.
> 
> Michael, help?
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
> [2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment

* RE: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-07 16:43   ` Sean Christopherson
  2026-04-07 16:44     ` Sean Christopherson
@ 2026-04-07 18:37     ` Michael Kelley
  2026-04-07 19:13       ` Thomas Lefebvre
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Kelley @ 2026-04-07 18:37 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov, Thomas Lefebvre
  Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org

From: Sean Christopherson <seanjc@google.com> Sent: Tuesday, April 7, 2026 9:43 AM
> 
> +Michael
> 
> On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> > Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:
> > > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > > The hypervisor corrects them only through the TSC page scale/offset.
> > > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > > subtraction wraps.
> > >
> >
> > According to the TLFS, reference TSC page is partition wide:
> >
> > "The hypervisor provides a partition-wide virtual reference TSC page
> > which is overlaid on the partition’s GPA space. A partition’s reference
> > time stamp counter page is accessed through the Reference TSC MSR."
> >
> > so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> > hardly see how we can use this time source at all, even without
> > KVM. scale/offset are the same for all vCPUs.
> >
> > I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> > in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> > only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> > reads.
> 
> This feels like either a Hyper-V bug or a Linux-as-a-guest bug.  For "Reference
> Counter"[1]:
> 
>   The hypervisor maintains a per-partition reference time counter. It has the
>   characteristic that successive accesses to it return strictly monotonically
>   increasing (time) values as seen by any and all virtual processors of a
>   partition. Furthermore, the reference counter is rate constant and unaffected
>   by processor or bus speed transitions or deep processor power savings states. A
>   partition’s reference time counter is initialized to zero when the partition is
>   created. The reference counter for all partitions count at the same rate, but
>   at any time, their absolute values will typically differ because partitions
>   will have different creation times.
> 
>   The reference counter continues to count up as long as at least one virtual
>   processor is not explicitly suspended.
> 
> 
> And then "Partition Reference Time Enlightenment"[2]:
> 
>   The partition reference time enlightenment presents a reference time source to
>   a partition which does not require an intercept into the hypervisor. This
>   enlightenment is available only when the underlying platform provides support
>   of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms,
>   the processor TSC frequency remains constant irrespective of changes in the
>   processor’s clock frequency due to the use of power management states such as
>   ACPI processor performance states, processor idle sleep states (ACPI C-states),
>   etc.
> 
>   The partition reference time enlightenment uses a virtual TSC value, an offset
>   and a multiplier to enable a guest partition to compute the normalized
>   reference time since partition creation, in 100nS units. The mechanism also
>   allows a guest partition to atomically compute the reference time when the
>   guest partition is migrated to a platform with a different TSC rate, and
>   provides a fallback mechanism to support migration to platforms without the
>   constant rate TSC feature.
> 
> My read of "Partition Reference Time Enlightenment" is that it should only be
> advertised if the TSC is synchronized and constant.  I can't figure out where
> that feature is actually advertised though, because IIUC it's not the same as
> HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is guaranteed to be
> invariant even across live migration.  And it's not HV_MSR_REFERENCE_TSC_AVAILABLE,
> because I'm pretty sure that just says HV_MSR_REFERENCE_TSC is available.
> 
> Michael, help?
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
> [2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment

Yes, TSC page enlightenment is per VM, so it does not compensate for
discrepancies in raw TSC values across physical CPUs. RDTSC in a
Hyper-V VM is executed directly by the hardware (i.e., does not trap to
the hypervisor), so there's no opportunity for the hypervisor to compensate
for discrepancies. The hypervisor is expected to present a VM with TSCs
that are already synchronized. I'll need to double-check, but I don't think
Linux guests on Hyper-V run their own TSC synchronization.
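
To make the point above concrete, here is a minimal sketch of the
reference time computation, assuming the TLFS formula
((tsc * scale) >> 64) + offset. The struct and function names are
hypothetical (not the kernel's), the sequence-number handling of the real
TSC page is omitted, and unsigned __int128 is a GCC/Clang extension.
Because scale and offset are shared by every vCPU, any skew in the raw TSC
values carries straight through to the result, and a raw delta taken
across two skewed vCPUs wraps as described in the original report.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the partition-wide TSC page.
 * The real page also carries a sequence number, omitted here. */
struct ref_tsc_page {
	uint64_t scale;   /* fixed-point multiplier, scaled by 2^64 */
	int64_t  offset;  /* added after scaling, in 100ns units */
};

/* Reference time in 100ns units: ((tsc * scale) >> 64) + offset. */
static uint64_t ref_time_100ns(const struct ref_tsc_page *p, uint64_t raw_tsc)
{
	unsigned __int128 prod = (unsigned __int128)raw_tsc * p->scale;

	return (uint64_t)(prod >> 64) + (uint64_t)p->offset;
}

/* Unsigned delta between a raw TSC read now and an earlier raw
 * timestamp; wraps if "now" is numerically lower. */
static uint64_t raw_tsc_delta(uint64_t tsc_now, uint64_t tsc_timestamp)
{
	return tsc_now - tsc_timestamp;
}
```

With scale = 1 << 63 and offset = 5, raw reads of 2000 and 1800 give 1005
and 905, and raw_tsc_delta(1800, 2000) yields 2^64 - 200 rather than -200.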

The relevant Hyper-V flags are:
* HV_MSR_TIME_REF_COUNT_AVAILABLE:  The synthetic MSR for reading
   the partition reference time is available.
* HV_MSR_REFERENCE_TSC_AVAILABLE: The partition reference time
   enlightenment (i.e., "the TSC page") is available as a faster way to read
   the reference counter.
* HV_ACCESS_TSC_INVARIANT: As Sean said, this says the hardware and
   Hyper-V support TSC scaling, so live migration can be done across hosts
   without the guest seeing a change in TSC frequency.

Yes, this does feel like an issue where Hyper-V is not presenting the guest
with TSCs that are already synchronized. But I'm not aware of having seen
such a problem before. I'll try to imagine a scenario where a problem like
this could happen via some other path.

@Thomas Lefebvre:  Let me double-check a few things via these follow-up
questions/actions:

1. You said the clocksource is hyperv_clocksource_tsc_page. Just to
confirm, that's for the L1 guest, right? Does the output of the "lscpu"
command in the L1 guest show the flags "tsc_reliable" and "constant_tsc"?
I assume "no", since if these flags were set, the clocksource (i.e.,
/sys/devices/system/clocksource/clocksource0/current_clocksource)
should be the standard "tsc". I've got a laptop with an i7-13700H processor,
and my L1 VMs show "tsc" as the clocksource, but I haven't been running
KVM with L2 nested VMs.

2. What is the version of Windows/Hyper-V you are running? Get the
output of the "winver.exe" command. It should be something like this:

Windows 11 [as the top banner]
Version 25H2 (OS Build 26200.8037)

3. In the dmesg output of your L1 VM, find the line like this one and reply
with what you have:

    Hyper-V: privilege flags low 0xae7f, high 0x3b8030, hints 0x9a4e24, misc 0xe0bed7b2

From there, I can decode the Hyper-V settings and see if anything jumps out
as anomalous. 

4. Does the laptop where you are seeing this problem ever hibernate and
then resume? If so, do you recall if the problem occurs after a full reboot but
before it ever does a hibernate/resume cycle?

Michael


* Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-07 18:37     ` Michael Kelley
@ 2026-04-07 19:13       ` Thomas Lefebvre
  2026-04-07 20:40         ` Michael Kelley
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Lefebvre @ 2026-04-07 19:13 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Sean Christopherson, Vitaly Kuznetsov, pbonzini@redhat.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org

Hi everyone, thank you for your attention to this bug report.

Michael,

1. No, lscpu in the L1 guest does not show the flags "tsc_reliable"
and "constant_tsc".
$ lscpu | grep tsc_reliable
$ lscpu | grep constant_tsc
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hyperv_clocksource_tsc_page

2. Windows 10
Version 22H2 (OS Build 19045.6466)

3. Hyper-V: privilege flags low 0x2e7f, high 0x3b8030, ext 0x2, hints
0x24e24, misc 0xbed7b2

4. Yes, the laptop hibernates and then resumes.
When the problem occurred, the laptop had gone through multiple
hibernate and resume cycles.
I haven't seen it happen after a full reboot before a hibernate/resume cycle.

Thomas

On Tue, Apr 7, 2026 at 11:37 AM Michael Kelley <mhklinux@outlook.com> wrote:
>
> From: Sean Christopherson <seanjc@google.com> Sent: Tuesday, April 7, 2026 9:43 AM
> >
> > +Michael
> >
> > On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> > > Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:
> > > > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > > > The hypervisor corrects them only through the TSC page scale/offset.
> > > > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > > > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > > > subtraction wraps.
> > > >
> > >
> > > According to the TLFS, reference TSC page is partition wide:
> > >
> > > "The hypervisor provides a partition-wide virtual reference TSC page
> > > which is overlaid on the partition’s GPA space. A partition’s reference
> > > time stamp counter page is accessed through the Reference TSC MSR."
> > >
> > > so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> > > hardly see how we can use this time source at all, even without
> > > KVM. scale/offset are the same for all vCPUs.
> > >
> > > I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> > > in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> > > only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> > > reads.
> >
> > This feels like either a Hyper-V bug or a Linux-as-a-guest bug.  For "Reference
> > Counter"[1]:
> >
> >   The hypervisor maintains a per-partition reference time counter. It has the
> >   characteristic that successive accesses to it return strictly monotonically
> >   increasing (time) values as seen by any and all virtual processors of a
> >   partition. Furthermore, the reference counter is rate constant and unaffected
> >   by processor or bus speed transitions or deep processor power savings states. A
> >   partition’s reference time counter is initialized to zero when the partition is
> >   created. The reference counter for all partitions count at the same rate, but
> >   at any time, their absolute values will typically differ because partitions
> >   will have different creation times.
> >
> >   The reference counter continues to count up as long as at least one virtual
> >   processor is not explicitly suspended.
> >
> >
> > And then "Partition Reference Time Enlightenment"[2]:
> >
> >   The partition reference time enlightenment presents a reference time source to
> >   a partition which does not require an intercept into the hypervisor. This
> >   enlightenment is available only when the underlying platform provides support
> >   of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms,
> >   the processor TSC frequency remains constant irrespective of changes in the
> >   processor’s clock frequency due to the use of power management states such as
> >   ACPI processor performance states, processor idle sleep states (ACPI C-states),
> >   etc.
> >
> >   The partition reference time enlightenment uses a virtual TSC value, an offset
> >   and a multiplier to enable a guest partition to compute the normalized
> >   reference time since partition creation, in 100nS units. The mechanism also
> >   allows a guest partition to atomically compute the reference time when the
> >   guest partition is migrated to a platform with a different TSC rate, and
> >   provides a fallback mechanism to support migration to platforms without the
> >   constant rate TSC feature.
> >
> > My read of "Partition Reference Time Enlightenment" is that it should only be
> > advertised if the TSC is synchronized and constant.  I can't figure out where
> > that feature is actually advertised though, because IIUC it's not the same as
> > HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is guaranteed to be
> > invariant even across live migration.  And it's not HV_MSR_REFERENCE_TSC_AVAILABLE,
> > because I'm pretty sure that just says HV_MSR_REFERENCE_TSC is available.
> >
> > Michael, help?
> >
> > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
> > [2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment
>
> Yes, TSC page enlightenment is per VM, so it does not compensate for
> discrepancies in raw TSC values across physical CPUs. RDTSC in a
> Hyper-V VM is executed directly by the hardware (i.e., does not trap to
> the hypervisor), so there's no opportunity for the hypervisor to compensate
> for discrepancies. The hypervisor is expected to present a VM with TSCs
> that are already synchronized. I'll need to double-check, but I don't think
> Linux guests on Hyper-V run their own TSC synchronization.
>
> The relevant Hyper-V flags are:
> * HV_MSR_TIME_REF_COUNT_AVAILABLE:  The synthetic MSR for reading
>    the partition reference time is available.
> * HV_MSR_REFERENCE_TSC_AVAILABLE: The partition reference time
>    enlightenment (i.e., "the TSC page") is available as a faster way to read
>    the reference counter.
> * HV_ACCESS_TSC_INVARIANT: As Sean said, this says the hardware and
>    Hyper-V support TSC scaling, so live migration can be done across hosts
>    without the guest seeing a change in TSC frequency.
>
> Yes, this does feel like an issue where Hyper-V is not presenting the guest
> with TSCs that are already synchronized. But I'm not aware of having seen
> such a problem before. I'll try to imagine a scenario where a problem like
> this could happen via some other path.
>
> @Thomas Lefebvre:  Let me double-check a few things via these follow-up
> questions/actions:
>
> 1. You said the clocksource is hyperv_clocksource_tsc_page. Just to
> confirm, that's for the L1 guest, right? Does the output of the "lscpu"
> command in the L1 guest show the flags "tsc_reliable" and "constant_tsc"?
> I assume "no", since if these flags were set, the clocksource (i.e.,
> /sys/devices/system/clocksource/clocksource0/current_clocksource)
> should be the standard "tsc". I've got a laptop with an i7-13700H processor,
> and my L1 VMs show "tsc" as the clocksource, but I haven't been running
> KVM with L2 nested VMs.
>
> 2. What is the version of Windows/Hyper-V you are running? Get the
> output of the "winver.exe" command. It should be something like this:
>
> Windows 11 [as the top banner]
> Version 25H2 (OS Build 26200.8037)
>
> 3. In the dmesg output of your L1 VM, find the line like this one and reply
> with what you have:
>
>     Hyper-V: privilege flags low 0xae7f, high 0x3b8030, hints 0x9a4e24, misc 0xe0bed7b2
>
> From there, I can decode the Hyper-V settings and see if anything jumps out
> as anomalous.
>
> 4. Does the laptop where you are seeing this problem ever hibernate and
> then resume? If so, do you recall if the problem occurs after a full reboot but
> before it ever does a hibernate/resume cycle?
>
> Michael


* RE: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
  2026-04-07 19:13       ` Thomas Lefebvre
@ 2026-04-07 20:40         ` Michael Kelley
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Kelley @ 2026-04-07 20:40 UTC (permalink / raw)
  To: Thomas Lefebvre
  Cc: Sean Christopherson, Vitaly Kuznetsov, pbonzini@redhat.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org

From: Thomas Lefebvre <thomas.lefebvre3@gmail.com> Sent: Tuesday, April 7, 2026 12:13 PM
> 
> Hi everyone, thank you for your attention to this bug report.
> 
> Michael,
> 
> 1. No, lscpu in the L1 guest does not show the flags "tsc_reliable"
> and "constant_tsc".
> $ lscpu | grep tsc_reliable
> $ lscpu | grep constant_tsc
> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> hyperv_clocksource_tsc_page
> 
> 2. Windows 10
> Version 22H2 (OS Build 19045.6466)
> 
> 3. Hyper-V: privilege flags low 0x2e7f, high 0x3b8030, ext 0x2, hints
> 0x24e24, misc 0xbed7b2
> 
> 4. Yes, the laptop hibernates and then resumes.
> When the problem occurred, the laptop had gone through multiple
> hibernate and resume cycles.
> I haven't seen it happen after a full reboot before a hibernate/resume cycle.
> 
> Thomas
> 

How easy is it for you to reproduce the problem? Would it be feasible
to get a definitive answer on whether the problem repros after a
full reboot, but before a hibernate/resume cycle?

There's a known bug in Windows 10 Hyper-V where the hardware TSC
scaling gets messed up after a hibernate/resume cycle, causing the TSC
values read in the guest to drift from what the Hyper-V host thinks
the guest's TSC value is. A summary of the problem is here:
https://github.com/microsoft/WSL/issues/6982#issuecomment-2294892954

Of course, this doesn't sound like your symptom. And Hyper-V is not
telling your guest that it supports hardware TSC scaling, because the
HV_ACCESS_TSC_INVARIANT flag is *not* set and the clocksource
is hyperv_clocksource_tsc_page. But my understanding is that the code
changes to fix the Hyper-V problem weren't trivial, and I'm speculating
that maybe you are seeing some other symptom of whatever the
underlying Hyper-V issue was.
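
For reference, the flag check above can be sketched as follows. The bit
positions are an assumption based on the Linux Hyper-V header definitions
(bits 1, 9, and 15 of the "privilege flags low" word) and should be
verified against the header in your tree; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within "privilege flags low"; verify locally. */
#define HV_MSR_TIME_REF_COUNT_AVAILABLE  (1u << 1)
#define HV_MSR_REFERENCE_TSC_AVAILABLE   (1u << 9)
#define HV_ACCESS_TSC_INVARIANT          (1u << 15)

/* Hypothetical helper: true if the given feature bit is set. */
static bool hv_flag_set(uint32_t features_low, uint32_t flag)
{
	return (features_low & flag) != 0;
}
```

Under these assumed positions, the reported value 0x2e7f has both
reference-time bits set and HV_ACCESS_TSC_INVARIANT clear, while the
earlier example value 0xae7f has HV_ACCESS_TSC_INVARIANT set.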

Of course, this is just speculation. If the problem can occur before
any hibernate/resume cycles are done, then my speculation is
wrong. But if the problem only happens after a hibernate/resume
cycle, then this known problem, or something related to it, becomes
a pretty good candidate. Unfortunately, I'm pretty sure there's no
fix for Windows 10 Hyper-V. You would need to upgrade to 
Windows 11 22H2 or later.

Michael


end of thread, other threads:[~2026-04-07 20:40 UTC | newest]

Thread overview: 9+ messages
2026-04-05 22:10 [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency Thomas Lefebvre
2026-04-06 14:11 ` Sean Christopherson
2026-04-07  8:23   ` Vitaly Kuznetsov
2026-04-07  8:17 ` Vitaly Kuznetsov
2026-04-07 16:43   ` Sean Christopherson
2026-04-07 16:44     ` Sean Christopherson
2026-04-07 18:37     ` Michael Kelley
2026-04-07 19:13       ` Thomas Lefebvre
2026-04-07 20:40         ` Michael Kelley
