public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Thomas Lefebvre <thomas.lefebvre3@gmail.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	 linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	 Michael Kelley <mikelley@microsoft.com>
Subject: Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
Date: Tue, 7 Apr 2026 09:43:24 -0700
Message-ID: <adU0LAW1h8q9HsGu@google.com>
In-Reply-To: <87v7e3mbgj.fsf@redhat.com>

+Michael

On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:
> > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > The hypervisor corrects them only through the TSC page scale/offset.
> > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > subtraction wraps.
> >
> 
> According to the TLFS, reference TSC page is partition wide:
> 
> "The hypervisor provides a partition-wide virtual reference TSC page
> which is overlaid on the partition’s GPA space. A partition’s reference
> time stamp counter page is accessed through the Reference TSC MSR."
> 
> so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> hardly see how we can use this time source at all, even without
> KVM. scale/offset are the same for all vCPUs.
> 
> I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> reads.
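
To make the wrap in the original report concrete, here is a minimal sketch
of an unsigned TSC delta taken across two CPUs.  It is not the KVM code; the
helper names and the clamping guard are hypothetical and only illustrate the
arithmetic.

```c
#include <stdint.h>

/*
 * Plain unsigned delta.  If 'now' was read on a CPU whose raw TSC is
 * behind the CPU that sampled 'base', the result wraps modulo 2^64
 * and looks like an enormous elapsed time.
 */
static uint64_t tsc_delta_raw(uint64_t now, uint64_t base)
{
	return now - base;
}

/*
 * Hypothetical guard (an assumption, not an existing kernel helper):
 * treat a backwards TSC reading as zero elapsed cycles.
 */
static uint64_t tsc_delta_clamped(uint64_t now, uint64_t base)
{
	return now >= base ? now - base : 0;
}
```

With base = 1000000 sampled on one CPU and now = 999000 read on another,
tsc_delta_raw() returns 2^64 - 1000 cycles, while tsc_delta_clamped()
returns 0.  Clamping only hides the symptom; it does not make the raw TSCs
consistent.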

This feels like either a Hyper-V bug or a Linux-as-a-guest bug.  The TLFS says
this under "Reference Counter"[1]:

  The hypervisor maintains a per-partition reference time counter. It has the
  characteristic that successive accesses to it return strictly monotonically
  increasing (time) values as seen by any and all virtual processors of a
  partition. Furthermore, the reference counter is rate constant and unaffected
  by processor or bus speed transitions or deep processor power savings states. A
  partition’s reference time counter is initialized to zero when the partition is
  created. The reference counter for all partitions count at the same rate, but
  at any time, their absolute values will typically differ because partitions
  will have different creation times.
  
  The reference counter continues to count up as long as at least one virtual
  processor is not explicitly suspended.


And then "Partition Reference Time Enlightenment"[2]:

  The partition reference time enlightenment presents a reference time source to
  a partition which does not require an intercept into the hypervisor. This
  enlightenment is available only when the underlying platform provides support
  of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms,
  the processor TSC frequency remains constant irrespective of changes in the
  processor’s clock frequency due to the use of power management states such as
  ACPI processor performance states, processor idle sleep states (ACPI C-states),
  etc.

  The partition reference time enlightenment uses a virtual TSC value, an offset
  and a multiplier to enable a guest partition to compute the normalized
  reference time since partition creation, in 100nS units. The mechanism also
  allows a guest partition to atomically compute the reference time when the
  guest partition is migrated to a platform with a different TSC rate, and
  provides a fallback mechanism to support migration to platforms without the
  constant rate TSC feature.
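
As a rough sketch of the computation that paragraph describes, and assuming
I'm remembering the TLFS formula correctly (reference time = ((TSC * scale)
>> 64) + offset), the guest-side math looks something like the code below.
The struct and function names are made up for illustration, the sequence
check that the real TSC page protocol requires is omitted, and
unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the partition-wide TSC page fields. */
struct ref_tsc_params {
	uint64_t scale;   /* 64-bit fixed-point multiplier */
	int64_t  offset;  /* added after scaling, in 100ns units */
};

/*
 * Reference time in 100ns units from a raw TSC reading.  scale and
 * offset are the same for every vCPU, so two vCPUs only agree on the
 * result if their raw TSC readings agree.
 */
static uint64_t ref_time_100ns(uint64_t tsc, const struct ref_tsc_params *p)
{
	/* 64x64 -> 128-bit multiply, keep the upper 64 bits. */
	uint64_t scaled = (uint64_t)(((unsigned __int128)tsc * p->scale) >> 64);

	return scaled + (uint64_t)p->offset;
}
```

With scale = 1 << 63 (a multiplier of 0.5) and offset = 10, a raw TSC of
2000 gives 1010 and a raw TSC of 1000 gives 510, i.e. any skew in the raw
TSC between vCPUs passes straight through to the computed reference time.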

My read of "Partition Reference Time Enlightenment" is that it should only be
advertised if the TSC is synchronized and constant.  I can't figure out where
that feature is actually advertised though, because IIUC it's not the same as
HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is guaranteed to be
invariant even across live migration.  And it's not HV_MSR_REFERENCE_TSC_AVAILABLE,
because I'm pretty sure that just says HV_MSR_REFERENCE_TSC is available.

Michael, help?

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
[2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment
