From: Sean Christopherson <seanjc@google.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Thomas Lefebvre <thomas.lefebvre3@gmail.com>,
pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
Michael Kelley <mhklinux@outlook.com>
Subject: Re: [BUG] KVM: x86: kvmclock jumps ~253 years on Hyper-V nested virt due to cross-CPU raw TSC inconsistency
Date: Tue, 7 Apr 2026 09:44:51 -0700
Message-ID: <adU0g-trGz8CRjeM@google.com>
In-Reply-To: <adU0LAW1h8q9HsGu@google.com>

On Tue, Apr 07, 2026, Sean Christopherson wrote:
> +Michael

Let's try that again. Email address #1 bounced.

> On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> > Thomas Lefebvre <thomas.lefebvre3@gmail.com> writes:
> > > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > > The hypervisor corrects them only through the TSC page scale/offset.
> > > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > > subtraction wraps.
> > >
> >
> > According to the TLFS, reference TSC page is partition wide:
> >
> > "The hypervisor provides a partition-wide virtual reference TSC page
> > which is overlaid on the partition’s GPA space. A partition’s reference
> > time stamp counter page is accessed through the Reference TSC MSR."
> >
> > so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> > hardly see how we can use this time source at all, even without
> > KVM. scale/offset are the same for all vCPUs.
> >
> > I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> > in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> > only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> > reads.
>
> This feels like either a Hyper-V bug or a Linux-as-a-guest bug. For "Reference
> Counter"[1]:
>
> The hypervisor maintains a per-partition reference time counter. It has the
> characteristic that successive accesses to it return strictly monotonically
> increasing (time) values as seen by any and all virtual processors of a
> partition. Furthermore, the reference counter is rate constant and unaffected
> by processor or bus speed transitions or deep processor power savings states. A
> partition’s reference time counter is initialized to zero when the partition is
> created. The reference counter for all partitions count at the same rate, but
> at any time, their absolute values will typically differ because partitions
> will have different creation times.
>
> The reference counter continues to count up as long as at least one virtual
> processor is not explicitly suspended.
>
>
> And then "Partition Reference Time Enlightenment"[2]:
>
> The partition reference time enlightenment presents a reference time source to
> a partition which does not require an intercept into the hypervisor. This
> enlightenment is available only when the underlying platform provides support
> of an invariant processor Time Stamp Counter (TSC), or iTSC. In such platforms,
> the processor TSC frequency remains constant irrespective of changes in the
> processor’s clock frequency due to the use of power management states such as
> ACPI processor performance states, processor idle sleep states (ACPI C-states),
> etc.
>
> The partition reference time enlightenment uses a virtual TSC value, an offset
> and a multiplier to enable a guest partition to compute the normalized
> reference time since partition creation, in 100nS units. The mechanism also
> allows a guest partition to atomically compute the reference time when the
> guest partition is migrated to a platform with a different TSC rate, and
> provides a fallback mechanism to support migration to platforms without the
> constant rate TSC feature.
>
> My read of "Partition Reference Time Enlightenment" is that it should only be
> advertised if the TSC is synchronized and constant. I can't figure out where
> that feature is actually advertised though, because IIUC it's not the same as
> HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is guaranteed to be
> invariant even across live migration. And it's not HV_MSR_REFERENCE_TSC_AVAILABLE,
> because I'm pretty sure that just says HV_MSR_REFERENCE_TSC is available.
>
> Michael, help?
>
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
> [2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment