Linux-HyperV List
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@kernel.org>,
	John Stultz <jstultz@google.com>,
	Michael Kelley <mhklinux@outlook.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>,
	Stephen Boyd <sboyd@kernel.org>,
	Miroslav Lichvar <mlichvar@redhat.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Daniel Lezcano <daniel.lezcano@kernel.org>,
	kvm@vger.kernel.org, linux-hyperv@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] KVM/x86: Killing kvm_get_time_and_clockread() in favour of ktime_get_snapshot()
Date: Wed, 27 May 2026 10:30:21 +0200
Message-ID: <87zf1ljluq.fsf@redhat.com>
In-Reply-To: <b4895a532344ba6a879d922be8536f9000cd398c.camel@infradead.org>

David Woodhouse <dwmw2@infradead.org> writes:

...

>
> Then in 2018, Vitaly Kuznetsov added Hyper-V TSC page support in
> commit b0c39dc68e3b ("x86/kvm: Pass stable clocksource to guests when
> running nested on Hyper-V"), which extended vgettsc() to handle the
> HVCLOCK case.
>
> I'd quite like to kill it all with fire and make KVM use
> ktime_get_snapshot() instead.

The main motivation is reducing the complexity of KVM's timekeeping code,
I guess?

>
> However, to correlate with the TSC provided to guests, KVM needs the
> underlying host TSC counter value, *not* the cycles count from the
> hyperv_clocksource_tsc_page clocksource which is scaled to 10MHz.
>
> If we wanted to support master clock mode while nesting under KVM and
> bizarrely using the kvmclock for system timing, we'd have the same
> problem with the kvmclock clocksource, which similarly scales to 1GHz.
>
> One option is to say "Don't Do That Then™": if you want to provide a
> masterclock kvmclock to guests then *don't* use the silly pvclocks for
> your own kernel's timekeeping, use the damn TSC. Because if the TSC
> *isn't* reliable then you can't do masterclock mode for your guests
> anyway.

The statement "TSC isn't reliable" deserves a book of its own :-)
Historically, we've seen all sorts of issues with it, but by the time of
b0c39dc68e3b, they were mostly gone. The real problem the Hyper-V/Azure
folks were solving back then was that while the TSC *was* reliable
(synchronized across CPUs, not jumping backwards, stable frequency,
...), tons of hardware out there (Azure is quite big) did not support
TSC scaling. VMs on Azure don't migrate very often, but they do migrate
when hardware maintenance is needed. Migrating to a host with a
different TSC frequency would've been a problem, so the Hyper-V TSC page
was introduced. Note: it is a *single* page for all CPUs, so the
clocksource was never intended to be used in a situation where TSCs are
unsynchronized across CPUs.
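
For readers less familiar with the TSC page, here is a minimal userspace
sketch of the scaling it performs, assuming the commonly documented
layout (sequence, scale, offset) and the formula
((tsc * scale) >> 64) + offset. The struct and function names are
illustrative and not the kernel's, hv_scale_for() is a hypothetical
helper, and the sequence-retry loop a real reader needs is omitted.
It relies on the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

/* Illustrative layout of the shared page; one instance serves all CPUs. */
struct hv_tsc_page_sketch {
	uint32_t sequence;	/* changes when the host updates scale/offset */
	uint32_t reserved;
	uint64_t scale;		/* 0.64 fixed-point multiplier */
	int64_t  offset;	/* added after scaling, in 100ns units */
};

/* Raw TSC -> 10MHz reference time: ((tsc * scale) >> 64) + offset. */
static uint64_t hv_ref_time(const struct hv_tsc_page_sketch *p, uint64_t tsc)
{
	unsigned __int128 prod = (unsigned __int128)tsc * p->scale;

	return (uint64_t)(prod >> 64) + (uint64_t)p->offset;
}

/*
 * Hypothetical helper: the scale a host would publish for a TSC running
 * at tsc_hz so that the result ticks at 10MHz. Assumes tsc_hz > 10MHz.
 */
static uint64_t hv_scale_for(uint64_t tsc_hz)
{
	return (uint64_t)((((unsigned __int128)10000000) << 64) / tsc_hz);
}
```

After a migration to a host with a different TSC frequency, only scale
and offset on the page change; the guest keeps seeing a 10MHz counter,
which is why no hardware TSC scaling is needed for this clocksource.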

To deal with migrations, the Hyper-V folks came up with a mechanism
called 'reenlightenment notifications', and we support it in KVM. It's
not really great, as we need to stop all the nested VMs, but it does the
job: we can re-compute guest PV clocksources (kvmclock, TSC page,
... Xen?) and live happily ever after.

>
> Perhaps that should have been the response when commit b0c39dc68e3b was
> submitted, but I guess we're stuck supporting that mode now.

Times are changing, and it is becoming increasingly difficult to find
x86 hardware without TSC scaling support. Linux guests on Hyper-V now
prefer TSC if possible (HV_ACCESS_TSC_INVARIANT; see, e.g., commit
4c78738ead4e), so I expect that in a few years, there will be no need
for the Hyper-V TSC page clocksource or the reenlightenment logic
anyway.

> But I really do want to kill the KVM hacks and use ktime_get_snapshot().
>
> Reverse-engineering the original TSC reading from the clocksource
> counter value doesn't look sane, without a loss of precision and/or
> 128-bit division.
>
> One simple option that occurs to me would be to add a 'cycles_raw'
> value to the system_time_snapshot, for PV clocksources like hyperv and
> kvmclock to populate with the original TSC reading.

Personally, I don't see this as such an ugly hack.
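
To make the precision argument concrete, here is a rough userspace
sketch, assuming the same ((tsc * scale) >> 64) + offset scaling as the
TSC page. The inverse needs a 128-bit division and can only recover the
TSC to within roughly one 100ns tick worth of cycles, whereas carrying
the raw value alongside is trivial. All names here are hypothetical;
snapshot_sketch is not the kernel's struct system_time_snapshot, and
the code uses the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

/* Forward direction: raw TSC -> scaled clocksource count. */
static uint64_t tsc_to_ref(uint64_t tsc, uint64_t scale, int64_t offset)
{
	return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) +
	       (uint64_t)offset;
}

/*
 * Reverse direction: a 128-bit division, and the low bits dropped by the
 * forward shift are gone, so the result is only approximately the TSC.
 */
static uint64_t ref_to_tsc(uint64_t ref, uint64_t scale, int64_t offset)
{
	unsigned __int128 num =
		(unsigned __int128)(ref - (uint64_t)offset) << 64;

	return (uint64_t)(num / scale);
}

/* Hypothetical snapshot carrying both values. */
struct snapshot_sketch {
	uint64_t cycles;	/* what the PV clocksource reports */
	uint64_t cycles_raw;	/* the untouched TSC reading */
};

static struct snapshot_sketch take_snapshot(uint64_t tsc, uint64_t scale,
					    int64_t offset)
{
	struct snapshot_sketch s = {
		.cycles = tsc_to_ref(tsc, scale, offset),
		.cycles_raw = tsc,
	};

	return s;
}
```

With a 2.5GHz TSC, one 10MHz tick covers 250 cycles, so ref_to_tsc() can
be off by up to about that many cycles, while cycles_raw is exact.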

>
> That might actually let us clean up some of the PTP code that currently
> has to deal with TSC vs. kvmclock in counter snapshots too. I think I
> could kill the use of get_cycles() in vmclock for the kvmclock case,
> which might make Thomas happy...
>
> Any better ideas?

-- 
Vitaly

