From: David Woodhouse <dwmw2@infradead.org>
To: Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Juergen Gross <jgross@suse.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
David Woodhouse <dwmw2@infradead.org>,
Paul Durrant <paul@xen.org>, Jonathan Cameron <jic23@kernel.org>,
Sascha Bischoff <Sascha.Bischoff@arm.com>,
Marc Zyngier <maz@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
Jack Allister <jalliste@amazon.com>,
Dongli Zhang <dongli.zhang@oracle.com>,
joe.jin@oracle.com, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v5 21/34] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
Date: Mon, 8 Jun 2026 15:48:02 +0100
Message-ID: <20260608145455.89187-22-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
From: David Woodhouse <dwmw@amazon.co.uk>
Previously, a guest writing different TSC values on different vCPUs could
force KVM out of master clock mode. With this change, only a frequency
mismatch disables master clock. The only ways for non-master-clock mode
to happen now are archaic hardware without a TSC-based clocksource, or a
VMM that sets different TSC frequencies across vCPUs.
Running at a different frequency would lead to a systematic skew between
the clocks as observed by different vCPUs, due to limited arithmetic
precision in the scaling. That case should indeed force the clock to be
based on the host's CLOCK_MONOTONIC_RAW instead of being in masterclock
mode, where it is defined by the guest TSC.
But when the vCPUs merely have a different TSC *offset*, that's not a
problem. The offset is applied to that vCPU's kvmclock->tsc_timestamp
field, and it all comes out in the wash.
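
To illustrate why a per-vCPU offset cancels out, here is a minimal
standalone sketch of a pvclock-style read. It is not the kernel code: the
struct and every demo_* name are hypothetical, the guest is assumed to run
at the host TSC frequency (no scaling), and the numbers are chosen so the
64-bit product cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-vCPU pvclock fields. */
struct demo_pvclock {
	uint64_t tsc_timestamp;     /* guest TSC value at the snapshot */
	uint64_t system_time;       /* clock value (ns) at the snapshot */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point ns per TSC tick */
	int8_t   tsc_shift;
};

/*
 * Pvclock-style conversion: scale the TSC delta since the snapshot and add
 * it to the snapshot time. Only safe for small deltas here; a real
 * implementation needs a wider intermediate product.
 */
static uint64_t demo_pvclock_read(const struct demo_pvclock *pv,
				  uint64_t guest_tsc)
{
	uint64_t delta = guest_tsc - pv->tsc_timestamp;

	if (pv->tsc_shift < 0)
		delta >>= -pv->tsc_shift;
	else
		delta <<= pv->tsc_shift;

	return pv->system_time + ((delta * pv->tsc_to_system_mul) >> 32);
}

/*
 * Assumption for this demo: guest TSC = host TSC + offset. The same offset
 * is folded into tsc_timestamp, so it drops out of the delta.
 */
static uint64_t demo_read_with_offset(uint64_t offset, uint64_t host_tsc_now)
{
	const uint64_t host_tsc_snap = 1000000;
	struct demo_pvclock pv = {
		.tsc_timestamp     = host_tsc_snap + offset,
		.system_time       = 5000000,
		.tsc_to_system_mul = 0x80000000u, /* 0.5 ns per tick (2 GHz) */
		.tsc_shift         = 0,
	};

	return demo_pvclock_read(&pv, host_tsc_now + offset);
}

/* Two vCPUs with different offsets read the same time. */
static void demo_offsets_agree(void)
{
	assert(demo_read_with_offset(0, 3000000) ==
	       demo_read_with_offset(123456789, 3000000));
}
```

Calling demo_offsets_agree() passes because the delta is 2000000 ticks
for both vCPUs regardless of the offset; only the multiplier and shift,
which depend on frequency, affect the result.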
Track frequency matching separately from offset matching using a
dedicated freq generation counter (cur_tsc_freq_generation) that only
bumps on actual frequency changes. Each vCPU is counted exactly once per
freq generation via a per-vCPU this_tsc_freq_generation field, preventing
repeated syncs of the same vCPU from falsely re-enabling master clock.
Note that the generation-based counting has a known limitation: if all
vCPUs are in sync and one changes away and then back again, the other
vCPUs are still at the old generation and won't be counted until they
sync again (which may never happen). This was always the case for the
offset tracking and isn't expected VMM behaviour — although it is the
scenario that the VM-wide KVM_SET_TSC_KHZ ioctl was introduced to handle
cleanly.
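
The frequency generation counting described above, including the known
limitation, can be modelled in a few lines. This is a sketch under stated
assumptions rather than the patch itself: the demo_* types and fields are
hypothetical stand-ins for the kvm_arch and kvm_vcpu_arch members, offset
tracking is left out, and the "re-enable" check is folded into the sync
function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VM-wide and per-vCPU tracking state. */
struct demo_vm {
	uint32_t cur_khz;
	uint64_t cur_freq_gen;
	int nr_matched_freq;
	bool all_matched_freq;
	int online;
};

struct demo_vcpu {
	uint32_t khz;
	uint64_t this_freq_gen;
};

static void demo_sync(struct demo_vm *vm, struct demo_vcpu *vcpu)
{
	/* A frequency change starts a new generation and resets the count. */
	if (vcpu->khz != vm->cur_khz) {
		vm->cur_freq_gen++;
		vm->all_matched_freq = false;
		vm->nr_matched_freq = 0;
	}

	/* Count each vCPU at most once per generation. */
	if (vcpu->this_freq_gen != vm->cur_freq_gen) {
		vcpu->this_freq_gen = vm->cur_freq_gen;
		vm->nr_matched_freq++;
	}

	vm->cur_khz = vcpu->khz;

	if (vm->nr_matched_freq >= vm->online)
		vm->all_matched_freq = true;
}

/* Walks through the "change away and back again" limitation. */
static void demo_limitation_selfcheck(void)
{
	struct demo_vm vm = { .online = 2, .all_matched_freq = true };
	struct demo_vcpu v0 = { .khz = 2000000 }, v1 = { .khz = 2000000 };

	demo_sync(&vm, &v0);
	demo_sync(&vm, &v0);             /* repeated sync is not recounted */
	assert(vm.nr_matched_freq == 1);
	demo_sync(&vm, &v1);
	assert(vm.all_matched_freq);

	v1.khz = 3000000;                /* v1 changes away ... */
	demo_sync(&vm, &v1);
	v1.khz = 2000000;                /* ... and back again */
	demo_sync(&vm, &v1);
	assert(!vm.all_matched_freq);    /* v0 is still at an old generation */

	demo_sync(&vm, &v0);             /* only v0 syncing again recovers */
	assert(vm.all_matched_freq);
}
```

In this model the frequencies are identical again after the second change
on v1, yet the flag stays false until v0 happens to sync, which matches
the limitation the commit message describes.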
While at it, restructure the existing TSC offset generation tracking to
use the same pattern: reset counter to zero on new generation, then
unconditionally count vCPUs that haven't been seen in this generation.
Both counters now use a consistent >= online_vcpus threshold (1-based
counting where the reference vCPU is included in the count).
Use frequency match for master clock eligibility, and full TSC match
(including offset) only for PVCLOCK_TSC_STABLE_BIT, which tells the
guest it is safe to skip cross-vCPU monotonicity enforcement.
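
The resulting split between the two conditions can be summarised as a
small decision function. Again this is only an illustrative sketch: the
demo_* names are hypothetical, and the flag constant is a local definition
for the example rather than the kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TSC_STABLE_BIT (1u << 0) /* local stand-in for the pvclock flag */

/* Hypothetical snapshot of the three inputs discussed above. */
struct demo_clock_state {
	bool host_clocksource_is_tsc;
	bool all_vcpus_matched_freq;
	bool all_vcpus_matched_tsc; /* frequency and offset both match */
};

/* Master clock needs a TSC-based host clocksource and matching frequency. */
static bool demo_use_master_clock(const struct demo_clock_state *s)
{
	return s->all_vcpus_matched_freq && s->host_clocksource_is_tsc;
}

/* The stable bit additionally requires matching offsets. */
static uint8_t demo_pvclock_flags(const struct demo_clock_state *s)
{
	if (demo_use_master_clock(s) && s->all_vcpus_matched_tsc)
		return DEMO_TSC_STABLE_BIT;
	return 0;
}

static void demo_flags_selfcheck(void)
{
	/* Same frequency, different offsets: master clock, no stable bit. */
	struct demo_clock_state s = {
		.host_clocksource_is_tsc = true,
		.all_vcpus_matched_freq  = true,
		.all_vcpus_matched_tsc   = false,
	};

	assert(demo_use_master_clock(&s));
	assert(demo_pvclock_flags(&s) == 0);
}
```

With offsets that differ, the guest still gets a master-clock-based
kvmclock but has to keep enforcing cross-vCPU monotonicity itself.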
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
arch/x86/include/asm/kvm_host.h | 4 ++
arch/x86/kvm/x86.c | 68 ++++++++++++++++++++++++++-------
2 files changed, 58 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eb81f90284ba..699a1a197194 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -970,6 +970,7 @@ struct kvm_vcpu_arch {
u64 this_tsc_nsec;
u64 this_tsc_write;
u64 this_tsc_generation;
+ u64 this_tsc_freq_generation;
bool tsc_catchup;
bool tsc_always_catchup;
s8 virtual_tsc_shift;
@@ -1493,6 +1494,9 @@ struct kvm_arch {
u64 cur_tsc_offset;
u64 cur_tsc_generation;
bool all_vcpus_matched_tsc;
+ bool all_vcpus_matched_freq;
+ int nr_vcpus_matched_freq;
+ u64 cur_tsc_freq_generation;
int nr_vcpus_matched_tsc;
u32 default_tsc_khz;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac66f8e7116f..86c30be4c5d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2647,14 +2647,37 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
/*
- * To use the masterclock, the host clocksource must be based on TSC
- * and all vCPUs must have matching TSCs. Note, the count for matching
- * vCPUs doesn't include the reference vCPU, hence "+1".
+ * Track whether all vCPUs have matching TSC offsets (for
+ * PVCLOCK_TSC_STABLE_BIT) and matching frequencies (for
+ * master clock eligibility).
+ */
+
+ /*
+ * A new vCPU might already have incremented ->online_vcpus
+ * and cause a temporary false negative here. It will then
+ * call kvm_synchronize_tsc() from kvm_arch_vcpu_postcreate()
+ * and finish the job.
*/
- ka->all_vcpus_matched_tsc = (ka->nr_vcpus_matched_tsc + 1 ==
- atomic_read(&vcpu->kvm->online_vcpus));
+ int online = atomic_read(&vcpu->kvm->online_vcpus);
- bool use_master_clock = ka->all_vcpus_matched_tsc &&
+ ka->all_vcpus_matched_tsc = (ka->nr_vcpus_matched_tsc >= online);
+ /*
+ * all_vcpus_matched_freq starts true and is cleared when
+ * __kvm_synchronize_tsc() detects a frequency mismatch.
+ * Re-enable when all vCPUs have synced with matching frequency.
+ * If all offsets also match, that implies frequencies match too.
+ */
+ if (ka->all_vcpus_matched_tsc ||
+ ka->nr_vcpus_matched_freq >= online)
+ ka->all_vcpus_matched_freq = true;
+
+ /*
+ * To use the masterclock, the host clocksource must be based on TSC
+ * and all vCPUs must have matching TSC *frequency*. Different offsets
+ * are fine — each vCPU's pvclock has its own tsc_timestamp that
+ * accounts for its offset.
+ */
+ bool use_master_clock = ka->all_vcpus_matched_freq &&
gtod_is_based_on_tsc(gtod->clock.vclock_mode);
/*
@@ -2818,7 +2841,22 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
* Track the TSC frequency, scaling ratio, and offset for the current
* generation. These are used to detect matching TSC writes and to
* compute the guest TSC from the host clock.
+ *
+ * If the frequency changed, master clock mode can no longer be used
+ * since the kvmclock scaling factors differ between vCPUs.
*/
+ if (vcpu->arch.virtual_tsc_khz != kvm->arch.cur_tsc_khz) {
+ kvm->arch.cur_tsc_freq_generation++;
+ kvm->arch.all_vcpus_matched_freq = false;
+ kvm->arch.nr_vcpus_matched_freq = 0;
+ }
+
+ /* Count each vCPU once per freq generation */
+ if (vcpu->arch.this_tsc_freq_generation != kvm->arch.cur_tsc_freq_generation) {
+ vcpu->arch.this_tsc_freq_generation = kvm->arch.cur_tsc_freq_generation;
+ kvm->arch.nr_vcpus_matched_freq++;
+ }
+
kvm->arch.cur_tsc_khz = vcpu->arch.virtual_tsc_khz;
kvm->arch.cur_tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
@@ -2835,17 +2873,18 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
* exact software computation in compute_guest_tsc()
*/
kvm->arch.cur_tsc_generation++;
+ kvm->arch.all_vcpus_matched_tsc = false;
+ kvm->arch.nr_vcpus_matched_tsc = 0;
kvm->arch.cur_tsc_nsec = ns;
kvm->arch.cur_tsc_write = tsc;
kvm->arch.cur_tsc_offset = offset;
- kvm->arch.nr_vcpus_matched_tsc = 0;
- kvm->arch.all_vcpus_matched_tsc = false;
- } else if (vcpu->arch.this_tsc_generation != kvm->arch.cur_tsc_generation) {
+ }
+
+ if (vcpu->arch.this_tsc_generation != kvm->arch.cur_tsc_generation) {
+ vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
kvm->arch.nr_vcpus_matched_tsc++;
}
- /* Keep track of which generation this VCPU has synchronized to */
- vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
@@ -3180,7 +3219,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
bool host_tsc_clocksource, vcpus_matched;
lockdep_assert_held(&kvm->arch.tsc_write_lock);
- vcpus_matched = ka->all_vcpus_matched_tsc;
+ vcpus_matched = ka->all_vcpus_matched_freq;
/*
* If the host uses TSC clock, then passthrough TSC as stable
@@ -3527,7 +3566,7 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
/* If the host uses TSC clocksource, then it is stable */
hv_clock.flags = 0;
- if (use_master_clock)
+ if (use_master_clock && ka->all_vcpus_matched_tsc)
hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
if (vcpu->pv_time.active) {
@@ -6354,7 +6393,7 @@ static int kvm_vcpu_ioctl_get_clock_guest(struct kvm_vcpu *v, void __user *argp)
hv_clock.tsc_shift = vcpu->pvclock_tsc_shift;
hv_clock.tsc_to_system_mul = vcpu->pvclock_tsc_mul;
- hv_clock.flags = PVCLOCK_TSC_STABLE_BIT;
+ hv_clock.flags = ka->all_vcpus_matched_tsc ? PVCLOCK_TSC_STABLE_BIT : 0;
if (copy_to_user(argp, &hv_clock, sizeof(hv_clock)))
return -EFAULT;
@@ -13649,6 +13688,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
mutex_init(&kvm->arch.apic_map_lock);
seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
+ kvm->arch.all_vcpus_matched_freq = true;
raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
pvclock_update_vm_gtod_copy(kvm);
--
2.54.0