kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] KVM: x86: pvclock fixes and cleanups
@ 2025-01-18  0:55 Sean Christopherson
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Fix a lockdep splat in KVM's suspend notifier by simply removing a
spurious kvm->lock acquisition related to kvmclock, and then try to
wrangle KVM's pvclock handling into something approaching sanity (I
made the mistake of looking at how KVM handled PVCLOCK_GUEST_STOPPED).

David,

I didn't look too closely to see how this interacts with your overhaul of
the pvclock madness[*].  When I first started poking at this, I didn't
realize vcpu->arch.hv_lock had tendrils in so many places.  Please holler
if you want me to drop the vcpu->arch.hv_lock changes and/or tweak
something to make it play nice with your series.

The Xen changes are *very* lightly tested, so I definitely won't apply
the potentially problematic changes, i.e. anything past "Don't bleed
PVCLOCK_GUEST_STOPPED across PV clocks", until I get a thumbs up from
you and/or Paul.

[*] https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org

Sean Christopherson (10):
  KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend
    notifier
  KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
  KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
  KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV
    clock
  KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  KVM: x86/xen: Use guest's copy of pvclock when starting timer
  KVM: x86: Pass reference pvclock as a param to
    kvm_setup_guest_pvclock()
  KVM: x86: Remove per-vCPU "cache" of its reference pvclock
  KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock
    update)
  KVM: x86: Override TSC_STABLE flag for Xen PV clocks in
    kvm_guest_time_update()

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/x86.c              | 115 ++++++++++++++++----------------
 arch/x86/kvm/xen.c              |  62 +++++++++++++++--
 3 files changed, 114 insertions(+), 66 deletions(-)


base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:01   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE,
don't take kvm->lock as doing so can trigger a largely theoretical
deadlock, it is perfectly safe to iterate over the xarray of vCPUs without
holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in
any way (pv_time.active and pvclock_set_guest_stopped_request are
protected by vcpu->mutex, not kvm->lock).

Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com
Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d9a16fd4d3..26e18c9b0375 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6907,7 +6907,6 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 	unsigned long i;
 	int ret = 0;
 
-	mutex_lock(&kvm->lock);
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (!vcpu->arch.pv_time.active)
 			continue;
@@ -6919,7 +6918,6 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 			break;
 		}
 	}
-	mutex_unlock(&kvm->lock);
 
 	return ret ? NOTIFY_BAD : NOTIFY_DONE;
 }
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:03   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a
SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU
isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock
is active.  I.e. barring some bizarre edge case that shouldn't be treated
as an error in the first place, kvm_arch_suspend_notifier() can't fail.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26e18c9b0375..ef21158ec6b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6905,21 +6905,15 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 {
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
-	int ret = 0;
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (!vcpu->arch.pv_time.active)
-			continue;
+	/*
+	 * Ignore the return, marking the guest paused only "fails" if the vCPU
+	 * isn't using kvmclock; continuing on is correct and desirable.
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		(void)kvm_set_guest_paused(vcpu);
 
-		ret = kvm_set_guest_paused(vcpu);
-		if (ret) {
-			kvm_err("Failed to pause guest VCPU%d: %d\n",
-				vcpu->vcpu_id, ret);
-			break;
-		}
-	}
-
-	return ret ? NOTIFY_BAD : NOTIFY_DONE;
+	return NOTIFY_DONE;
 }
 
 int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
  2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:05   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Drop the local pvclock_flags in kvm_guest_time_update(), the local variable
is immediately shoved into the per-vCPU "cache", i.e. the local variable
serves no purpose.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef21158ec6b2..d8ee37dd2b57 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3178,7 +3178,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	struct kvm_arch *ka = &v->kvm->arch;
 	s64 kernel_ns;
 	u64 tsc_timestamp, host_tsc;
-	u8 pvclock_flags;
 	bool use_master_clock;
 #ifdef CONFIG_KVM_XEN
 	/*
@@ -3261,11 +3260,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	vcpu->last_guest_tsc = tsc_timestamp;
 
 	/* If the host uses TSC clocksource, then it is stable */
-	pvclock_flags = 0;
+	vcpu->hv_clock.flags = 0;
 	if (use_master_clock)
-		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
-
-	vcpu->hv_clock.flags = pvclock_flags;
+		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
 
 	if (vcpu->pv_time.active)
 		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:42   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Handle "guest stopped" propagation only for kvmclock, as the flag is set
if and only if kvmclock is "active", i.e. can only be set for Xen PV clock
if kvmclock *and* Xen PV clock are in-use by the guest, which creates very
bizarre behavior for the guest.

Simply restrict the flag to kvmclock, e.g. instead of trying to handle
Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally
added during a refactoring, and while Xen proper defines
XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually
support the flag.

Check and clear pvclock_set_guest_stopped_request if and only if kvmclock
is active to preserve the original behavior, i.e. keep the flag pending
if kvmclock happens to be disabled when KVM processes the initial request.

Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d8ee37dd2b57..3c4d210e8a9e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
 	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
 
-	if (vcpu->pvclock_set_guest_stopped_request) {
-		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
-		vcpu->pvclock_set_guest_stopped_request = false;
-	}
-
 	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
 
 	if (force_tsc_unstable)
@@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	if (use_master_clock)
 		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
 
-	if (vcpu->pv_time.active)
+	if (vcpu->pv_time.active) {
+		/*
+		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
+		 * historic behavior is to only process the request if kvmclock
+		 * is active/enabled.
+		 */
+		if (vcpu->pvclock_set_guest_stopped_request) {
+			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
+			vcpu->pvclock_set_guest_stopped_request = false;
+		}
 		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
+
+		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
+	}
+
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:54   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When updating a specific PV clock, make a full copy of KVM's reference
copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
clock.

Using a local copy of the pvclock structure also sets the stage for
eliminating the per-vCPU copy/cache (only the TSC frequency information
actually "needs" to be cached/persisted).

Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3c4d210e8a9e..5f3ad13a8ac7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 {
 	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct pvclock_vcpu_time_info *guest_hv_clock;
+	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
 
+	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
+
 	read_lock_irqsave(&gpc->lock, flags);
 	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
 		read_unlock_irqrestore(&gpc->lock, flags);
@@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	 * it is consistent.
 	 */
 
-	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
+	guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
 	smp_wmb();
 
 	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
-	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
+	hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
 
-	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
+	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
 
 	if (force_tsc_unstable)
 		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
 
 	smp_wmb();
 
-	guest_hv_clock->version = ++vcpu->hv_clock.version;
+	guest_hv_clock->version = ++hv_clock.version;
 
 	kvm_gpc_mark_dirty_in_slot(gpc);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
-	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
+	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
 }
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:58   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
reference copy may not be up-to-date, i.e. may yield a false positive of
sorts.  In the unlikely scenario that the guest is starting a Xen timer
and has used a Xen pvclock in the past, but has since but turned it "off",
then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
if and only if at least pvclock is enabled.

Furthermore, vcpu->arch.hv_clock is currently used by three different
pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
guest would ever enable multiple pvclocks, effectively sharing KVM's
reference clock could yield very weird behavior.  Using the guest's active
Xen pvclock instead of KVM's reference will allow dropping KVM's
reference copy.

Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index a909b817b9c0..b82c28223585 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
+				 struct pvclock_vcpu_time_info *hv_clock,
+				 struct gfn_to_pfn_cache *gpc,
+				 unsigned int offset)
+{
+	struct pvclock_vcpu_time_info *guest_hv_clock;
+	unsigned long flags;
+	int r;
+
+	read_lock_irqsave(&gpc->lock, flags);
+	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
+		read_unlock_irqrestore(&gpc->lock, flags);
+
+		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
+		if (r)
+			return r;
+
+		read_lock_irqsave(&gpc->lock, flags);
+	}
+
+	memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
+	read_unlock_irqrestore(&gpc->lock, flags);
+
+	/*
+	 * Sanity check TSC shift+multiplier to verify the guest's view of time
+	 * is more or less consistent.
+	 */
+	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
+	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
+		return -EINVAL;
+	return 0;
+}
+
 static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 				bool linux_wa)
 {
+	struct kvm_vcpu_xen *xen;
 	int64_t kernel_now, delta;
 	uint64_t guest_now;
+	int r = -EOPNOTSUPP;
 
 	/*
 	 * The guest provides the requested timeout in absolute nanoseconds
@@ -173,10 +208,22 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 	 * the absolute CLOCK_MONOTONIC time at which the timer should
 	 * fire.
 	 */
-	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
-	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+	do {
+		struct pvclock_vcpu_time_info hv_clock;
 		uint64_t host_tsc, guest_tsc;
 
+		if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
+		    !vcpu->kvm->arch.use_master_clock)
+			break;
+
+		if (xen->vcpu_info_cache.active)
+			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
+						offsetof(struct compat_vcpu_info, time));
+		else if (xen->vcpu_time_info_cache.active)
+			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
+		if (r)
+			break;
+
 		if (!IS_ENABLED(CONFIG_64BIT) ||
 		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
 			/*
@@ -197,9 +244,10 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 
 		/* Calculate the guest kvmclock as the guest would do it. */
 		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
-		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
-						  guest_tsc);
-	} else {
+		guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
+	} while (0);
+
+	if (r) {
 		/*
 		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
 		 *
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 17:00   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Pass the reference pvclock structure that's used to setup each individual
pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step
toward removing kvm_vcpu_arch.hv_clock.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f3ad13a8ac7..06d27b3cc207 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3116,17 +3116,17 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 	return data.clock;
 }
 
-static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
+static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
+				    struct kvm_vcpu *vcpu,
 				    struct gfn_to_pfn_cache *gpc,
 				    unsigned int offset,
 				    bool force_tsc_unstable)
 {
-	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct pvclock_vcpu_time_info *guest_hv_clock;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
 
-	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
+	memcpy(&hv_clock, ref_hv_clock, sizeof(hv_clock));
 
 	read_lock_irqsave(&gpc->lock, flags);
 	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
@@ -3165,7 +3165,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	kvm_gpc_mark_dirty_in_slot(gpc);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
-	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
+	trace_kvm_pvclock_update(vcpu->vcpu_id, &hv_clock);
 }
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
@@ -3272,18 +3272,18 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
 			vcpu->pvclock_set_guest_stopped_request = false;
 		}
-		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
+		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->pv_time, 0, false);
 
 		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
 
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
-		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
+		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_info_cache,
 					offsetof(struct compat_vcpu_info, time),
 					xen_pvclock_tsc_unstable);
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0,
+		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
 					xen_pvclock_tsc_unstable);
 #endif
 	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 17:03   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
  2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Remove the per-vCPU "cache" of the reference pvclock and instead cache
only the TSC shift+multiplier.  All other fields in pvclock are fully
recomputed by kvm_guest_time_update(), i.e. aren't actually persisted.

In addition to shaving a few bytes, explicitly tracking the TSC shift/mul
fields makes it easier to see that those fields are tied to hw_tsc_khz
(they exist to avoid having to do expensive math in the common case).
And conversely, not tracking the other fields makes it easier to see that
things like the version number are pulled from the guest's copy, not from
KVM's reference.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/x86.c              | 27 +++++++++++++++------------
 arch/x86/kvm/xen.c              |  8 ++++----
 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5193c3dfbce1..f26105654ec4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -900,7 +900,8 @@ struct kvm_vcpu_arch {
 	int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
 
 	gpa_t time;
-	struct pvclock_vcpu_time_info hv_clock;
+	u8  pvclock_tsc_shift;
+	u32 pvclock_tsc_mul;
 	unsigned int hw_tsc_khz;
 	struct gfn_to_pfn_cache pv_time;
 	/* set guest stopped flag in pvclock flags field */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 06d27b3cc207..9eabd70891dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3170,6 +3170,7 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
 {
+	struct pvclock_vcpu_time_info hv_clock = {};
 	unsigned long flags, tgt_tsc_khz;
 	unsigned seq;
 	struct kvm_vcpu_arch *vcpu = &v->arch;
@@ -3247,20 +3248,22 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 
 	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
 		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
-				   &vcpu->hv_clock.tsc_shift,
-				   &vcpu->hv_clock.tsc_to_system_mul);
+				   &vcpu->pvclock_tsc_shift,
+				   &vcpu->pvclock_tsc_mul);
 		vcpu->hw_tsc_khz = tgt_tsc_khz;
 		kvm_xen_update_tsc_info(v);
 	}
 
-	vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
-	vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
+	hv_clock.tsc_shift = vcpu->pvclock_tsc_shift;
+	hv_clock.tsc_to_system_mul = vcpu->pvclock_tsc_mul;
+	hv_clock.tsc_timestamp = tsc_timestamp;
+	hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
 	vcpu->last_guest_tsc = tsc_timestamp;
 
 	/* If the host uses TSC clocksource, then it is stable */
-	vcpu->hv_clock.flags = 0;
+	hv_clock.flags = 0;
 	if (use_master_clock)
-		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
+		hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
 
 	if (vcpu->pv_time.active) {
 		/*
@@ -3269,24 +3272,24 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		 * is active/enabled.
 		 */
 		if (vcpu->pvclock_set_guest_stopped_request) {
-			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
+			hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
 			vcpu->pvclock_set_guest_stopped_request = false;
 		}
-		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->pv_time, 0, false);
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0, false);
 
-		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
+		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
 
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
-		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_info_cache,
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
 					offsetof(struct compat_vcpu_info, time),
 					xen_pvclock_tsc_unstable);
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
 					xen_pvclock_tsc_unstable);
 #endif
-	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
 	return 0;
 }
 
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b82c28223585..7c6e4172527a 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -177,8 +177,8 @@ static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
 	 * Sanity check TSC shift+multiplier to verify the guest's view of time
 	 * is more or less consistent.
 	 */
-	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
-	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
+	if (hv_clock->tsc_shift != vcpu->arch.pvclock_tsc_shift ||
+	    hv_clock->tsc_to_system_mul != vcpu->arch.pvclock_tsc_mul)
 		return -EINVAL;
 	return 0;
 }
@@ -2309,8 +2309,8 @@ void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu)
 
 	entry = kvm_find_cpuid_entry_index(vcpu, function, 1);
 	if (entry) {
-		entry->ecx = vcpu->arch.hv_clock.tsc_to_system_mul;
-		entry->edx = vcpu->arch.hv_clock.tsc_shift;
+		entry->ecx = vcpu->arch.pvclock_tsc_mul;
+		entry->edx = vcpu->arch.pvclock_tsc_shift;
 	}
 
 	entry = kvm_find_cpuid_entry_index(vcpu, function, 2);
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-20 14:49   ` Vitaly Kuznetsov
  2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When updating paravirtual clocks, setup the Hyper-V TSC page before
Xen PV clocks.  This will allow dropping xen_pvclock_tsc_unstable in favor
of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9eabd70891dd..c68e7f7ba69d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
 
+	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
+
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
@@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
 					xen_pvclock_tsc_unstable);
 #endif
-	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
 	return 0;
 }
 
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update()
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 17:05   ` Paul Durrant
  9 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When updating PV clocks, handle the Xen-specific UNSTABLE_TSC override in
the main kvm_guest_time_update() by simply clearing PVCLOCK_TSC_STABLE_BIT
in the flags of the reference pvclock structure.  Expand the comment to
(hopefully) make it obvious that Xen clocks need to be processed after all
clocks that care about the TSC_STABLE flag.

No functional change intended.

Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c68e7f7ba69d..065b349a0218 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3119,8 +3119,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 				    struct kvm_vcpu *vcpu,
 				    struct gfn_to_pfn_cache *gpc,
-				    unsigned int offset,
-				    bool force_tsc_unstable)
+				    unsigned int offset)
 {
 	struct pvclock_vcpu_time_info *guest_hv_clock;
 	struct pvclock_vcpu_time_info hv_clock;
@@ -3155,9 +3154,6 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 
 	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
 
-	if (force_tsc_unstable)
-		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
-
 	smp_wmb();
 
 	guest_hv_clock->version = ++hv_clock.version;
@@ -3178,16 +3174,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	s64 kernel_ns;
 	u64 tsc_timestamp, host_tsc;
 	bool use_master_clock;
-#ifdef CONFIG_KVM_XEN
-	/*
-	 * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
-	 * explicitly told to use TSC as its clocksource Xen will not set this bit.
-	 * This default behaviour led to bugs in some guest kernels which cause
-	 * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
-	 */
-	bool xen_pvclock_tsc_unstable =
-		ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
-#endif
 
 	kernel_ns = 0;
 	host_tsc = 0;
@@ -3275,7 +3261,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 			hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
 			vcpu->pvclock_set_guest_stopped_request = false;
 		}
-		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0, false);
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0);
 
 		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
@@ -3283,13 +3269,22 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
 
 #ifdef CONFIG_KVM_XEN
+	/*
+	 * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
+	 * explicitly told to use TSC as its clocksource Xen will not set this bit.
+	 * This default behaviour led to bugs in some guest kernels which cause
+	 * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
+	 *
+	 * Note!  Clear TSC_STABLE only for Xen clocks, i.e. the order matters!
+	 */
+	if (ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE)
+		hv_clock.flags &= ~PVCLOCK_TSC_STABLE_BIT;
+
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
-					offsetof(struct compat_vcpu_info, time),
-					xen_pvclock_tsc_unstable);
+					offsetof(struct compat_vcpu_info, time));
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
-					xen_pvclock_tsc_unstable);
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0);
 #endif
 	return 0;
 }
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
@ 2025-01-20 14:49   ` Vitaly Kuznetsov
  2025-01-21 15:44     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2025-01-20 14:49 UTC (permalink / raw)
  To: Sean Christopherson, Sean Christopherson, Paolo Bonzini,
	David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse

Sean Christopherson <seanjc@google.com> writes:

> When updating paravirtual clocks, setup the Hyper-V TSC page before
> Xen PV clocks.  This will allow dropping xen_pvclock_tsc_unstable in favor
> of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9eabd70891dd..c68e7f7ba69d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
>  	}
>  
> +	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> +
>  #ifdef CONFIG_KVM_XEN
>  	if (vcpu->xen.vcpu_info_cache.active)
>  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
> @@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
>  					xen_pvclock_tsc_unstable);
>  #endif
> -	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
>  	return 0;
>  }

"No functional change detected".

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

(What I'm wondering is if (from mostly theoretical PoV) it's OK to pass
*some* of the PV clocks as stable and some as unstable to the same
guest, i.e. if it would make sense to disable Hyper-V TSC page when
KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE too. I don't know if anyone
combines Xen and Hyper-V emulation capabilities for the same guest on
KVM though.)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-20 14:49   ` Vitaly Kuznetsov
@ 2025-01-21 15:44     ` Sean Christopherson
  2025-01-21 15:59       ` Paul Durrant
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-21 15:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, David Woodhouse, Paul Durrant, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse

On Mon, Jan 20, 2025, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > When updating paravirtual clocks, setup the Hyper-V TSC page before
> > Xen PV clocks.  This will allow dropping xen_pvclock_tsc_unstable in favor
> > of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
> >
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/x86.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9eabd70891dd..c68e7f7ba69d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >  		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> >  	}
> >  
> > +	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> > +
> >  #ifdef CONFIG_KVM_XEN
> >  	if (vcpu->xen.vcpu_info_cache.active)
> >  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
> > @@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
> >  					xen_pvclock_tsc_unstable);
> >  #endif
> > -	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> >  	return 0;
> >  }
> 
> "No functional change detected".
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> (What I'm wondering is if (from mostly theoretical PoV) it's OK to pass
> *some* of the PV clocks as stable and some as unstable to the same
> guest, i.e. if it would make sense to disable Hyper-V TSC page when
> KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE too.

I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
PV clock is truly unstable, it's that some guests get tripped up by the STABLE
flag.  A guest that can't handle the STABLE flag has bigger problems than the
existence of a completely unrelated clock that is implied to be stable.

> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
> the same guest on KVM though.)

That someone would have to be quite "brave" :-D

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-21 15:44     ` Sean Christopherson
@ 2025-01-21 15:59       ` Paul Durrant
  2025-01-21 17:16         ` David Woodhouse
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 15:59 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse

On 21/01/2025 15:44, Sean Christopherson wrote:
[snip]
> 
> I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
> PV clock is truly unstable, it's that some guests get tripped up by the STABLE
> flag.  A guest that can't handle the STABLE flag has bigger problems than the
> existence of a completely unrelated clock that is implied to be stable.
> 

Agreed.

>> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
>> the same guest on KVM though.)
> 
> That someone would have to be quite "brave" :-D

Maybe :-)

Reviewed-by: Paul Durrant <paul@xen.org>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
@ 2025-01-21 16:01   ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 16:01 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE,
> don't take kvm->lock as doing so can trigger a largely theoretical
> deadlock, it is perfectly safe to iterate over the xarray of vCPUs without
> holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in
> any way (pv_time.active and pvclock_set_guest_stopped_request are
> protected by vcpu->mutex, not kvm->lock).
> 
> Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com
> Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
  2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
@ 2025-01-21 16:03   ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 16:03 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a
> SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU
> isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock
> is active.  I.e. barring some bizarre edge case that shouldn't be treated
> as an error in the first place, kvm_arch_suspend_notifier() can't fail.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 20 +++++++-------------
>   1 file changed, 7 insertions(+), 13 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
  2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
@ 2025-01-21 16:05   ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 16:05 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Drop the local pvclock_flags in kvm_guest_time_update(), the local variable
> is immediately shoved into the per-vCPU "cache", i.e. the local variable
> serves no purpose.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
@ 2025-01-21 16:42   ` Paul Durrant
  2025-01-21 17:09     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 16:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Handle "guest stopped" propagation only for kvmclock, as the flag is set
> if and only if kvmclock is "active", i.e. can only be set for Xen PV clock
> if kvmclock *and* Xen PV clock are in-use by the guest, which creates very
> bizarre behavior for the guest.
> 
> Simply restrict the flag to kvmclock, e.g. instead of trying to handle
> Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally
> added during a refactoring, and while Xen proper defines
> XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually
> support the flag.

Indeed. I can find no consumers of the flag.

> 
> Check and clear pvclock_set_guest_stopped_request if and only if kvmclock
> is active to preserve the original behavior, i.e. keep the flag pending
> if kvmclock happens to be disabled when KVM processes the initial request.
> 
> Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 20 ++++++++++++++------
>   1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d8ee37dd2b57..3c4d210e8a9e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
>   	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>   
> -	if (vcpu->pvclock_set_guest_stopped_request) {
> -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> -		vcpu->pvclock_set_guest_stopped_request = false;
> -	}
> -
>   	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
>   
>   	if (force_tsc_unstable)
> @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>   	if (use_master_clock)
>   		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
>   
> -	if (vcpu->pv_time.active)
> +	if (vcpu->pv_time.active) {
> +		/*
> +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
> +		 * historic behavior is to only process the request if kvmclock
> +		 * is active/enabled.
> +		 */
> +		if (vcpu->pvclock_set_guest_stopped_request) {
> +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> +			vcpu->pvclock_set_guest_stopped_request = false;
> +		}
>   		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> +
> +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;

Is this intentional? The line above your change in 
kvm_setup_guest_pvclock() clearly keeps the flag enabled if it already 
set and, without this patch, I don't see anything clearing it.

> +	}
> +
>   #ifdef CONFIG_KVM_XEN
>   	if (vcpu->xen.vcpu_info_cache.active)
>   		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
@ 2025-01-21 16:54   ` Paul Durrant
  2025-01-21 17:11     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 16:54 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> When updating a specific PV clock, make a full copy of KVM's reference
> copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
> E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
> PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
> clock.

... but the line I queried in the previous patch squashes the flag 
before the Xen PV clock is set up, so no bleed?

> 
> Using a local copy of the pvclock structure also sets the stage for
> eliminating the per-vCPU copy/cache (only the TSC frequency information
> actually "needs" to be cached/persisted).
> 
> Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c4d210e8a9e..5f3ad13a8ac7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>   {
>   	struct kvm_vcpu_arch *vcpu = &v->arch;
>   	struct pvclock_vcpu_time_info *guest_hv_clock;
> +	struct pvclock_vcpu_time_info hv_clock;
>   	unsigned long flags;
>   
> +	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
> +
>   	read_lock_irqsave(&gpc->lock, flags);
>   	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
>   		read_unlock_irqrestore(&gpc->lock, flags);
> @@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>   	 * it is consistent.
>   	 */
>   
> -	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> +	guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
>   	smp_wmb();
>   
>   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> -	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> +	hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>   
> -	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> +	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
>   
>   	if (force_tsc_unstable)
>   		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
>   
>   	smp_wmb();
>   
> -	guest_hv_clock->version = ++vcpu->hv_clock.version;
> +	guest_hv_clock->version = ++hv_clock.version;
>   
>   	kvm_gpc_mark_dirty_in_slot(gpc);
>   	read_unlock_irqrestore(&gpc->lock, flags);
>   
> -	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> +	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
>   }
>   
>   static int kvm_guest_time_update(struct kvm_vcpu *v)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
@ 2025-01-21 16:58   ` Paul Durrant
  2025-01-21 18:45     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 16:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> reference copy may not be up-to-date, i.e. may yield a false positive of
> sorts.  In the unlikely scenario that the guest is starting a Xen timer
> and has used a Xen pvclock in the past, but has since but turned it "off",
> then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> if and only if at least pvclock is enabled.
> 
> Furthermore, vcpu->arch.hv_clock is currently used by three different
> pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
> guest would ever enable multiple pvclocks, effectively sharing KVM's
> reference clock could yield very weird behavior.  Using the guest's active
> Xen pvclock instead of KVM's reference will allow dropping KVM's
> reference copy.
> 
> Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index a909b817b9c0..b82c28223585 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
>   	return HRTIMER_NORESTART;
>   }
>   
> +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> +				 struct pvclock_vcpu_time_info *hv_clock,
> +				 struct gfn_to_pfn_cache *gpc,
> +				 unsigned int offset)
> +{
> +	struct pvclock_vcpu_time_info *guest_hv_clock;
> +	unsigned long flags;
> +	int r;
> +
> +	read_lock_irqsave(&gpc->lock, flags);
> +	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> +		read_unlock_irqrestore(&gpc->lock, flags);
> +
> +		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> +		if (r)
> +			return r;
> +
> +		read_lock_irqsave(&gpc->lock, flags);
> +	}
> +

I guess I must be missing something subtle... What is setting 
guest_hv_clock to point at something meaningful before this line?

> +	memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
> +	read_unlock_irqrestore(&gpc->lock, flags);
> +
> +	/*
> +	 * Sanity check TSC shift+multiplier to verify the guest's view of time
> +	 * is more or less consistent.
> +	 */
> +	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
> +	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
> +		return -EINVAL;
> +	return 0;
> +}
> +
>   static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>   				bool linux_wa)
>   {
> +	struct kvm_vcpu_xen *xen;
>   	int64_t kernel_now, delta;
>   	uint64_t guest_now;
> +	int r = -EOPNOTSUPP;
>   
>   	/*
>   	 * The guest provides the requested timeout in absolute nanoseconds
> @@ -173,10 +208,22 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>   	 * the absolute CLOCK_MONOTONIC time at which the timer should
>   	 * fire.
>   	 */
> -	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
> -	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +	do {
> +		struct pvclock_vcpu_time_info hv_clock;
>   		uint64_t host_tsc, guest_tsc;
>   
> +		if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
> +		    !vcpu->kvm->arch.use_master_clock)
> +			break;
> +
> +		if (xen->vcpu_info_cache.active)
> +			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
> +						offsetof(struct compat_vcpu_info, time));
> +		else if (xen->vcpu_time_info_cache.active)
> +			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
> +		if (r)
> +			break;
> +
>   		if (!IS_ENABLED(CONFIG_64BIT) ||
>   		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
>   			/*
> @@ -197,9 +244,10 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>   
>   		/* Calculate the guest kvmclock as the guest would do it. */
>   		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
> -		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
> -						  guest_tsc);
> -	} else {
> +		guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
> +	} while (0);
> +
> +	if (r) {
>   		/*
>   		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
>   		 *


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()
  2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
@ 2025-01-21 17:00   ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 17:00 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Pass the reference pvclock structure that's used to setup each individual
> pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step
> toward removing kvm_vcpu_arch.hv_clock.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock
  2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
@ 2025-01-21 17:03   ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 17:03 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Remove the per-vCPU "cache" of the reference pvclock and instead cache
> only the TSC shift+multiplier.  All other fields in pvclock are fully
> recomputed by kvm_guest_time_update(), i.e. aren't actually persisted.
> 
> In addition to shaving a few bytes, explicitly tracking the TSC shift/mul
> fields makes it easier to see that those fields are tied to hw_tsc_khz
> (they exist to avoid having to do expensive math in the common case).
> And conversely, not tracking the other fields makes it easier to see that
> things like the version number are pulled from the guest's copy, not from
> KVM's reference.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  3 ++-
>   arch/x86/kvm/x86.c              | 27 +++++++++++++++------------
>   arch/x86/kvm/xen.c              |  8 ++++----
>   3 files changed, 21 insertions(+), 17 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update()
  2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
@ 2025-01-21 17:05   ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 17:05 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> When updating PV clocks, handle the Xen-specific UNSTABLE_TSC override in
> the main kvm_guest_time_update() by simply clearing PVCLOCK_TSC_STABLE_BIT
> in the flags of the reference pvclock structure.  Expand the comment to
> (hopefully) make it obvious that Xen clocks need to be processed after all
> clocks that care about the TSC_STABLE flag.
> 
> No functional change intended.
> 
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 35 +++++++++++++++--------------------
>   1 file changed, 15 insertions(+), 20 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-21 16:42   ` Paul Durrant
@ 2025-01-21 17:09     ` Sean Christopherson
  2025-01-21 17:15       ` Paul Durrant
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2025-01-21 17:09 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> > ---
> >   arch/x86/kvm/x86.c | 20 ++++++++++++++------
> >   1 file changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d8ee37dd2b57..3c4d210e8a9e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> >   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> >   	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > -	if (vcpu->pvclock_set_guest_stopped_request) {
> > -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > -		vcpu->pvclock_set_guest_stopped_request = false;
> > -	}
> > -
> >   	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> >   	if (force_tsc_unstable)
> > @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >   	if (use_master_clock)
> >   		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
> > -	if (vcpu->pv_time.active)
> > +	if (vcpu->pv_time.active) {
> > +		/*
> > +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
> > +		 * historic behavior is to only process the request if kvmclock
> > +		 * is active/enabled.
> > +		 */
> > +		if (vcpu->pvclock_set_guest_stopped_request) {
> > +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > +			vcpu->pvclock_set_guest_stopped_request = false;
> > +		}
> >   		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> > +
> > +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> 
> Is this intentional? The line above your change in kvm_setup_guest_pvclock()
> clearly keeps the flag enabled if it already set and, without this patch, I
> don't see anything clearing it.

Oh, I see what you're getting at.  Hrm.  Yes, clearing the flag is intentional,
otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
only for kvmclock).

Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
applied to the first active clock (kvmclock).

The only way I can think of to fully isolate the changes would be to split this
into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".

4a would be quite ugly, because to avoid introducing a functional change, it
would need to be:

	if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
	    vcpu->xen.vcpu_time_info_cache.active) {
		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
		vcpu->pvclock_set_guest_stopped_request = false;
	}

But it's not the worst intermediate code, so I'm not opposed to going that
route.

> > +	}
> > +
> >   #ifdef CONFIG_KVM_XEN
> >   	if (vcpu->xen.vcpu_info_cache.active)
> >   		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  2025-01-21 16:54   ` Paul Durrant
@ 2025-01-21 17:11     ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2025-01-21 17:11 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > When updating a specific PV clock, make a full copy of KVM's reference
> > copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
> > E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
> > PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
> > clock.
> 
> ... but the line I queried in the previous patch squashes the flag before
> the Xen PV clock is set up, so no bleed?

Yeah, in practice, no bleed after the previous patch.  But very theoretically,
there could be bleed if the guest set PVCLOCK_GUEST_STOPPED in the compat clock
*and* had both compat and non-compat Xen PV clocks active (is that even possible?)

> > Using a local copy of the pvclock structure also sets the stage for
> > eliminating the per-vCPU copy/cache (only the TSC frequency information
> > actually "needs" to be cached/persisted).
> > 
> > Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/x86.c | 13 ++++++++-----
> >   1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3c4d210e8a9e..5f3ad13a8ac7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> >   {
> >   	struct kvm_vcpu_arch *vcpu = &v->arch;
> >   	struct pvclock_vcpu_time_info *guest_hv_clock;
> > +	struct pvclock_vcpu_time_info hv_clock;
> >   	unsigned long flags;
> > +	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
> > +
> >   	read_lock_irqsave(&gpc->lock, flags);
> >   	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> >   		read_unlock_irqrestore(&gpc->lock, flags);
> > @@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> >   	 * it is consistent.
> >   	 */
> > -	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> > +	guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
> >   	smp_wmb();
> >   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > -	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > +	hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > -	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > +	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
> >   	if (force_tsc_unstable)
> >   		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
> >   	smp_wmb();
> > -	guest_hv_clock->version = ++vcpu->hv_clock.version;
> > +	guest_hv_clock->version = ++hv_clock.version;
> >   	kvm_gpc_mark_dirty_in_slot(gpc);
> >   	read_unlock_irqrestore(&gpc->lock, flags);
> > -	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> > +	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
> >   }
> >   static int kvm_guest_time_update(struct kvm_vcpu *v)
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-21 17:09     ` Sean Christopherson
@ 2025-01-21 17:15       ` Paul Durrant
  2025-01-21 18:32         ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 17:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On 21/01/2025 17:09, Sean Christopherson wrote:
> On Tue, Jan 21, 2025, Paul Durrant wrote:
>>> ---
>>>    arch/x86/kvm/x86.c | 20 ++++++++++++++------
>>>    1 file changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index d8ee37dd2b57..3c4d210e8a9e 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>>>    	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
>>>    	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>>> -	if (vcpu->pvclock_set_guest_stopped_request) {
>>> -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
>>> -		vcpu->pvclock_set_guest_stopped_request = false;
>>> -	}
>>> -
>>>    	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
>>>    	if (force_tsc_unstable)
>>> @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>>    	if (use_master_clock)
>>>    		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
>>> -	if (vcpu->pv_time.active)
>>> +	if (vcpu->pv_time.active) {
>>> +		/*
>>> +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
>>> +		 * historic behavior is to only process the request if kvmclock
>>> +		 * is active/enabled.
>>> +		 */
>>> +		if (vcpu->pvclock_set_guest_stopped_request) {
>>> +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
>>> +			vcpu->pvclock_set_guest_stopped_request = false;
>>> +		}
>>>    		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
>>> +
>>> +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
>>
>> Is this intentional? The line above your change in kvm_setup_guest_pvclock()
>> clearly keeps the flag enabled if it already set and, without this patch, I
>> don't see anything clearing it.
> 
> Oh, I see what you're getting at.  Hrm.  Yes, clearing the flag is intentional,
> otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
> only for kvmclock).
> 
> Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
> break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
> applied to the first active clock (kvmclock).
> 
> The only way I can think of to fully isolate the changes would be to split this
> into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
> kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
> ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
> 
> 4a would be quite ugly, because to avoid introducing a functional change, it
> would need to be:
> 
> 	if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
> 	    vcpu->xen.vcpu_time_info_cache.active) {
> 		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> 		vcpu->pvclock_set_guest_stopped_request = false;
> 	}
> 
> But it's not the worst intermediate code, so I'm not opposed to going that
> route.
> 

What about putting this change after patch 7. Then you could take a 
local copy of hv_clock in which you could set PVCLOCK_GUEST_STOPPED and 
so avoid bleeding the flag that way?

>>> +	}
>>> +
>>>    #ifdef CONFIG_KVM_XEN
>>>    	if (vcpu->xen.vcpu_info_cache.active)
>>>    		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-21 15:59       ` Paul Durrant
@ 2025-01-21 17:16         ` David Woodhouse
  2025-01-21 17:30           ` Paul Durrant
  0 siblings, 1 reply; 30+ messages in thread
From: David Woodhouse @ 2025-01-21 17:16 UTC (permalink / raw)
  To: paul, Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, linux-kernel, syzbot+352e553a86e0d75f5120,
	Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

On Tue, 2025-01-21 at 15:59 +0000, Paul Durrant wrote:
> On 21/01/2025 15:44, Sean Christopherson wrote:
> [snip]
> > 
> > I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
> > PV clock is truly unstable, it's that some guests get tripped up by the STABLE
> > flag.  A guest that can't handle the STABLE flag has bigger problems than the
> > existence of a completely unrelated clock that is implied to be stable.
> > 
> 
> Agreed.
> 
> > > I don't know if anyone combines Xen and Hyper-V emulation capabilities for
> > > the same guest on KVM though.)
> > 
> > That someone would have to be quite "brave" :-D
> 
> Maybe :-)

Xen itself does offer some Hyper-V enlightenments, and we might
reasonably expect KVM-based hypervisors to offer the same. We
explicitly do account for the KVM CPUID leaves moving up to let the
Hyper-V ones exist.

I don't recall if Xen's Hyper-V support includes the TSC page though.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-21 17:16         ` David Woodhouse
@ 2025-01-21 17:30           ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2025-01-21 17:30 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, linux-kernel, syzbot+352e553a86e0d75f5120,
	Paul Durrant

On 21/01/2025 17:16, David Woodhouse wrote:
> On Tue, 2025-01-21 at 15:59 +0000, Paul Durrant wrote:
>> On 21/01/2025 15:44, Sean Christopherson wrote:
>> [snip]
>>>
>>> I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
>>> PV clock is truly unstable, it's that some guests get tripped up by the STABLE
>>> flag.  A guest that can't handle the STABLE flag has bigger problems than the
>>> existence of a completely unrelated clock that is implied to be stable.
>>>
>>
>> Agreed.
>>
>>>> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
>>>> the same guest on KVM though.)
>>>
>>> That someone would have to be quite "brave" :-D
>>
>> Maybe :-)
> 
> Xen itself does offer some Hyper-V enlightenments, and we might
> reasonably expect KVM-based hypervisors to offer the same. We
> explicitly do account for the KVM CPUID leaves moving up to let the
> Hyper-V ones exist.
> 
> I don't recall if Xen's Hyper-V support includes the TSC page though.

It does :-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-21 17:15       ` Paul Durrant
@ 2025-01-21 18:32         ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2025-01-21 18:32 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 21/01/2025 17:09, Sean Christopherson wrote:
> > On Tue, Jan 21, 2025, Paul Durrant wrote:
> > > > ---
> > > >    arch/x86/kvm/x86.c | 20 ++++++++++++++------
> > > >    1 file changed, 14 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index d8ee37dd2b57..3c4d210e8a9e 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > > >    	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > > >    	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > > > -	if (vcpu->pvclock_set_guest_stopped_request) {
> > > > -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > > > -		vcpu->pvclock_set_guest_stopped_request = false;
> > > > -	}
> > > > -
> > > >    	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > > >    	if (force_tsc_unstable)
> > > > @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > > >    	if (use_master_clock)
> > > >    		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
> > > > -	if (vcpu->pv_time.active)
> > > > +	if (vcpu->pv_time.active) {
> > > > +		/*
> > > > +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
> > > > +		 * historic behavior is to only process the request if kvmclock
> > > > +		 * is active/enabled.
> > > > +		 */
> > > > +		if (vcpu->pvclock_set_guest_stopped_request) {
> > > > +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > > > +			vcpu->pvclock_set_guest_stopped_request = false;
> > > > +		}
> > > >    		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> > > > +
> > > > +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> > > 
> > > Is this intentional? The line above your change in kvm_setup_guest_pvclock()
> > > clearly keeps the flag enabled if it already set and, without this patch, I
> > > don't see anything clearing it.
> > 
> > Oh, I see what you're getting at.  Hrm.  Yes, clearing the flag is intentional,
> > otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
> > only for kvmclock).
> > 
> > Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
> > break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
> > applied to the first active clock (kvmclock).
> > 
> > The only way I can think of to fully isolate the changes would be to split this
> > into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
> > kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
> > ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
> > 
> > 4a would be quite ugly, because to avoid introducing a functional change, it
> > would need to be:
> > 
> > 	if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
> > 	    vcpu->xen.vcpu_time_info_cache.active) {
> > 		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > 		vcpu->pvclock_set_guest_stopped_request = false;
> > 	}
> > 
> > But it's not the worst intermediate code, so I'm not opposed to going that
> > route.
> > 
> 
> What about putting this change after patch 7. Then you could take a local
> copy of hv_clock in which you could set PVCLOCK_GUEST_STOPPED and so avoid
> bleeding the flag that way?

But to preserve the current behavior of setting PVCLOCK_GUEST_STOPPED for all
clocks, processing pvclock_set_guest_stopped_request needs to be moved out of
kvm_setup_guest_pvclock() before said helper can make a copy of the reference.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-21 16:58   ` Paul Durrant
@ 2025-01-21 18:45     ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2025-01-21 18:45 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> > reference copy may not be up-to-date, i.e. may yield a false positive of
> > sorts.  In the unlikely scenario that the guest is starting a Xen timer
> > and has used a Xen pvclock in the past, but has since but turned it "off",
> > then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> > if and only if at least pvclock is enabled.
> > 
> > Furthermore, vcpu->arch.hv_clock is currently used by three different
> > pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
> > guest would ever enable multiple pvclocks, effectively sharing KVM's
> > reference clock could yield very weird behavior.  Using the guest's active
> > Xen pvclock instead of KVM's reference will allow dropping KVM's
> > reference copy.
> > 
> > Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> > Cc: Paul Durrant <pdurrant@amazon.com>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
> >   1 file changed, 53 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index a909b817b9c0..b82c28223585 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
> >   	return HRTIMER_NORESTART;
> >   }
> > +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> > +				 struct pvclock_vcpu_time_info *hv_clock,
> > +				 struct gfn_to_pfn_cache *gpc,
> > +				 unsigned int offset)
> > +{
> > +	struct pvclock_vcpu_time_info *guest_hv_clock;
> > +	unsigned long flags;
> > +	int r;
> > +
> > +	read_lock_irqsave(&gpc->lock, flags);
> > +	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> > +		read_unlock_irqrestore(&gpc->lock, flags);
> > +
> > +		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> > +		if (r)
> > +			return r;
> > +
> > +		read_lock_irqsave(&gpc->lock, flags);
> > +	}
> > +
> 
> I guess I must be missing something subtle... What is setting guest_hv_clock
> to point at something meaningful before this line?

Nope, you're not missing anything, this code is completely broken.  As pointed
out by the kernel test bot, the caller is also busted, because the "xen" pointer
is never initialied.

	struct kvm_vcpu_xen *xen;

	...

	do {
		...

		if (xen->vcpu_info_cache.active)
			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
						offsetof(struct compat_vcpu_info, time));
		else if (xen->vcpu_time_info_cache.active)
			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
		if (r)
			break;
	}


I suspect the selftest passes because the @gpc passed to xen_get_guest_pvclock()
is garbage, which likely results in kvm_gpc_refresh() failing, and so KVM falls
backs to the less precise method:

	if (r) {
		/*
		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
		 *
		 * Also if the guest PV clock hasn't been set up yet, as is
		 * likely to be the case during migration when the vCPU has
		 * not been run yet. It would be possible to calculate the
		 * scaling factors properly in that case but there's not much
		 * point in doing so. The get_kvmclock_ns() drift accumulates
		 * over time, so it's OK to use it at startup. Besides, on
		 * migration there's going to be a little bit of skew in the
		 * precise moment at which timers fire anyway. Often they'll
		 * be in the "past" by the time the VM is running again after
		 * migration.
		 */
		guest_now = get_kvmclock_ns(vcpu->kvm);
		kernel_now = ktime_get();
	}

Ugh.  And the reason my build tests didn't catch this is because the only config
I test with KVM_XEN=y also has KASAN=y, which is incompatible with KVM_ERROR=y
(unless the global WERROR=y is enabled).

Time to punt KASAN=y to it's own Kconfig I guess...

I'll verify the happy path is actually being tested before posting v2.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-01-21 18:45 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
2025-01-21 16:01   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
2025-01-21 16:03   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
2025-01-21 16:05   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
2025-01-21 16:42   ` Paul Durrant
2025-01-21 17:09     ` Sean Christopherson
2025-01-21 17:15       ` Paul Durrant
2025-01-21 18:32         ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
2025-01-21 16:54   ` Paul Durrant
2025-01-21 17:11     ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
2025-01-21 16:58   ` Paul Durrant
2025-01-21 18:45     ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
2025-01-21 17:00   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
2025-01-21 17:03   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
2025-01-20 14:49   ` Vitaly Kuznetsov
2025-01-21 15:44     ` Sean Christopherson
2025-01-21 15:59       ` Paul Durrant
2025-01-21 17:16         ` David Woodhouse
2025-01-21 17:30           ` Paul Durrant
2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
2025-01-21 17:05   ` Paul Durrant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).