[PATCH 00/10] KVM: x86: pvclock fixes and cleanups

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/10] KVM: x86: pvclock fixes and cleanups
@ 2025-01-18  0:55 Sean Christopherson
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Fix a lockdep splat in KVM's suspend notifier by simply removing a
spurious kvm->lock acquisition related to kvmclock, and then try to
wrangle KVM's pvclock handling into something approaching sanity (I
made the mistake of looking at how KVM handled PVCLOCK_GUEST_STOPPED).

David,

I didn't look too closely to see how this interacts with your overhaul of
the pvclock madness[*].  When I first started poking at this, I didn't
realize vcpu->arch.hv_lock had tendrils in so many places.  Please holler
if you want me to drop the vcpu->arch.hv_lock changes and/or tweak
something to make it play nice with your series.

The Xen changes are *very* lightly tested, so I definitely won't apply
the potentially problematic changes, i.e. anything past "Don't bleed
PVCLOCK_GUEST_STOPPED across PV clocks", until I get a thumbs up from
you and/or Paul.

[*] https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org

Sean Christopherson (10):
  KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend
    notifier
  KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
  KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
  KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV
    clock
  KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  KVM: x86/xen: Use guest's copy of pvclock when starting timer
  KVM: x86: Pass reference pvclock as a param to
    kvm_setup_guest_pvclock()
  KVM: x86: Remove per-vCPU "cache" of its reference pvclock
  KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock
    update)
  KVM: x86: Override TSC_STABLE flag for Xen PV clocks in
    kvm_guest_time_update()

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/x86.c              | 115 ++++++++++++++++----------------
 arch/x86/kvm/xen.c              |  62 +++++++++++++++--
 3 files changed, 114 insertions(+), 66 deletions(-)

base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
-- 
2.48.0.rc2.279.g1de40edade-goog

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:01   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE,
don't take kvm->lock as doing so can trigger a largely theoretical
deadlock, it is perfectly safe to iterate over the xarray of vCPUs without
holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in
any way (pv_time.active and pvclock_set_guest_stopped_request are
protected by vcpu->mutex, not kvm->lock).

Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com
Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d9a16fd4d3..26e18c9b0375 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6907,7 +6907,6 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 	unsigned long i;
 	int ret = 0;
 
-	mutex_lock(&kvm->lock);
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (!vcpu->arch.pv_time.active)
 			continue;
@@ -6919,7 +6918,6 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 			break;
 		}
 	}
-	mutex_unlock(&kvm->lock);
 
 	return ret ? NOTIFY_BAD : NOTIFY_DONE;
 }
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:03   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a
SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU
isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock
is active.  I.e. barring some bizarre edge case that shouldn't be treated
as an error in the first place, kvm_arch_suspend_notifier() can't fail.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26e18c9b0375..ef21158ec6b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6905,21 +6905,15 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 {
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
-	int ret = 0;
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (!vcpu->arch.pv_time.active)
-			continue;
+	/*
+	 * Ignore the return, marking the guest paused only "fails" if the vCPU
+	 * isn't using kvmclock; continuing on is correct and desirable.
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		(void)kvm_set_guest_paused(vcpu);
 
-		ret = kvm_set_guest_paused(vcpu);
-		if (ret) {
-			kvm_err("Failed to pause guest VCPU%d: %d\n",
-				vcpu->vcpu_id, ret);
-			break;
-		}
-	}
-
-	return ret ? NOTIFY_BAD : NOTIFY_DONE;
+	return NOTIFY_DONE;
 }
 
 int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
  2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:05   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Drop the local pvclock_flags in kvm_guest_time_update(), the local variable
is immediately shoved into the per-vCPU "cache", i.e. the local variable
serves no purpose.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef21158ec6b2..d8ee37dd2b57 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3178,7 +3178,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	struct kvm_arch *ka = &v->kvm->arch;
 	s64 kernel_ns;
 	u64 tsc_timestamp, host_tsc;
-	u8 pvclock_flags;
 	bool use_master_clock;
 #ifdef CONFIG_KVM_XEN
 	/*
@@ -3261,11 +3260,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	vcpu->last_guest_tsc = tsc_timestamp;
 
 	/* If the host uses TSC clocksource, then it is stable */
-	pvclock_flags = 0;
+	vcpu->hv_clock.flags = 0;
 	if (use_master_clock)
-		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
-
-	vcpu->hv_clock.flags = pvclock_flags;
+		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
 
 	if (vcpu->pv_time.active)
 		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:42   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Handle "guest stopped" propagation only for kvmclock, as the flag is set
if and only if kvmclock is "active", i.e. can only be set for Xen PV clock
if kvmclock *and* Xen PV clock are in-use by the guest, which creates very
bizarre behavior for the guest.

Simply restrict the flag to kvmclock, e.g. instead of trying to handle
Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally
added during a refactoring, and while Xen proper defines
XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually
support the flag.

Check and clear pvclock_set_guest_stopped_request if and only if kvmclock
is active to preserve the original behavior, i.e. keep the flag pending
if kvmclock happens to be disabled when KVM processes the initial request.

Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d8ee37dd2b57..3c4d210e8a9e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
 	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
 
-	if (vcpu->pvclock_set_guest_stopped_request) {
-		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
-		vcpu->pvclock_set_guest_stopped_request = false;
-	}
-
 	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
 
 	if (force_tsc_unstable)
@@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	if (use_master_clock)
 		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
 
-	if (vcpu->pv_time.active)
+	if (vcpu->pv_time.active) {
+		/*
+		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
+		 * historic behavior is to only process the request if kvmclock
+		 * is active/enabled.
+		 */
+		if (vcpu->pvclock_set_guest_stopped_request) {
+			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
+			vcpu->pvclock_set_guest_stopped_request = false;
+		}
 		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
+
+		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
+	}
+
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 16:54   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When updating a specific PV clock, make a full copy of KVM's reference
copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
clock.

Using a local copy of the pvclock structure also sets the stage for
eliminating the per-vCPU copy/cache (only the TSC frequency information
actually "needs" to be cached/persisted).

Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3c4d210e8a9e..5f3ad13a8ac7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 {
 	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct pvclock_vcpu_time_info *guest_hv_clock;
+	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
 
+	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
+
 	read_lock_irqsave(&gpc->lock, flags);
 	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
 		read_unlock_irqrestore(&gpc->lock, flags);
@@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	 * it is consistent.
 	 */
 
-	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
+	guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
 	smp_wmb();
 
 	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
-	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
+	hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
 
-	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
+	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
 
 	if (force_tsc_unstable)
 		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
 
 	smp_wmb();
 
-	guest_hv_clock->version = ++vcpu->hv_clock.version;
+	guest_hv_clock->version = ++hv_clock.version;
 
 	kvm_gpc_mark_dirty_in_slot(gpc);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
-	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
+	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
 }
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-19  3:00   ` kernel test robot
  2025-01-21 16:58   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
reference copy may not be up-to-date, i.e. may yield a false positive of
sorts.  In the unlikely scenario that the guest is starting a Xen timer
and has used a Xen pvclock in the past, but has since but turned it "off",
then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
if and only if at least pvclock is enabled.

Furthermore, vcpu->arch.hv_clock is currently used by three different
pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
guest would ever enable multiple pvclocks, effectively sharing KVM's
reference clock could yield very weird behavior.  Using the guest's active
Xen pvclock instead of KVM's reference will allow dropping KVM's
reference copy.

Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index a909b817b9c0..b82c28223585 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
+				 struct pvclock_vcpu_time_info *hv_clock,
+				 struct gfn_to_pfn_cache *gpc,
+				 unsigned int offset)
+{
+	struct pvclock_vcpu_time_info *guest_hv_clock;
+	unsigned long flags;
+	int r;
+
+	read_lock_irqsave(&gpc->lock, flags);
+	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
+		read_unlock_irqrestore(&gpc->lock, flags);
+
+		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
+		if (r)
+			return r;
+
+		read_lock_irqsave(&gpc->lock, flags);
+	}
+
+	memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
+	read_unlock_irqrestore(&gpc->lock, flags);
+
+	/*
+	 * Sanity check TSC shift+multiplier to verify the guest's view of time
+	 * is more or less consistent.
+	 */
+	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
+	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
+		return -EINVAL;
+	return 0;
+}
+
 static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 				bool linux_wa)
 {
+	struct kvm_vcpu_xen *xen;
 	int64_t kernel_now, delta;
 	uint64_t guest_now;
+	int r = -EOPNOTSUPP;
 
 	/*
 	 * The guest provides the requested timeout in absolute nanoseconds
@@ -173,10 +208,22 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 	 * the absolute CLOCK_MONOTONIC time at which the timer should
 	 * fire.
 	 */
-	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
-	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+	do {
+		struct pvclock_vcpu_time_info hv_clock;
 		uint64_t host_tsc, guest_tsc;
 
+		if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
+		    !vcpu->kvm->arch.use_master_clock)
+			break;
+
+		if (xen->vcpu_info_cache.active)
+			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
+						offsetof(struct compat_vcpu_info, time));
+		else if (xen->vcpu_time_info_cache.active)
+			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
+		if (r)
+			break;
+
 		if (!IS_ENABLED(CONFIG_64BIT) ||
 		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
 			/*
@@ -197,9 +244,10 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
 
 		/* Calculate the guest kvmclock as the guest would do it. */
 		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
-		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
-						  guest_tsc);
-	} else {
+		guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
+	} while (0);
+
+	if (r) {
 		/*
 		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
 		 *
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 17:00   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Pass the reference pvclock structure that's used to setup each individual
pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step
toward removing kvm_vcpu_arch.hv_clock.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f3ad13a8ac7..06d27b3cc207 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3116,17 +3116,17 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 	return data.clock;
 }
 
-static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
+static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
+				    struct kvm_vcpu *vcpu,
 				    struct gfn_to_pfn_cache *gpc,
 				    unsigned int offset,
 				    bool force_tsc_unstable)
 {
-	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct pvclock_vcpu_time_info *guest_hv_clock;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
 
-	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
+	memcpy(&hv_clock, ref_hv_clock, sizeof(hv_clock));
 
 	read_lock_irqsave(&gpc->lock, flags);
 	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
@@ -3165,7 +3165,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 	kvm_gpc_mark_dirty_in_slot(gpc);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
-	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
+	trace_kvm_pvclock_update(vcpu->vcpu_id, &hv_clock);
 }
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
@@ -3272,18 +3272,18 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
 			vcpu->pvclock_set_guest_stopped_request = false;
 		}
-		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
+		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->pv_time, 0, false);
 
 		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
 
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
-		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
+		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_info_cache,
 					offsetof(struct compat_vcpu_info, time),
 					xen_pvclock_tsc_unstable);
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0,
+		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
 					xen_pvclock_tsc_unstable);
 #endif
 	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 17:03   ` Paul Durrant
  2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
  2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

Remove the per-vCPU "cache" of the reference pvclock and instead cache
only the TSC shift+multiplier.  All other fields in pvclock are fully
recomputed by kvm_guest_time_update(), i.e. aren't actually persisted.

In addition to shaving a few bytes, explicitly tracking the TSC shift/mul
fields makes it easier to see that those fields are tied to hw_tsc_khz
(they exist to avoid having to do expensive math in the common case).
And conversely, not tracking the other fields makes it easier to see that
things like the version number are pulled from the guest's copy, not from
KVM's reference.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/x86.c              | 27 +++++++++++++++------------
 arch/x86/kvm/xen.c              |  8 ++++----
 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5193c3dfbce1..f26105654ec4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -900,7 +900,8 @@ struct kvm_vcpu_arch {
 	int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
 
 	gpa_t time;
-	struct pvclock_vcpu_time_info hv_clock;
+	u8  pvclock_tsc_shift;
+	u32 pvclock_tsc_mul;
 	unsigned int hw_tsc_khz;
 	struct gfn_to_pfn_cache pv_time;
 	/* set guest stopped flag in pvclock flags field */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 06d27b3cc207..9eabd70891dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3170,6 +3170,7 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
 {
+	struct pvclock_vcpu_time_info hv_clock = {};
 	unsigned long flags, tgt_tsc_khz;
 	unsigned seq;
 	struct kvm_vcpu_arch *vcpu = &v->arch;
@@ -3247,20 +3248,22 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 
 	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
 		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
-				   &vcpu->hv_clock.tsc_shift,
-				   &vcpu->hv_clock.tsc_to_system_mul);
+				   &vcpu->pvclock_tsc_shift,
+				   &vcpu->pvclock_tsc_mul);
 		vcpu->hw_tsc_khz = tgt_tsc_khz;
 		kvm_xen_update_tsc_info(v);
 	}
 
-	vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
-	vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
+	hv_clock.tsc_shift = vcpu->pvclock_tsc_shift;
+	hv_clock.tsc_to_system_mul = vcpu->pvclock_tsc_mul;
+	hv_clock.tsc_timestamp = tsc_timestamp;
+	hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
 	vcpu->last_guest_tsc = tsc_timestamp;
 
 	/* If the host uses TSC clocksource, then it is stable */
-	vcpu->hv_clock.flags = 0;
+	hv_clock.flags = 0;
 	if (use_master_clock)
-		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
+		hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
 
 	if (vcpu->pv_time.active) {
 		/*
@@ -3269,24 +3272,24 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		 * is active/enabled.
 		 */
 		if (vcpu->pvclock_set_guest_stopped_request) {
-			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
+			hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
 			vcpu->pvclock_set_guest_stopped_request = false;
 		}
-		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->pv_time, 0, false);
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0, false);
 
-		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
+		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
 
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
-		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_info_cache,
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
 					offsetof(struct compat_vcpu_info, time),
 					xen_pvclock_tsc_unstable);
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
 					xen_pvclock_tsc_unstable);
 #endif
-	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
 	return 0;
 }
 
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b82c28223585..7c6e4172527a 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -177,8 +177,8 @@ static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
 	 * Sanity check TSC shift+multiplier to verify the guest's view of time
 	 * is more or less consistent.
 	 */
-	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
-	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
+	if (hv_clock->tsc_shift != vcpu->arch.pvclock_tsc_shift ||
+	    hv_clock->tsc_to_system_mul != vcpu->arch.pvclock_tsc_mul)
 		return -EINVAL;
 	return 0;
 }
@@ -2309,8 +2309,8 @@ void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu)
 
 	entry = kvm_find_cpuid_entry_index(vcpu, function, 1);
 	if (entry) {
-		entry->ecx = vcpu->arch.hv_clock.tsc_to_system_mul;
-		entry->edx = vcpu->arch.hv_clock.tsc_shift;
+		entry->ecx = vcpu->arch.pvclock_tsc_mul;
+		entry->edx = vcpu->arch.pvclock_tsc_shift;
 	}
 
 	entry = kvm_find_cpuid_entry_index(vcpu, function, 2);
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-20 14:49   ` Vitaly Kuznetsov
  2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When updating paravirtual clocks, setup the Hyper-V TSC page before
Xen PV clocks.  This will allow dropping xen_pvclock_tsc_unstable in favor
of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9eabd70891dd..c68e7f7ba69d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
 
+	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
+
 #ifdef CONFIG_KVM_XEN
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
@@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
 					xen_pvclock_tsc_unstable);
 #endif
-	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
 	return 0;
 }
 
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update()
  2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
@ 2025-01-18  0:55 ` Sean Christopherson
  2025-01-21 17:05   ` Paul Durrant
  9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

When updating PV clocks, handle the Xen-specific UNSTABLE_TSC override in
the main kvm_guest_time_update() by simply clearing PVCLOCK_TSC_STABLE_BIT
in the flags of the reference pvclock structure.  Expand the comment to
(hopefully) make it obvious that Xen clocks need to be processed after all
clocks that care about the TSC_STABLE flag.

No functional change intended.

Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c68e7f7ba69d..065b349a0218 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3119,8 +3119,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 				    struct kvm_vcpu *vcpu,
 				    struct gfn_to_pfn_cache *gpc,
-				    unsigned int offset,
-				    bool force_tsc_unstable)
+				    unsigned int offset)
 {
 	struct pvclock_vcpu_time_info *guest_hv_clock;
 	struct pvclock_vcpu_time_info hv_clock;
@@ -3155,9 +3154,6 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 
 	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
 
-	if (force_tsc_unstable)
-		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
-
 	smp_wmb();
 
 	guest_hv_clock->version = ++hv_clock.version;
@@ -3178,16 +3174,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	s64 kernel_ns;
 	u64 tsc_timestamp, host_tsc;
 	bool use_master_clock;
-#ifdef CONFIG_KVM_XEN
-	/*
-	 * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
-	 * explicitly told to use TSC as its clocksource Xen will not set this bit.
-	 * This default behaviour led to bugs in some guest kernels which cause
-	 * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
-	 */
-	bool xen_pvclock_tsc_unstable =
-		ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
-#endif
 
 	kernel_ns = 0;
 	host_tsc = 0;
@@ -3275,7 +3261,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 			hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
 			vcpu->pvclock_set_guest_stopped_request = false;
 		}
-		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0, false);
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0);
 
 		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
 	}
@@ -3283,13 +3269,22 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
 
 #ifdef CONFIG_KVM_XEN
+	/*
+	 * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
+	 * explicitly told to use TSC as its clocksource Xen will not set this bit.
+	 * This default behaviour led to bugs in some guest kernels which cause
+	 * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
+	 *
+	 * Note!  Clear TSC_STABLE only for Xen clocks, i.e. the order matters!
+	 */
+	if (ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE)
+		hv_clock.flags &= ~PVCLOCK_TSC_STABLE_BIT;
+
 	if (vcpu->xen.vcpu_info_cache.active)
 		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
-					offsetof(struct compat_vcpu_info, time),
-					xen_pvclock_tsc_unstable);
+					offsetof(struct compat_vcpu_info, time));
 	if (vcpu->xen.vcpu_time_info_cache.active)
-		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
-					xen_pvclock_tsc_unstable);
+		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0);
 #endif
 	return 0;
 }
-- 
2.48.0.rc2.279.g1de40edade-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
@ 2025-01-19  3:00   ` kernel test robot
  2025-01-21 16:58   ` Paul Durrant
  1 sibling, 0 replies; 32+ messages in thread
From: kernel test robot @ 2025-01-19  3:00 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: llvm, oe-kbuild-all

Hi Sean,

kernel test robot noticed the following build warnings:

[auto build test WARNING on eb723766b1030a23c38adf2348b7c3d1409d11f0]

url:    https://github.com/intel-lab-lkp/linux/commits/Sean-Christopherson/KVM-x86-Don-t-take-kvm-lock-when-iterating-over-vCPUs-in-suspend-notifier/20250118-085939
base:   eb723766b1030a23c38adf2348b7c3d1409d11f0
patch link:    https://lore.kernel.org/r/20250118005552.2626804-7-seanjc%40google.com
patch subject: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20250119/202501191041.ew4CA984-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250119/202501191041.ew4CA984-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501191041.ew4CA984-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/x86/kvm/xen.c:10:
   In file included from arch/x86/kvm/x86.h:5:
   In file included from include/linux/kvm_host.h:16:
   In file included from include/linux/mm.h:2223:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> arch/x86/kvm/xen.c:219:7: warning: variable 'xen' is uninitialized when used here [-Wuninitialized]
     219 |                 if (xen->vcpu_info_cache.active)
         |                     ^~~
   arch/x86/kvm/xen.c:189:26: note: initialize the variable 'xen' to silence this warning
     189 |         struct kvm_vcpu_xen *xen;
         |                                 ^
         |                                  = NULL
   5 warnings generated.


vim +/xen +219 arch/x86/kvm/xen.c

   185	
   186	static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
   187					bool linux_wa)
   188	{
   189		struct kvm_vcpu_xen *xen;
   190		int64_t kernel_now, delta;
   191		uint64_t guest_now;
   192		int r = -EOPNOTSUPP;
   193	
   194		/*
   195		 * The guest provides the requested timeout in absolute nanoseconds
   196		 * of the KVM clock — as *it* sees it, based on the scaled TSC and
   197		 * the pvclock information provided by KVM.
   198		 *
   199		 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
   200		 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
   201		 * difference won't matter much as there is no cumulative effect.
   202		 *
   203		 * Calculate the time for some arbitrary point in time around "now"
   204		 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
   205		 * delta between the kvmclock "now" value and the guest's requested
   206		 * timeout, apply the "Linux workaround" described below, and add
   207		 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
   208		 * the absolute CLOCK_MONOTONIC time at which the timer should
   209		 * fire.
   210		 */
   211		do {
   212			struct pvclock_vcpu_time_info hv_clock;
   213			uint64_t host_tsc, guest_tsc;
   214	
   215			if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
   216			    !vcpu->kvm->arch.use_master_clock)
   217				break;
   218	
 > 219			if (xen->vcpu_info_cache.active)
   220				r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
   221							offsetof(struct compat_vcpu_info, time));
   222			else if (xen->vcpu_time_info_cache.active)
   223				r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
   224			if (r)
   225				break;
   226	
   227			if (!IS_ENABLED(CONFIG_64BIT) ||
   228			    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
   229				/*
   230				 * Don't fall back to get_kvmclock_ns() because it's
   231				 * broken; it has a systemic error in its results
   232				 * because it scales directly from host TSC to
   233				 * nanoseconds, and doesn't scale first to guest TSC
   234				 * and *then* to nanoseconds as the guest does.
   235				 *
   236				 * There is a small error introduced here because time
   237				 * continues to elapse between the ktime_get() and the
   238				 * subsequent rdtsc(). But not the systemic drift due
   239				 * to get_kvmclock_ns().
   240				 */
   241				kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
   242				host_tsc = rdtsc();
   243			}
   244	
   245			/* Calculate the guest kvmclock as the guest would do it. */
   246			guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
   247			guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
   248		} while (0);
   249	
   250		if (r) {
   251			/*
   252			 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
   253			 *
   254			 * Also if the guest PV clock hasn't been set up yet, as is
   255			 * likely to be the case during migration when the vCPU has
   256			 * not been run yet. It would be possible to calculate the
   257			 * scaling factors properly in that case but there's not much
   258			 * point in doing so. The get_kvmclock_ns() drift accumulates
   259			 * over time, so it's OK to use it at startup. Besides, on
   260			 * migration there's going to be a little bit of skew in the
   261			 * precise moment at which timers fire anyway. Often they'll
   262			 * be in the "past" by the time the VM is running again after
   263			 * migration.
   264			 */
   265			guest_now = get_kvmclock_ns(vcpu->kvm);
   266			kernel_now = ktime_get();
   267		}
   268	
   269		delta = guest_abs - guest_now;
   270	
   271		/*
   272		 * Xen has a 'Linux workaround' in do_set_timer_op() which checks for
   273		 * negative absolute timeout values (caused by integer overflow), and
   274		 * for values about 13 days in the future (2^50ns) which would be
   275		 * caused by jiffies overflow. For those cases, Xen sets the timeout
   276		 * 100ms in the future (not *too* soon, since if a guest really did
   277		 * set a long timeout on purpose we don't want to keep churning CPU
   278		 * time by waking it up).  Emulate Xen's workaround when starting the
   279		 * timer in response to __HYPERVISOR_set_timer_op.
   280		 */
   281		if (linux_wa &&
   282		    unlikely((int64_t)guest_abs < 0 ||
   283			     (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
   284			delta = 100 * NSEC_PER_MSEC;
   285			guest_abs = guest_now + delta;
   286		}
   287	
   288		/*
   289		 * Avoid races with the old timer firing. Checking timer_expires
   290		 * to avoid calling hrtimer_cancel() will only have false positives
   291		 * so is fine.
   292		 */
   293		if (vcpu->arch.xen.timer_expires)
   294			hrtimer_cancel(&vcpu->arch.xen.timer);
   295	
   296		atomic_set(&vcpu->arch.xen.timer_pending, 0);
   297		vcpu->arch.xen.timer_expires = guest_abs;
   298	
   299		if (delta <= 0)
   300			xen_timer_callback(&vcpu->arch.xen.timer);
   301		else
   302			hrtimer_start(&vcpu->arch.xen.timer,
   303				      ktime_add_ns(kernel_now, delta),
   304				      HRTIMER_MODE_ABS_HARD);
   305	}
   306	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
@ 2025-01-20 14:49   ` Vitaly Kuznetsov
  2025-01-21 15:44     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Vitaly Kuznetsov @ 2025-01-20 14:49 UTC (permalink / raw)
  To: Sean Christopherson, Sean Christopherson, Paolo Bonzini,
	David Woodhouse, Paul Durrant
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse

Sean Christopherson <seanjc@google.com> writes:

> When updating paravirtual clocks, setup the Hyper-V TSC page before
> Xen PV clocks.  This will allow dropping xen_pvclock_tsc_unstable in favor
> of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9eabd70891dd..c68e7f7ba69d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
>  	}
>  
> +	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> +
>  #ifdef CONFIG_KVM_XEN
>  	if (vcpu->xen.vcpu_info_cache.active)
>  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
> @@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
>  					xen_pvclock_tsc_unstable);
>  #endif
> -	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
>  	return 0;
>  }

"No functional change detected".

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

(What I'm wondering is if (from mostly theoretical PoV) it's OK to pass
*some* of the PV clocks as stable and some as unstable to the same
guest, i.e. if it would make sense to disable Hyper-V TSC page when
KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE too. I don't know if anyone
combines Xen and Hyper-V emulation capabilities for the same guest on
KVM though.)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-20 14:49   ` Vitaly Kuznetsov
@ 2025-01-21 15:44     ` Sean Christopherson
  2025-01-21 15:59       ` Paul Durrant
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 15:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, David Woodhouse, Paul Durrant, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse

On Mon, Jan 20, 2025, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > When updating paravirtual clocks, setup the Hyper-V TSC page before
> > Xen PV clocks.  This will allow dropping xen_pvclock_tsc_unstable in favor
> > of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
> >
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/x86.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9eabd70891dd..c68e7f7ba69d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >  		hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> >  	}
> >  
> > +	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> > +
> >  #ifdef CONFIG_KVM_XEN
> >  	if (vcpu->xen.vcpu_info_cache.active)
> >  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
> > @@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >  		kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
> >  					xen_pvclock_tsc_unstable);
> >  #endif
> > -	kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> >  	return 0;
> >  }
> 
> "No functional change detected".
> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> (What I'm wondering is if (from mostly theoretical PoV) it's OK to pass
> *some* of the PV clocks as stable and some as unstable to the same
> guest, i.e. if it would make sense to disable Hyper-V TSC page when
> KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE too.

I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
PV clock is truly unstable, it's that some guests get tripped up by the STABLE
flag.  A guest that can't handle the STABLE flag has bigger problems than the
existence of a completely unrelated clock that is implied to be stable.

> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
> the same guest on KVM though.)

That someone would have to be quite "brave" :-D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-21 15:44     ` Sean Christopherson
@ 2025-01-21 15:59       ` Paul Durrant
  2025-01-21 17:16         ` David Woodhouse
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 15:59 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse

On 21/01/2025 15:44, Sean Christopherson wrote:
[snip]
> 
> I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
> PV clock is truly unstable, it's that some guests get tripped up by the STABLE
> flag.  A guest that can't handle the STABLE flag has bigger problems than the
> existence of a completely unrelated clock that is implied to be stable.
> 

Agreed.

>> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
>> the same guest on KVM though.)
> 
> That someone would have to be quite "brave" :-D

Maybe :-)

Reviewed-by: Paul Durrant <paul@xen.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier
  2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
@ 2025-01-21 16:01   ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:01 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE,
> don't take kvm->lock as doing so can trigger a largely theoretical
> deadlock, it is perfectly safe to iterate over the xarray of vCPUs without
> holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in
> any way (pv_time.active and pvclock_set_guest_stopped_request are
> protected by vcpu->mutex, not kvm->lock).
> 
> Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com
> Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
  2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
@ 2025-01-21 16:03   ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:03 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a
> SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU
> isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock
> is active.  I.e. barring some bizarre edge case that shouldn't be treated
> as an error in the first place, kvm_arch_suspend_notifier() can't fail.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 20 +++++++-------------
>   1 file changed, 7 insertions(+), 13 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
  2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
@ 2025-01-21 16:05   ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:05 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Drop the local pvclock_flags in kvm_guest_time_update(), the local variable
> is immediately shoved into the per-vCPU "cache", i.e. the local variable
> serves no purpose.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
@ 2025-01-21 16:42   ` Paul Durrant
  2025-01-21 17:09     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Handle "guest stopped" propagation only for kvmclock, as the flag is set
> if and only if kvmclock is "active", i.e. can only be set for Xen PV clock
> if kvmclock *and* Xen PV clock are in-use by the guest, which creates very
> bizarre behavior for the guest.
> 
> Simply restrict the flag to kvmclock, e.g. instead of trying to handle
> Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally
> added during a refactoring, and while Xen proper defines
> XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually
> support the flag.

Indeed. I can find no consumers of the flag.

> 
> Check and clear pvclock_set_guest_stopped_request if and only if kvmclock
> is active to preserve the original behavior, i.e. keep the flag pending
> if kvmclock happens to be disabled when KVM processes the initial request.
> 
> Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 20 ++++++++++++++------
>   1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d8ee37dd2b57..3c4d210e8a9e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
>   	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>   
> -	if (vcpu->pvclock_set_guest_stopped_request) {
> -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> -		vcpu->pvclock_set_guest_stopped_request = false;
> -	}
> -
>   	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
>   
>   	if (force_tsc_unstable)
> @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>   	if (use_master_clock)
>   		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
>   
> -	if (vcpu->pv_time.active)
> +	if (vcpu->pv_time.active) {
> +		/*
> +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
> +		 * historic behavior is to only process the request if kvmclock
> +		 * is active/enabled.
> +		 */
> +		if (vcpu->pvclock_set_guest_stopped_request) {
> +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> +			vcpu->pvclock_set_guest_stopped_request = false;
> +		}
>   		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> +
> +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;

Is this intentional? The line above your change in 
kvm_setup_guest_pvclock() clearly keeps the flag enabled if it already 
set and, without this patch, I don't see anything clearing it.

> +	}
> +
>   #ifdef CONFIG_KVM_XEN
>   	if (vcpu->xen.vcpu_info_cache.active)
>   		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
@ 2025-01-21 16:54   ` Paul Durrant
  2025-01-21 17:11     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:54 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> When updating a specific PV clock, make a full copy of KVM's reference
> copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
> E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
> PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
> clock.

... but the line I queried in the previous patch squashes the flag 
before the Xen PV clock is set up, so no bleed?

> 
> Using a local copy of the pvclock structure also sets the stage for
> eliminating the per-vCPU copy/cache (only the TSC frequency information
> actually "needs" to be cached/persisted).
> 
> Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c4d210e8a9e..5f3ad13a8ac7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>   {
>   	struct kvm_vcpu_arch *vcpu = &v->arch;
>   	struct pvclock_vcpu_time_info *guest_hv_clock;
> +	struct pvclock_vcpu_time_info hv_clock;
>   	unsigned long flags;
>   
> +	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
> +
>   	read_lock_irqsave(&gpc->lock, flags);
>   	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
>   		read_unlock_irqrestore(&gpc->lock, flags);
> @@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>   	 * it is consistent.
>   	 */
>   
> -	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> +	guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
>   	smp_wmb();
>   
>   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> -	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> +	hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>   
> -	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> +	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
>   
>   	if (force_tsc_unstable)
>   		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
>   
>   	smp_wmb();
>   
> -	guest_hv_clock->version = ++vcpu->hv_clock.version;
> +	guest_hv_clock->version = ++hv_clock.version;
>   
>   	kvm_gpc_mark_dirty_in_slot(gpc);
>   	read_unlock_irqrestore(&gpc->lock, flags);
>   
> -	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> +	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
>   }
>   
>   static int kvm_guest_time_update(struct kvm_vcpu *v)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
  2025-01-19  3:00   ` kernel test robot
@ 2025-01-21 16:58   ` Paul Durrant
  2025-01-21 18:45     ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> reference copy may not be up-to-date, i.e. may yield a false positive of
> sorts.  In the unlikely scenario that the guest is starting a Xen timer
> and has used a Xen pvclock in the past, but has since but turned it "off",
> then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> if and only if at least pvclock is enabled.
> 
> Furthermore, vcpu->arch.hv_clock is currently used by three different
> pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
> guest would ever enable multiple pvclocks, effectively sharing KVM's
> reference clock could yield very weird behavior.  Using the guest's active
> Xen pvclock instead of KVM's reference will allow dropping KVM's
> reference copy.
> 
> Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index a909b817b9c0..b82c28223585 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
>   	return HRTIMER_NORESTART;
>   }
>   
> +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> +				 struct pvclock_vcpu_time_info *hv_clock,
> +				 struct gfn_to_pfn_cache *gpc,
> +				 unsigned int offset)
> +{
> +	struct pvclock_vcpu_time_info *guest_hv_clock;
> +	unsigned long flags;
> +	int r;
> +
> +	read_lock_irqsave(&gpc->lock, flags);
> +	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> +		read_unlock_irqrestore(&gpc->lock, flags);
> +
> +		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> +		if (r)
> +			return r;
> +
> +		read_lock_irqsave(&gpc->lock, flags);
> +	}
> +

I guess I must be missing something subtle... What is setting 
guest_hv_clock to point at something meaningful before this line?

> +	memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
> +	read_unlock_irqrestore(&gpc->lock, flags);
> +
> +	/*
> +	 * Sanity check TSC shift+multiplier to verify the guest's view of time
> +	 * is more or less consistent.
> +	 */
> +	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
> +	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
> +		return -EINVAL;
> +	return 0;
> +}
> +
>   static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>   				bool linux_wa)
>   {
> +	struct kvm_vcpu_xen *xen;
>   	int64_t kernel_now, delta;
>   	uint64_t guest_now;
> +	int r = -EOPNOTSUPP;
>   
>   	/*
>   	 * The guest provides the requested timeout in absolute nanoseconds
> @@ -173,10 +208,22 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>   	 * the absolute CLOCK_MONOTONIC time at which the timer should
>   	 * fire.
>   	 */
> -	if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
> -	    static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +	do {
> +		struct pvclock_vcpu_time_info hv_clock;
>   		uint64_t host_tsc, guest_tsc;
>   
> +		if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
> +		    !vcpu->kvm->arch.use_master_clock)
> +			break;
> +
> +		if (xen->vcpu_info_cache.active)
> +			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
> +						offsetof(struct compat_vcpu_info, time));
> +		else if (xen->vcpu_time_info_cache.active)
> +			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
> +		if (r)
> +			break;
> +
>   		if (!IS_ENABLED(CONFIG_64BIT) ||
>   		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
>   			/*
> @@ -197,9 +244,10 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>   
>   		/* Calculate the guest kvmclock as the guest would do it. */
>   		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
> -		guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
> -						  guest_tsc);
> -	} else {
> +		guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
> +	} while (0);
> +
> +	if (r) {
>   		/*
>   		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
>   		 *


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()
  2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
@ 2025-01-21 17:00   ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:00 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Pass the reference pvclock structure that's used to setup each individual
> pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step
> toward removing kvm_vcpu_arch.hv_clock.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock
  2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
@ 2025-01-21 17:03   ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:03 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> Remove the per-vCPU "cache" of the reference pvclock and instead cache
> only the TSC shift+multiplier.  All other fields in pvclock are fully
> recomputed by kvm_guest_time_update(), i.e. aren't actually persisted.
> 
> In addition to shaving a few bytes, explicitly tracking the TSC shift/mul
> fields makes it easier to see that those fields are tied to hw_tsc_khz
> (they exist to avoid having to do expensive math in the common case).
> And conversely, not tracking the other fields makes it easier to see that
> things like the version number are pulled from the guest's copy, not from
> KVM's reference.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  3 ++-
>   arch/x86/kvm/x86.c              | 27 +++++++++++++++------------
>   arch/x86/kvm/xen.c              |  8 ++++----
>   3 files changed, 21 insertions(+), 17 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update()
  2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
@ 2025-01-21 17:05   ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:05 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, David Woodhouse
  Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
	David Woodhouse, Vitaly Kuznetsov

On 18/01/2025 00:55, Sean Christopherson wrote:
> When updating PV clocks, handle the Xen-specific UNSTABLE_TSC override in
> the main kvm_guest_time_update() by simply clearing PVCLOCK_TSC_STABLE_BIT
> in the flags of the reference pvclock structure.  Expand the comment to
> (hopefully) make it obvious that Xen clocks need to be processed after all
> clocks that care about the TSC_STABLE flag.
> 
> No functional change intended.
> 
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 35 +++++++++++++++--------------------
>   1 file changed, 15 insertions(+), 20 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-21 16:42   ` Paul Durrant
@ 2025-01-21 17:09     ` Sean Christopherson
  2025-01-21 17:15       ` Paul Durrant
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 17:09 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> > ---
> >   arch/x86/kvm/x86.c | 20 ++++++++++++++------
> >   1 file changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d8ee37dd2b57..3c4d210e8a9e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> >   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> >   	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > -	if (vcpu->pvclock_set_guest_stopped_request) {
> > -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > -		vcpu->pvclock_set_guest_stopped_request = false;
> > -	}
> > -
> >   	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> >   	if (force_tsc_unstable)
> > @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >   	if (use_master_clock)
> >   		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
> > -	if (vcpu->pv_time.active)
> > +	if (vcpu->pv_time.active) {
> > +		/*
> > +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
> > +		 * historic behavior is to only process the request if kvmclock
> > +		 * is active/enabled.
> > +		 */
> > +		if (vcpu->pvclock_set_guest_stopped_request) {
> > +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > +			vcpu->pvclock_set_guest_stopped_request = false;
> > +		}
> >   		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> > +
> > +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> 
> Is this intentional? The line above your change in kvm_setup_guest_pvclock()
> clearly keeps the flag enabled if it already set and, without this patch, I
> don't see anything clearing it.

Oh, I see what you're getting at.  Hrm.  Yes, clearing the flag is intentional,
otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
only for kvmclock).

Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
applied to the first active clock (kvmclock).

The only way I can think of to fully isolate the changes would be to split this
into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".

4a would be quite ugly, because to avoid introducing a functional change, it
would need to be:

	if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
	    vcpu->xen.vcpu_time_info_cache.active) {
		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
		vcpu->pvclock_set_guest_stopped_request = false;
	}

But it's not the worst intermediate code, so I'm not opposed to going that
route.

> > +	}
> > +
> >   #ifdef CONFIG_KVM_XEN
> >   	if (vcpu->xen.vcpu_info_cache.active)
> >   		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
  2025-01-21 16:54   ` Paul Durrant
@ 2025-01-21 17:11     ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 17:11 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > When updating a specific PV clock, make a full copy of KVM's reference
> > copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
> > E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
> > PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
> > clock.
> 
> ... but the line I queried in the previous patch squashes the flag before
> the Xen PV clock is set up, so no bleed?

Yeah, in practice, no bleed after the previous patch.  But very theoretically,
there could be bleed if the guest set PVCLOCK_GUEST_STOPPED in the compat clock
*and* had both compat and non-compat Xen PV clocks active (is that even possible?)

> > Using a local copy of the pvclock structure also sets the stage for
> > eliminating the per-vCPU copy/cache (only the TSC frequency information
> > actually "needs" to be cached/persisted).
> > 
> > Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/x86.c | 13 ++++++++-----
> >   1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3c4d210e8a9e..5f3ad13a8ac7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> >   {
> >   	struct kvm_vcpu_arch *vcpu = &v->arch;
> >   	struct pvclock_vcpu_time_info *guest_hv_clock;
> > +	struct pvclock_vcpu_time_info hv_clock;
> >   	unsigned long flags;
> > +	memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
> > +
> >   	read_lock_irqsave(&gpc->lock, flags);
> >   	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> >   		read_unlock_irqrestore(&gpc->lock, flags);
> > @@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> >   	 * it is consistent.
> >   	 */
> > -	guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> > +	guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
> >   	smp_wmb();
> >   	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > -	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > +	hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > -	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > +	memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
> >   	if (force_tsc_unstable)
> >   		guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
> >   	smp_wmb();
> > -	guest_hv_clock->version = ++vcpu->hv_clock.version;
> > +	guest_hv_clock->version = ++hv_clock.version;
> >   	kvm_gpc_mark_dirty_in_slot(gpc);
> >   	read_unlock_irqrestore(&gpc->lock, flags);
> > -	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> > +	trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
> >   }
> >   static int kvm_guest_time_update(struct kvm_vcpu *v)
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-21 17:09     ` Sean Christopherson
@ 2025-01-21 17:15       ` Paul Durrant
  2025-01-21 18:32         ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On 21/01/2025 17:09, Sean Christopherson wrote:
> On Tue, Jan 21, 2025, Paul Durrant wrote:
>>> ---
>>>    arch/x86/kvm/x86.c | 20 ++++++++++++++------
>>>    1 file changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index d8ee37dd2b57..3c4d210e8a9e 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>>>    	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
>>>    	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>>> -	if (vcpu->pvclock_set_guest_stopped_request) {
>>> -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
>>> -		vcpu->pvclock_set_guest_stopped_request = false;
>>> -	}
>>> -
>>>    	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
>>>    	if (force_tsc_unstable)
>>> @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>>    	if (use_master_clock)
>>>    		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
>>> -	if (vcpu->pv_time.active)
>>> +	if (vcpu->pv_time.active) {
>>> +		/*
>>> +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
>>> +		 * historic behavior is to only process the request if kvmclock
>>> +		 * is active/enabled.
>>> +		 */
>>> +		if (vcpu->pvclock_set_guest_stopped_request) {
>>> +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
>>> +			vcpu->pvclock_set_guest_stopped_request = false;
>>> +		}
>>>    		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
>>> +
>>> +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
>>
>> Is this intentional? The line above your change in kvm_setup_guest_pvclock()
>> clearly keeps the flag enabled if it already set and, without this patch, I
>> don't see anything clearing it.
> 
> Oh, I see what you're getting at.  Hrm.  Yes, clearing the flag is intentional,
> otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
> only for kvmclock).
> 
> Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
> break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
> applied to the first active clock (kvmclock).
> 
> The only way I can think of to fully isolate the changes would be to split this
> into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
> kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
> ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
> 
> 4a would be quite ugly, because to avoid introducing a functional change, it
> would need to be:
> 
> 	if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
> 	    vcpu->xen.vcpu_time_info_cache.active) {
> 		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> 		vcpu->pvclock_set_guest_stopped_request = false;
> 	}
> 
> But it's not the worst intermediate code, so I'm not opposed to going that
> route.
> 

What about putting this change after patch 7. Then you could take a 
local copy of hv_clock in which you could set PVCLOCK_GUEST_STOPPED and 
so avoid bleeding the flag that way?

>>> +	}
>>> +
>>>    #ifdef CONFIG_KVM_XEN
>>>    	if (vcpu->xen.vcpu_info_cache.active)
>>>    		kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-21 15:59       ` Paul Durrant
@ 2025-01-21 17:16         ` David Woodhouse
  2025-01-21 17:30           ` Paul Durrant
  0 siblings, 1 reply; 32+ messages in thread
From: David Woodhouse @ 2025-01-21 17:16 UTC (permalink / raw)
  To: paul, Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, linux-kernel, syzbot+352e553a86e0d75f5120,
	Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

On Tue, 2025-01-21 at 15:59 +0000, Paul Durrant wrote:
> On 21/01/2025 15:44, Sean Christopherson wrote:
> [snip]
> > 
> > I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
> > PV clock is truly unstable, it's that some guests get tripped up by the STABLE
> > flag.  A guest that can't handle the STABLE flag has bigger problems than the
> > existence of a completely unrelated clock that is implied to be stable.
> > 
> 
> Agreed.
> 
> > > I don't know if anyone combines Xen and Hyper-V emulation capabilities for
> > > the same guest on KVM though.)
> > 
> > That someone would have to be quite "brave" :-D
> 
> Maybe :-)

Xen itself does offer some Hyper-V enlightenments, and we might
reasonably expect KVM-based hypervisors to offer the same. We
explicitly do account for the KVM CPUID leaves moving up to let the
Hyper-V ones exist.

I don't recall if Xen's Hyper-V support includes the TSC page though.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
  2025-01-21 17:16         ` David Woodhouse
@ 2025-01-21 17:30           ` Paul Durrant
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:30 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson, Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, linux-kernel, syzbot+352e553a86e0d75f5120,
	Paul Durrant

On 21/01/2025 17:16, David Woodhouse wrote:
> On Tue, 2025-01-21 at 15:59 +0000, Paul Durrant wrote:
>> On 21/01/2025 15:44, Sean Christopherson wrote:
>> [snip]
>>>
>>> I think it's ok to keep the Hyper-V TSC page in this case.  It's not that the Xen
>>> PV clock is truly unstable, it's that some guests get tripped up by the STABLE
>>> flag.  A guest that can't handle the STABLE flag has bigger problems than the
>>> existence of a completely unrelated clock that is implied to be stable.
>>>
>>
>> Agreed.
>>
>>>> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
>>>> the same guest on KVM though.)
>>>
>>> That someone would have to be quite "brave" :-D
>>
>> Maybe :-)
> 
> Xen itself does offer some Hyper-V enlightenments, and we might
> reasonably expect KVM-based hypervisors to offer the same. We
> explicitly do account for the KVM CPUID leaves moving up to let the
> Hyper-V ones exist.
> 
> I don't recall if Xen's Hyper-V support includes the TSC page though.

It does :-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
  2025-01-21 17:15       ` Paul Durrant
@ 2025-01-21 18:32         ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 18:32 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 21/01/2025 17:09, Sean Christopherson wrote:
> > On Tue, Jan 21, 2025, Paul Durrant wrote:
> > > > ---
> > > >    arch/x86/kvm/x86.c | 20 ++++++++++++++------
> > > >    1 file changed, 14 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index d8ee37dd2b57..3c4d210e8a9e 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > > >    	/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > > >    	vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > > > -	if (vcpu->pvclock_set_guest_stopped_request) {
> > > > -		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > > > -		vcpu->pvclock_set_guest_stopped_request = false;
> > > > -	}
> > > > -
> > > >    	memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > > >    	if (force_tsc_unstable)
> > > > @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > > >    	if (use_master_clock)
> > > >    		vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
> > > > -	if (vcpu->pv_time.active)
> > > > +	if (vcpu->pv_time.active) {
> > > > +		/*
> > > > +		 * GUEST_STOPPED is only supported by kvmclock, and KVM's
> > > > +		 * historic behavior is to only process the request if kvmclock
> > > > +		 * is active/enabled.
> > > > +		 */
> > > > +		if (vcpu->pvclock_set_guest_stopped_request) {
> > > > +			vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > > > +			vcpu->pvclock_set_guest_stopped_request = false;
> > > > +		}
> > > >    		kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> > > > +
> > > > +		vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> > > 
> > > Is this intentional? The line above your change in kvm_setup_guest_pvclock()
> > > clearly keeps the flag enabled if it already set and, without this patch, I
> > > don't see anything clearing it.
> > 
> > Oh, I see what you're getting at.  Hrm.  Yes, clearing the flag is intentional,
> > otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
> > only for kvmclock).
> > 
> > Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
> > break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
> > applied to the first active clock (kvmclock).
> > 
> > The only way I can think of to fully isolate the changes would be to split this
> > into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
> > kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
> > ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
> > 
> > 4a would be quite ugly, because to avoid introducing a functional change, it
> > would need to be:
> > 
> > 	if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
> > 	    vcpu->xen.vcpu_time_info_cache.active) {
> > 		vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > 		vcpu->pvclock_set_guest_stopped_request = false;
> > 	}
> > 
> > But it's not the worst intermediate code, so I'm not opposed to going that
> > route.
> > 
> 
> What about putting this change after patch 7. Then you could take a local
> copy of hv_clock in which you could set PVCLOCK_GUEST_STOPPED and so avoid
> bleeding the flag that way?

But to preserve the current behavior of setting PVCLOCK_GUEST_STOPPED for all
clocks, processing pvclock_set_guest_stopped_request needs to be moved out of
kvm_setup_guest_pvclock() before said helper can make a copy of the reference.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
  2025-01-21 16:58   ` Paul Durrant
@ 2025-01-21 18:45     ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 18:45 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
	syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
	Vitaly Kuznetsov

On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> > reference copy may not be up-to-date, i.e. may yield a false positive of
> > sorts.  In the unlikely scenario that the guest is starting a Xen timer
> > and has used a Xen pvclock in the past, but has since but turned it "off",
> > then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> > if and only if at least pvclock is enabled.
> > 
> > Furthermore, vcpu->arch.hv_clock is currently used by three different
> > pvclocks: kvmclock, Xen, and Xen compat.  While it's extremely unlikely a
> > guest would ever enable multiple pvclocks, effectively sharing KVM's
> > reference clock could yield very weird behavior.  Using the guest's active
> > Xen pvclock instead of KVM's reference will allow dropping KVM's
> > reference copy.
> > 
> > Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> > Cc: Paul Durrant <pdurrant@amazon.com>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
> >   1 file changed, 53 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index a909b817b9c0..b82c28223585 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
> >   	return HRTIMER_NORESTART;
> >   }
> > +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> > +				 struct pvclock_vcpu_time_info *hv_clock,
> > +				 struct gfn_to_pfn_cache *gpc,
> > +				 unsigned int offset)
> > +{
> > +	struct pvclock_vcpu_time_info *guest_hv_clock;
> > +	unsigned long flags;
> > +	int r;
> > +
> > +	read_lock_irqsave(&gpc->lock, flags);
> > +	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> > +		read_unlock_irqrestore(&gpc->lock, flags);
> > +
> > +		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> > +		if (r)
> > +			return r;
> > +
> > +		read_lock_irqsave(&gpc->lock, flags);
> > +	}
> > +
> 
> I guess I must be missing something subtle... What is setting guest_hv_clock
> to point at something meaningful before this line?

Nope, you're not missing anything, this code is completely broken.  As pointed
out by the kernel test bot, the caller is also busted, because the "xen" pointer
is never initialied.

	struct kvm_vcpu_xen *xen;

	...

	do {
		...

		if (xen->vcpu_info_cache.active)
			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
						offsetof(struct compat_vcpu_info, time));
		else if (xen->vcpu_time_info_cache.active)
			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
		if (r)
			break;
	}


I suspect the selftest passes because the @gpc passed to xen_get_guest_pvclock()
is garbage, which likely results in kvm_gpc_refresh() failing, and so KVM falls
backs to the less precise method:

	if (r) {
		/*
		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
		 *
		 * Also if the guest PV clock hasn't been set up yet, as is
		 * likely to be the case during migration when the vCPU has
		 * not been run yet. It would be possible to calculate the
		 * scaling factors properly in that case but there's not much
		 * point in doing so. The get_kvmclock_ns() drift accumulates
		 * over time, so it's OK to use it at startup. Besides, on
		 * migration there's going to be a little bit of skew in the
		 * precise moment at which timers fire anyway. Often they'll
		 * be in the "past" by the time the VM is running again after
		 * migration.
		 */
		guest_now = get_kvmclock_ns(vcpu->kvm);
		kernel_now = ktime_get();
	}

Ugh.  And the reason my build tests didn't catch this is because the only config
I test with KVM_XEN=y also has KASAN=y, which is incompatible with KVM_ERROR=y
(unless the global WERROR=y is enabled).

Time to punt KASAN=y to it's own Kconfig I guess...

I'll verify the happy path is actually being tested before posting v2.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
@ 2025-01-22 15:05 kernel test robot
  0 siblings, 0 replies; 32+ messages in thread
From: kernel test robot @ 2025-01-22 15:05 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20250118005552.2626804-7-seanjc@google.com>
References: <20250118005552.2626804-7-seanjc@google.com>
TO: Sean Christopherson <seanjc@google.com>

Hi Sean,

kernel test robot noticed the following build warnings:

[auto build test WARNING on eb723766b1030a23c38adf2348b7c3d1409d11f0]

url:    https://github.com/intel-lab-lkp/linux/commits/Sean-Christopherson/KVM-x86-Don-t-take-kvm-lock-when-iterating-over-vCPUs-in-suspend-notifier/20250118-085939
base:   eb723766b1030a23c38adf2348b7c3d1409d11f0
patch link:    https://lore.kernel.org/r/20250118005552.2626804-7-seanjc%40google.com
patch subject: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
:::::: branch date: 5 days ago
:::::: commit date: 5 days ago
config: i386-randconfig-141-20250122 (https://download.01.org/0day-ci/archive/20250122/202501222250.KDXKa8hj-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202501222250.KDXKa8hj-lkp@intel.com/

New smatch warnings:
arch/x86/kvm/xen.c:173 xen_get_guest_pvclock() error: uninitialized symbol 'guest_hv_clock'.
arch/x86/kvm/xen.c:219 kvm_xen_start_timer() error: uninitialized symbol 'xen'.

Old smatch warnings:
include/linux/kvm_host.h:998 kvm_get_vcpu_by_id() warn: iterator 'i' not incremented

vim +/guest_hv_clock +173 arch/x86/kvm/xen.c

536395260582be Joao Martins        2022-03-03  152  
4d6163d65c0021 Sean Christopherson 2025-01-17  153  static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
4d6163d65c0021 Sean Christopherson 2025-01-17  154  				 struct pvclock_vcpu_time_info *hv_clock,
4d6163d65c0021 Sean Christopherson 2025-01-17  155  				 struct gfn_to_pfn_cache *gpc,
4d6163d65c0021 Sean Christopherson 2025-01-17  156  				 unsigned int offset)
4d6163d65c0021 Sean Christopherson 2025-01-17  157  {
4d6163d65c0021 Sean Christopherson 2025-01-17  158  	struct pvclock_vcpu_time_info *guest_hv_clock;
4d6163d65c0021 Sean Christopherson 2025-01-17  159  	unsigned long flags;
4d6163d65c0021 Sean Christopherson 2025-01-17  160  	int r;
4d6163d65c0021 Sean Christopherson 2025-01-17  161  
4d6163d65c0021 Sean Christopherson 2025-01-17  162  	read_lock_irqsave(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17  163  	while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
4d6163d65c0021 Sean Christopherson 2025-01-17  164  		read_unlock_irqrestore(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17  165  
4d6163d65c0021 Sean Christopherson 2025-01-17  166  		r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
4d6163d65c0021 Sean Christopherson 2025-01-17  167  		if (r)
4d6163d65c0021 Sean Christopherson 2025-01-17  168  			return r;
4d6163d65c0021 Sean Christopherson 2025-01-17  169  
4d6163d65c0021 Sean Christopherson 2025-01-17  170  		read_lock_irqsave(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17  171  	}
4d6163d65c0021 Sean Christopherson 2025-01-17  172  
4d6163d65c0021 Sean Christopherson 2025-01-17 @173  	memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
4d6163d65c0021 Sean Christopherson 2025-01-17  174  	read_unlock_irqrestore(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17  175  
4d6163d65c0021 Sean Christopherson 2025-01-17  176  	/*
4d6163d65c0021 Sean Christopherson 2025-01-17  177  	 * Sanity check TSC shift+multiplier to verify the guest's view of time
4d6163d65c0021 Sean Christopherson 2025-01-17  178  	 * is more or less consistent.
4d6163d65c0021 Sean Christopherson 2025-01-17  179  	 */
4d6163d65c0021 Sean Christopherson 2025-01-17  180  	if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
4d6163d65c0021 Sean Christopherson 2025-01-17  181  	    hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
4d6163d65c0021 Sean Christopherson 2025-01-17  182  		return -EINVAL;
4d6163d65c0021 Sean Christopherson 2025-01-17  183  	return 0;
4d6163d65c0021 Sean Christopherson 2025-01-17  184  }
4d6163d65c0021 Sean Christopherson 2025-01-17  185  
451a707813aee2 David Woodhouse     2024-02-27  186  static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
451a707813aee2 David Woodhouse     2024-02-27  187  				bool linux_wa)
536395260582be Joao Martins        2022-03-03  188  {
4d6163d65c0021 Sean Christopherson 2025-01-17  189  	struct kvm_vcpu_xen *xen;
451a707813aee2 David Woodhouse     2024-02-27  190  	int64_t kernel_now, delta;
451a707813aee2 David Woodhouse     2024-02-27  191  	uint64_t guest_now;
4d6163d65c0021 Sean Christopherson 2025-01-17  192  	int r = -EOPNOTSUPP;
451a707813aee2 David Woodhouse     2024-02-27  193  
451a707813aee2 David Woodhouse     2024-02-27  194  	/*
451a707813aee2 David Woodhouse     2024-02-27  195  	 * The guest provides the requested timeout in absolute nanoseconds
451a707813aee2 David Woodhouse     2024-02-27  196  	 * of the KVM clock — as *it* sees it, based on the scaled TSC and
451a707813aee2 David Woodhouse     2024-02-27  197  	 * the pvclock information provided by KVM.
451a707813aee2 David Woodhouse     2024-02-27  198  	 *
451a707813aee2 David Woodhouse     2024-02-27  199  	 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
451a707813aee2 David Woodhouse     2024-02-27  200  	 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
451a707813aee2 David Woodhouse     2024-02-27  201  	 * difference won't matter much as there is no cumulative effect.
451a707813aee2 David Woodhouse     2024-02-27  202  	 *
451a707813aee2 David Woodhouse     2024-02-27  203  	 * Calculate the time for some arbitrary point in time around "now"
451a707813aee2 David Woodhouse     2024-02-27  204  	 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
451a707813aee2 David Woodhouse     2024-02-27  205  	 * delta between the kvmclock "now" value and the guest's requested
451a707813aee2 David Woodhouse     2024-02-27  206  	 * timeout, apply the "Linux workaround" described below, and add
451a707813aee2 David Woodhouse     2024-02-27  207  	 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
451a707813aee2 David Woodhouse     2024-02-27  208  	 * the absolute CLOCK_MONOTONIC time at which the timer should
451a707813aee2 David Woodhouse     2024-02-27  209  	 * fire.
451a707813aee2 David Woodhouse     2024-02-27  210  	 */
4d6163d65c0021 Sean Christopherson 2025-01-17  211  	do {
4d6163d65c0021 Sean Christopherson 2025-01-17  212  		struct pvclock_vcpu_time_info hv_clock;
451a707813aee2 David Woodhouse     2024-02-27  213  		uint64_t host_tsc, guest_tsc;
451a707813aee2 David Woodhouse     2024-02-27  214  
4d6163d65c0021 Sean Christopherson 2025-01-17  215  		if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
4d6163d65c0021 Sean Christopherson 2025-01-17  216  		    !vcpu->kvm->arch.use_master_clock)
4d6163d65c0021 Sean Christopherson 2025-01-17  217  			break;
4d6163d65c0021 Sean Christopherson 2025-01-17  218  
4d6163d65c0021 Sean Christopherson 2025-01-17 @219  		if (xen->vcpu_info_cache.active)
4d6163d65c0021 Sean Christopherson 2025-01-17  220  			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
4d6163d65c0021 Sean Christopherson 2025-01-17  221  						offsetof(struct compat_vcpu_info, time));
4d6163d65c0021 Sean Christopherson 2025-01-17  222  		else if (xen->vcpu_time_info_cache.active)
4d6163d65c0021 Sean Christopherson 2025-01-17  223  			r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
4d6163d65c0021 Sean Christopherson 2025-01-17  224  		if (r)
4d6163d65c0021 Sean Christopherson 2025-01-17  225  			break;
4d6163d65c0021 Sean Christopherson 2025-01-17  226  
451a707813aee2 David Woodhouse     2024-02-27  227  		if (!IS_ENABLED(CONFIG_64BIT) ||
451a707813aee2 David Woodhouse     2024-02-27  228  		    !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
451a707813aee2 David Woodhouse     2024-02-27  229  			/*
451a707813aee2 David Woodhouse     2024-02-27  230  			 * Don't fall back to get_kvmclock_ns() because it's
451a707813aee2 David Woodhouse     2024-02-27  231  			 * broken; it has a systemic error in its results
451a707813aee2 David Woodhouse     2024-02-27  232  			 * because it scales directly from host TSC to
451a707813aee2 David Woodhouse     2024-02-27  233  			 * nanoseconds, and doesn't scale first to guest TSC
451a707813aee2 David Woodhouse     2024-02-27  234  			 * and *then* to nanoseconds as the guest does.
451a707813aee2 David Woodhouse     2024-02-27  235  			 *
451a707813aee2 David Woodhouse     2024-02-27  236  			 * There is a small error introduced here because time
451a707813aee2 David Woodhouse     2024-02-27  237  			 * continues to elapse between the ktime_get() and the
451a707813aee2 David Woodhouse     2024-02-27  238  			 * subsequent rdtsc(). But not the systemic drift due
451a707813aee2 David Woodhouse     2024-02-27  239  			 * to get_kvmclock_ns().
451a707813aee2 David Woodhouse     2024-02-27  240  			 */
451a707813aee2 David Woodhouse     2024-02-27  241  			kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
451a707813aee2 David Woodhouse     2024-02-27  242  			host_tsc = rdtsc();
451a707813aee2 David Woodhouse     2024-02-27  243  		}
451a707813aee2 David Woodhouse     2024-02-27  244  
451a707813aee2 David Woodhouse     2024-02-27  245  		/* Calculate the guest kvmclock as the guest would do it. */
451a707813aee2 David Woodhouse     2024-02-27  246  		guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
4d6163d65c0021 Sean Christopherson 2025-01-17  247  		guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
4d6163d65c0021 Sean Christopherson 2025-01-17  248  	} while (0);
4d6163d65c0021 Sean Christopherson 2025-01-17  249  
4d6163d65c0021 Sean Christopherson 2025-01-17  250  	if (r) {
451a707813aee2 David Woodhouse     2024-02-27  251  		/*
451a707813aee2 David Woodhouse     2024-02-27  252  		 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
451a707813aee2 David Woodhouse     2024-02-27  253  		 *
451a707813aee2 David Woodhouse     2024-02-27  254  		 * Also if the guest PV clock hasn't been set up yet, as is
451a707813aee2 David Woodhouse     2024-02-27  255  		 * likely to be the case during migration when the vCPU has
451a707813aee2 David Woodhouse     2024-02-27  256  		 * not been run yet. It would be possible to calculate the
451a707813aee2 David Woodhouse     2024-02-27  257  		 * scaling factors properly in that case but there's not much
451a707813aee2 David Woodhouse     2024-02-27  258  		 * point in doing so. The get_kvmclock_ns() drift accumulates
451a707813aee2 David Woodhouse     2024-02-27  259  		 * over time, so it's OK to use it at startup. Besides, on
451a707813aee2 David Woodhouse     2024-02-27  260  		 * migration there's going to be a little bit of skew in the
451a707813aee2 David Woodhouse     2024-02-27  261  		 * precise moment at which timers fire anyway. Often they'll
451a707813aee2 David Woodhouse     2024-02-27  262  		 * be in the "past" by the time the VM is running again after
451a707813aee2 David Woodhouse     2024-02-27  263  		 * migration.
451a707813aee2 David Woodhouse     2024-02-27  264  		 */
451a707813aee2 David Woodhouse     2024-02-27  265  		guest_now = get_kvmclock_ns(vcpu->kvm);
451a707813aee2 David Woodhouse     2024-02-27  266  		kernel_now = ktime_get();
451a707813aee2 David Woodhouse     2024-02-27  267  	}
451a707813aee2 David Woodhouse     2024-02-27  268  
451a707813aee2 David Woodhouse     2024-02-27  269  	delta = guest_abs - guest_now;
451a707813aee2 David Woodhouse     2024-02-27  270  
451a707813aee2 David Woodhouse     2024-02-27  271  	/*
451a707813aee2 David Woodhouse     2024-02-27  272  	 * Xen has a 'Linux workaround' in do_set_timer_op() which checks for
451a707813aee2 David Woodhouse     2024-02-27  273  	 * negative absolute timeout values (caused by integer overflow), and
451a707813aee2 David Woodhouse     2024-02-27  274  	 * for values about 13 days in the future (2^50ns) which would be
451a707813aee2 David Woodhouse     2024-02-27  275  	 * caused by jiffies overflow. For those cases, Xen sets the timeout
451a707813aee2 David Woodhouse     2024-02-27  276  	 * 100ms in the future (not *too* soon, since if a guest really did
451a707813aee2 David Woodhouse     2024-02-27  277  	 * set a long timeout on purpose we don't want to keep churning CPU
451a707813aee2 David Woodhouse     2024-02-27  278  	 * time by waking it up).  Emulate Xen's workaround when starting the
451a707813aee2 David Woodhouse     2024-02-27  279  	 * timer in response to __HYPERVISOR_set_timer_op.
451a707813aee2 David Woodhouse     2024-02-27  280  	 */
451a707813aee2 David Woodhouse     2024-02-27  281  	if (linux_wa &&
451a707813aee2 David Woodhouse     2024-02-27  282  	    unlikely((int64_t)guest_abs < 0 ||
451a707813aee2 David Woodhouse     2024-02-27  283  		     (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
451a707813aee2 David Woodhouse     2024-02-27  284  		delta = 100 * NSEC_PER_MSEC;
451a707813aee2 David Woodhouse     2024-02-27  285  		guest_abs = guest_now + delta;
451a707813aee2 David Woodhouse     2024-02-27  286  	}
451a707813aee2 David Woodhouse     2024-02-27  287  
77c9b9dea4fb3e David Woodhouse     2023-09-30  288  	/*
77c9b9dea4fb3e David Woodhouse     2023-09-30  289  	 * Avoid races with the old timer firing. Checking timer_expires
77c9b9dea4fb3e David Woodhouse     2023-09-30  290  	 * to avoid calling hrtimer_cancel() will only have false positives
77c9b9dea4fb3e David Woodhouse     2023-09-30  291  	 * so is fine.
77c9b9dea4fb3e David Woodhouse     2023-09-30  292  	 */
77c9b9dea4fb3e David Woodhouse     2023-09-30  293  	if (vcpu->arch.xen.timer_expires)
77c9b9dea4fb3e David Woodhouse     2023-09-30  294  		hrtimer_cancel(&vcpu->arch.xen.timer);
77c9b9dea4fb3e David Woodhouse     2023-09-30  295  
536395260582be Joao Martins        2022-03-03  296  	atomic_set(&vcpu->arch.xen.timer_pending, 0);
536395260582be Joao Martins        2022-03-03  297  	vcpu->arch.xen.timer_expires = guest_abs;
536395260582be Joao Martins        2022-03-03  298  
451a707813aee2 David Woodhouse     2024-02-27  299  	if (delta <= 0)
536395260582be Joao Martins        2022-03-03  300  		xen_timer_callback(&vcpu->arch.xen.timer);
451a707813aee2 David Woodhouse     2024-02-27  301  	else
536395260582be Joao Martins        2022-03-03  302  		hrtimer_start(&vcpu->arch.xen.timer,
451a707813aee2 David Woodhouse     2024-02-27  303  			      ktime_add_ns(kernel_now, delta),
536395260582be Joao Martins        2022-03-03  304  			      HRTIMER_MODE_ABS_HARD);
536395260582be Joao Martins        2022-03-03  305  }
536395260582be Joao Martins        2022-03-03  306  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-01-22 15:07 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-18  0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
2025-01-18  0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
2025-01-21 16:01   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
2025-01-21 16:03   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
2025-01-21 16:05   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
2025-01-21 16:42   ` Paul Durrant
2025-01-21 17:09     ` Sean Christopherson
2025-01-21 17:15       ` Paul Durrant
2025-01-21 18:32         ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
2025-01-21 16:54   ` Paul Durrant
2025-01-21 17:11     ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
2025-01-19  3:00   ` kernel test robot
2025-01-21 16:58   ` Paul Durrant
2025-01-21 18:45     ` Sean Christopherson
2025-01-18  0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
2025-01-21 17:00   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
2025-01-21 17:03   ` Paul Durrant
2025-01-18  0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
2025-01-20 14:49   ` Vitaly Kuznetsov
2025-01-21 15:44     ` Sean Christopherson
2025-01-21 15:59       ` Paul Durrant
2025-01-21 17:16         ` David Woodhouse
2025-01-21 17:30           ` Paul Durrant
2025-01-18  0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
2025-01-21 17:05   ` Paul Durrant
  -- strict thread matches above, loose matches on Subject: below --
2025-01-22 15:05 [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.