* [PATCH 00/10] KVM: x86: pvclock fixes and cleanups
@ 2025-01-18 0:55 Sean Christopherson
2025-01-18 0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
` (9 more replies)
0 siblings, 10 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Fix a lockdep splat in KVM's suspend notifier by simply removing a
spurious kvm->lock acquisition related to kvmclock, and then try to
wrangle KVM's pvclock handling into something approaching sanity (I
made the mistake of looking at how KVM handled PVCLOCK_GUEST_STOPPED).
David,
I didn't look too closely to see how this interacts with your overhaul of
the pvclock madness[*]. When I first started poking at this, I didn't
realize vcpu->arch.hv_lock had tendrils in so many places. Please holler
if you want me to drop the vcpu->arch.hv_lock changes and/or tweak
something to make it play nice with your series.
The Xen changes are *very* lightly tested, so I definitely won't apply
the potentially problematic changes, i.e. anything past "Don't bleed
PVCLOCK_GUEST_STOPPED across PV clocks", until I get a thumbs up from
you and/or Paul.
[*] https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org
Sean Christopherson (10):
KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend
notifier
KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV
clock
KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
KVM: x86/xen: Use guest's copy of pvclock when starting timer
KVM: x86: Pass reference pvclock as a param to
kvm_setup_guest_pvclock()
KVM: x86: Remove per-vCPU "cache" of its reference pvclock
KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock
update)
KVM: x86: Override TSC_STABLE flag for Xen PV clocks in
kvm_guest_time_update()
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kvm/x86.c | 115 ++++++++++++++++----------------
arch/x86/kvm/xen.c | 62 +++++++++++++++--
3 files changed, 114 insertions(+), 66 deletions(-)
base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 16:01 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
` (8 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE,
don't take kvm->lock as doing so can trigger a largely theoretical
deadlock, it is perfectly safe to iterate over the xarray of vCPUs without
holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in
any way (pv_time.active and pvclock_set_guest_stopped_request are
protected by vcpu->mutex, not kvm->lock).
Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com
Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d9a16fd4d3..26e18c9b0375 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6907,7 +6907,6 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
unsigned long i;
int ret = 0;
- mutex_lock(&kvm->lock);
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!vcpu->arch.pv_time.active)
continue;
@@ -6919,7 +6918,6 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
break;
}
}
- mutex_unlock(&kvm->lock);
return ret ? NOTIFY_BAD : NOTIFY_DONE;
}
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
2025-01-18 0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 16:03 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
` (7 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a
SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU
isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock
is active. I.e. barring some bizarre edge case that shouldn't be treated
as an error in the first place, kvm_arch_suspend_notifier() can't fail.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26e18c9b0375..ef21158ec6b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6905,21 +6905,15 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
{
struct kvm_vcpu *vcpu;
unsigned long i;
- int ret = 0;
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (!vcpu->arch.pv_time.active)
- continue;
+ /*
+ * Ignore the return, marking the guest paused only "fails" if the vCPU
+ * isn't using kvmclock; continuing on is correct and desirable.
+ */
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ (void)kvm_set_guest_paused(vcpu);
- ret = kvm_set_guest_paused(vcpu);
- if (ret) {
- kvm_err("Failed to pause guest VCPU%d: %d\n",
- vcpu->vcpu_id, ret);
- break;
- }
- }
-
- return ret ? NOTIFY_BAD : NOTIFY_DONE;
+ return NOTIFY_DONE;
}
int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
2025-01-18 0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
2025-01-18 0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 16:05 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
` (6 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Drop the local pvclock_flags in kvm_guest_time_update(), the local variable
is immediately shoved into the per-vCPU "cache", i.e. the local variable
serves no purpose.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef21158ec6b2..d8ee37dd2b57 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3178,7 +3178,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
struct kvm_arch *ka = &v->kvm->arch;
s64 kernel_ns;
u64 tsc_timestamp, host_tsc;
- u8 pvclock_flags;
bool use_master_clock;
#ifdef CONFIG_KVM_XEN
/*
@@ -3261,11 +3260,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
vcpu->last_guest_tsc = tsc_timestamp;
/* If the host uses TSC clocksource, then it is stable */
- pvclock_flags = 0;
+ vcpu->hv_clock.flags = 0;
if (use_master_clock)
- pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
-
- vcpu->hv_clock.flags = pvclock_flags;
+ vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
if (vcpu->pv_time.active)
kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (2 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 16:42 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
` (5 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Handle "guest stopped" propagation only for kvmclock, as the flag is set
if and only if kvmclock is "active", i.e. can only be set for Xen PV clock
if kvmclock *and* Xen PV clock are in-use by the guest, which creates very
bizarre behavior for the guest.
Simply restrict the flag to kvmclock, e.g. instead of trying to handle
Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally
added during a refactoring, and while Xen proper defines
XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually
support the flag.
Check and clear pvclock_set_guest_stopped_request if and only if kvmclock
is active to preserve the original behavior, i.e. keep the flag pending
if kvmclock happens to be disabled when KVM processes the initial request.
Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d8ee37dd2b57..3c4d210e8a9e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
- if (vcpu->pvclock_set_guest_stopped_request) {
- vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
- vcpu->pvclock_set_guest_stopped_request = false;
- }
-
memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
if (force_tsc_unstable)
@@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
if (use_master_clock)
vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
- if (vcpu->pv_time.active)
+ if (vcpu->pv_time.active) {
+ /*
+ * GUEST_STOPPED is only supported by kvmclock, and KVM's
+ * historic behavior is to only process the request if kvmclock
+ * is active/enabled.
+ */
+ if (vcpu->pvclock_set_guest_stopped_request) {
+ vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
+ vcpu->pvclock_set_guest_stopped_request = false;
+ }
kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
+
+ vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
+ }
+
#ifdef CONFIG_KVM_XEN
if (vcpu->xen.vcpu_info_cache.active)
kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (3 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 16:54 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
` (4 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
When updating a specific PV clock, make a full copy of KVM's reference
copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
clock.
Using a local copy of the pvclock structure also sets the stage for
eliminating the per-vCPU copy/cache (only the TSC frequency information
actually "needs" to be cached/persisted).
Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3c4d210e8a9e..5f3ad13a8ac7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
{
struct kvm_vcpu_arch *vcpu = &v->arch;
struct pvclock_vcpu_time_info *guest_hv_clock;
+ struct pvclock_vcpu_time_info hv_clock;
unsigned long flags;
+ memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
+
read_lock_irqsave(&gpc->lock, flags);
while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
read_unlock_irqrestore(&gpc->lock, flags);
@@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
* it is consistent.
*/
- guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
+ guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
smp_wmb();
/* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
- vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
+ hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
- memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
+ memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
if (force_tsc_unstable)
guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
smp_wmb();
- guest_hv_clock->version = ++vcpu->hv_clock.version;
+ guest_hv_clock->version = ++hv_clock.version;
kvm_gpc_mark_dirty_in_slot(gpc);
read_unlock_irqrestore(&gpc->lock, flags);
- trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
+ trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
}
static int kvm_guest_time_update(struct kvm_vcpu *v)
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (4 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-19 3:00 ` kernel test robot
2025-01-21 16:58 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
` (3 subsequent siblings)
9 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
reference copy may not be up-to-date, i.e. may yield a false positive of
sorts. In the unlikely scenario that the guest is starting a Xen timer
and has used a Xen pvclock in the past, but has since but turned it "off",
then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
if and only if at least pvclock is enabled.
Furthermore, vcpu->arch.hv_clock is currently used by three different
pvclocks: kvmclock, Xen, and Xen compat. While it's extremely unlikely a
guest would ever enable multiple pvclocks, effectively sharing KVM's
reference clock could yield very weird behavior. Using the guest's active
Xen pvclock instead of KVM's reference will allow dropping KVM's
reference copy.
Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 53 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index a909b817b9c0..b82c28223585 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
+static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
+ struct pvclock_vcpu_time_info *hv_clock,
+ struct gfn_to_pfn_cache *gpc,
+ unsigned int offset)
+{
+ struct pvclock_vcpu_time_info *guest_hv_clock;
+ unsigned long flags;
+ int r;
+
+ read_lock_irqsave(&gpc->lock, flags);
+ while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
+ read_unlock_irqrestore(&gpc->lock, flags);
+
+ r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
+ if (r)
+ return r;
+
+ read_lock_irqsave(&gpc->lock, flags);
+ }
+
+ memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
+ read_unlock_irqrestore(&gpc->lock, flags);
+
+ /*
+ * Sanity check TSC shift+multiplier to verify the guest's view of time
+ * is more or less consistent.
+ */
+ if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
+ hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
+ return -EINVAL;
+ return 0;
+}
+
static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
bool linux_wa)
{
+ struct kvm_vcpu_xen *xen;
int64_t kernel_now, delta;
uint64_t guest_now;
+ int r = -EOPNOTSUPP;
/*
* The guest provides the requested timeout in absolute nanoseconds
@@ -173,10 +208,22 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
* the absolute CLOCK_MONOTONIC time at which the timer should
* fire.
*/
- if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
- static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+ do {
+ struct pvclock_vcpu_time_info hv_clock;
uint64_t host_tsc, guest_tsc;
+ if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
+ !vcpu->kvm->arch.use_master_clock)
+ break;
+
+ if (xen->vcpu_info_cache.active)
+ r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
+ offsetof(struct compat_vcpu_info, time));
+ else if (xen->vcpu_time_info_cache.active)
+ r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
+ if (r)
+ break;
+
if (!IS_ENABLED(CONFIG_64BIT) ||
!kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
/*
@@ -197,9 +244,10 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
/* Calculate the guest kvmclock as the guest would do it. */
guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
- guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
- guest_tsc);
- } else {
+ guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
+ } while (0);
+
+ if (r) {
/*
* Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
*
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (5 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 17:00 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
` (2 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Pass the reference pvclock structure that's used to setup each individual
pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step
toward removing kvm_vcpu_arch.hv_clock.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f3ad13a8ac7..06d27b3cc207 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3116,17 +3116,17 @@ u64 get_kvmclock_ns(struct kvm *kvm)
return data.clock;
}
-static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
+static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
+ struct kvm_vcpu *vcpu,
struct gfn_to_pfn_cache *gpc,
unsigned int offset,
bool force_tsc_unstable)
{
- struct kvm_vcpu_arch *vcpu = &v->arch;
struct pvclock_vcpu_time_info *guest_hv_clock;
struct pvclock_vcpu_time_info hv_clock;
unsigned long flags;
- memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
+ memcpy(&hv_clock, ref_hv_clock, sizeof(hv_clock));
read_lock_irqsave(&gpc->lock, flags);
while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
@@ -3165,7 +3165,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
kvm_gpc_mark_dirty_in_slot(gpc);
read_unlock_irqrestore(&gpc->lock, flags);
- trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
+ trace_kvm_pvclock_update(vcpu->vcpu_id, &hv_clock);
}
static int kvm_guest_time_update(struct kvm_vcpu *v)
@@ -3272,18 +3272,18 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
vcpu->pvclock_set_guest_stopped_request = false;
}
- kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
+ kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->pv_time, 0, false);
vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
}
#ifdef CONFIG_KVM_XEN
if (vcpu->xen.vcpu_info_cache.active)
- kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
+ kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_info_cache,
offsetof(struct compat_vcpu_info, time),
xen_pvclock_tsc_unstable);
if (vcpu->xen.vcpu_time_info_cache.active)
- kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0,
+ kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
xen_pvclock_tsc_unstable);
#endif
kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (6 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 17:03 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
2025-01-18 0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
Remove the per-vCPU "cache" of the reference pvclock and instead cache
only the TSC shift+multiplier. All other fields in pvclock are fully
recomputed by kvm_guest_time_update(), i.e. aren't actually persisted.
In addition to shaving a few bytes, explicitly tracking the TSC shift/mul
fields makes it easier to see that those fields are tied to hw_tsc_khz
(they exist to avoid having to do expensive math in the common case).
And conversely, not tracking the other fields makes it easier to see that
things like the version number are pulled from the guest's copy, not from
KVM's reference.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/x86.c | 27 +++++++++++++++------------
arch/x86/kvm/xen.c | 8 ++++----
3 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5193c3dfbce1..f26105654ec4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -900,7 +900,8 @@ struct kvm_vcpu_arch {
int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
gpa_t time;
- struct pvclock_vcpu_time_info hv_clock;
+ u8 pvclock_tsc_shift;
+ u32 pvclock_tsc_mul;
unsigned int hw_tsc_khz;
struct gfn_to_pfn_cache pv_time;
/* set guest stopped flag in pvclock flags field */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 06d27b3cc207..9eabd70891dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3170,6 +3170,7 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
static int kvm_guest_time_update(struct kvm_vcpu *v)
{
+ struct pvclock_vcpu_time_info hv_clock = {};
unsigned long flags, tgt_tsc_khz;
unsigned seq;
struct kvm_vcpu_arch *vcpu = &v->arch;
@@ -3247,20 +3248,22 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
- &vcpu->hv_clock.tsc_shift,
- &vcpu->hv_clock.tsc_to_system_mul);
+ &vcpu->pvclock_tsc_shift,
+ &vcpu->pvclock_tsc_mul);
vcpu->hw_tsc_khz = tgt_tsc_khz;
kvm_xen_update_tsc_info(v);
}
- vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
- vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
+ hv_clock.tsc_shift = vcpu->pvclock_tsc_shift;
+ hv_clock.tsc_to_system_mul = vcpu->pvclock_tsc_mul;
+ hv_clock.tsc_timestamp = tsc_timestamp;
+ hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
vcpu->last_guest_tsc = tsc_timestamp;
/* If the host uses TSC clocksource, then it is stable */
- vcpu->hv_clock.flags = 0;
+ hv_clock.flags = 0;
if (use_master_clock)
- vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
+ hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
if (vcpu->pv_time.active) {
/*
@@ -3269,24 +3272,24 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
* is active/enabled.
*/
if (vcpu->pvclock_set_guest_stopped_request) {
- vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
+ hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
vcpu->pvclock_set_guest_stopped_request = false;
}
- kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->pv_time, 0, false);
+ kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0, false);
- vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
+ hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
}
#ifdef CONFIG_KVM_XEN
if (vcpu->xen.vcpu_info_cache.active)
- kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_info_cache,
+ kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
offsetof(struct compat_vcpu_info, time),
xen_pvclock_tsc_unstable);
if (vcpu->xen.vcpu_time_info_cache.active)
- kvm_setup_guest_pvclock(&vcpu->hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
+ kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
xen_pvclock_tsc_unstable);
#endif
- kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+ kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
return 0;
}
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b82c28223585..7c6e4172527a 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -177,8 +177,8 @@ static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
* Sanity check TSC shift+multiplier to verify the guest's view of time
* is more or less consistent.
*/
- if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
- hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
+ if (hv_clock->tsc_shift != vcpu->arch.pvclock_tsc_shift ||
+ hv_clock->tsc_to_system_mul != vcpu->arch.pvclock_tsc_mul)
return -EINVAL;
return 0;
}
@@ -2309,8 +2309,8 @@ void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu)
entry = kvm_find_cpuid_entry_index(vcpu, function, 1);
if (entry) {
- entry->ecx = vcpu->arch.hv_clock.tsc_to_system_mul;
- entry->edx = vcpu->arch.hv_clock.tsc_shift;
+ entry->ecx = vcpu->arch.pvclock_tsc_mul;
+ entry->edx = vcpu->arch.pvclock_tsc_shift;
}
entry = kvm_find_cpuid_entry_index(vcpu, function, 2);
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (7 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-20 14:49 ` Vitaly Kuznetsov
2025-01-18 0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
When updating paravirtual clocks, setup the Hyper-V TSC page before
Xen PV clocks. This will allow dropping xen_pvclock_tsc_unstable in favor
of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9eabd70891dd..c68e7f7ba69d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
}
+ kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
+
#ifdef CONFIG_KVM_XEN
if (vcpu->xen.vcpu_info_cache.active)
kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
@@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
xen_pvclock_tsc_unstable);
#endif
- kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
return 0;
}
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update()
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
` (8 preceding siblings ...)
2025-01-18 0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
@ 2025-01-18 0:55 ` Sean Christopherson
2025-01-21 17:05 ` Paul Durrant
9 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-18 0:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
When updating PV clocks, handle the Xen-specific UNSTABLE_TSC override in
the main kvm_guest_time_update() by simply clearing PVCLOCK_TSC_STABLE_BIT
in the flags of the reference pvclock structure. Expand the comment to
(hopefully) make it obvious that Xen clocks need to be processed after all
clocks that care about the TSC_STABLE flag.
No functional change intended.
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 35 +++++++++++++++--------------------
1 file changed, 15 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c68e7f7ba69d..065b349a0218 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3119,8 +3119,7 @@ u64 get_kvmclock_ns(struct kvm *kvm)
static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
struct kvm_vcpu *vcpu,
struct gfn_to_pfn_cache *gpc,
- unsigned int offset,
- bool force_tsc_unstable)
+ unsigned int offset)
{
struct pvclock_vcpu_time_info *guest_hv_clock;
struct pvclock_vcpu_time_info hv_clock;
@@ -3155,9 +3154,6 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
- if (force_tsc_unstable)
- guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
-
smp_wmb();
guest_hv_clock->version = ++hv_clock.version;
@@ -3178,16 +3174,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
s64 kernel_ns;
u64 tsc_timestamp, host_tsc;
bool use_master_clock;
-#ifdef CONFIG_KVM_XEN
- /*
- * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
- * explicitly told to use TSC as its clocksource Xen will not set this bit.
- * This default behaviour led to bugs in some guest kernels which cause
- * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
- */
- bool xen_pvclock_tsc_unstable =
- ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
-#endif
kernel_ns = 0;
host_tsc = 0;
@@ -3275,7 +3261,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
vcpu->pvclock_set_guest_stopped_request = false;
}
- kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0, false);
+ kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->pv_time, 0);
hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
}
@@ -3283,13 +3269,22 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
#ifdef CONFIG_KVM_XEN
+ /*
+ * For Xen guests we may need to override PVCLOCK_TSC_STABLE_BIT as unless
+ * explicitly told to use TSC as its clocksource Xen will not set this bit.
+ * This default behaviour led to bugs in some guest kernels which cause
+ * problems if they observe PVCLOCK_TSC_STABLE_BIT in the pvclock flags.
+ *
+ * Note! Clear TSC_STABLE only for Xen clocks, i.e. the order matters!
+ */
+ if (ka->xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE)
+ hv_clock.flags &= ~PVCLOCK_TSC_STABLE_BIT;
+
if (vcpu->xen.vcpu_info_cache.active)
kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
- offsetof(struct compat_vcpu_info, time),
- xen_pvclock_tsc_unstable);
+ offsetof(struct compat_vcpu_info, time));
if (vcpu->xen.vcpu_time_info_cache.active)
- kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
- xen_pvclock_tsc_unstable);
+ kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0);
#endif
return 0;
}
--
2.48.0.rc2.279.g1de40edade-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
2025-01-18 0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
@ 2025-01-19 3:00 ` kernel test robot
2025-01-21 16:58 ` Paul Durrant
1 sibling, 0 replies; 32+ messages in thread
From: kernel test robot @ 2025-01-19 3:00 UTC (permalink / raw)
To: Sean Christopherson; +Cc: llvm, oe-kbuild-all
Hi Sean,
kernel test robot noticed the following build warnings:
[auto build test WARNING on eb723766b1030a23c38adf2348b7c3d1409d11f0]
url: https://github.com/intel-lab-lkp/linux/commits/Sean-Christopherson/KVM-x86-Don-t-take-kvm-lock-when-iterating-over-vCPUs-in-suspend-notifier/20250118-085939
base: eb723766b1030a23c38adf2348b7c3d1409d11f0
patch link: https://lore.kernel.org/r/20250118005552.2626804-7-seanjc%40google.com
patch subject: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20250119/202501191041.ew4CA984-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250119/202501191041.ew4CA984-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501191041.ew4CA984-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from arch/x86/kvm/xen.c:10:
In file included from arch/x86/kvm/x86.h:5:
In file included from include/linux/kvm_host.h:16:
In file included from include/linux/mm.h:2223:
include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
505 | item];
| ~~~~
include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
512 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
524 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
525 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
>> arch/x86/kvm/xen.c:219:7: warning: variable 'xen' is uninitialized when used here [-Wuninitialized]
219 | if (xen->vcpu_info_cache.active)
| ^~~
arch/x86/kvm/xen.c:189:26: note: initialize the variable 'xen' to silence this warning
189 | struct kvm_vcpu_xen *xen;
| ^
| = NULL
5 warnings generated.
vim +/xen +219 arch/x86/kvm/xen.c
185
186 static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
187 bool linux_wa)
188 {
189 struct kvm_vcpu_xen *xen;
190 int64_t kernel_now, delta;
191 uint64_t guest_now;
192 int r = -EOPNOTSUPP;
193
194 /*
195 * The guest provides the requested timeout in absolute nanoseconds
196 * of the KVM clock — as *it* sees it, based on the scaled TSC and
197 * the pvclock information provided by KVM.
198 *
199 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
200 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
201 * difference won't matter much as there is no cumulative effect.
202 *
203 * Calculate the time for some arbitrary point in time around "now"
204 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
205 * delta between the kvmclock "now" value and the guest's requested
206 * timeout, apply the "Linux workaround" described below, and add
207 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
208 * the absolute CLOCK_MONOTONIC time at which the timer should
209 * fire.
210 */
211 do {
212 struct pvclock_vcpu_time_info hv_clock;
213 uint64_t host_tsc, guest_tsc;
214
215 if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
216 !vcpu->kvm->arch.use_master_clock)
217 break;
218
> 219 if (xen->vcpu_info_cache.active)
220 r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
221 offsetof(struct compat_vcpu_info, time));
222 else if (xen->vcpu_time_info_cache.active)
223 r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
224 if (r)
225 break;
226
227 if (!IS_ENABLED(CONFIG_64BIT) ||
228 !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
229 /*
230 * Don't fall back to get_kvmclock_ns() because it's
231 * broken; it has a systemic error in its results
232 * because it scales directly from host TSC to
233 * nanoseconds, and doesn't scale first to guest TSC
234 * and *then* to nanoseconds as the guest does.
235 *
236 * There is a small error introduced here because time
237 * continues to elapse between the ktime_get() and the
238 * subsequent rdtsc(). But not the systemic drift due
239 * to get_kvmclock_ns().
240 */
241 kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
242 host_tsc = rdtsc();
243 }
244
245 /* Calculate the guest kvmclock as the guest would do it. */
246 guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
247 guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
248 } while (0);
249
250 if (r) {
251 /*
252 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
253 *
254 * Also if the guest PV clock hasn't been set up yet, as is
255 * likely to be the case during migration when the vCPU has
256 * not been run yet. It would be possible to calculate the
257 * scaling factors properly in that case but there's not much
258 * point in doing so. The get_kvmclock_ns() drift accumulates
259 * over time, so it's OK to use it at startup. Besides, on
260 * migration there's going to be a little bit of skew in the
261 * precise moment at which timers fire anyway. Often they'll
262 * be in the "past" by the time the VM is running again after
263 * migration.
264 */
265 guest_now = get_kvmclock_ns(vcpu->kvm);
266 kernel_now = ktime_get();
267 }
268
269 delta = guest_abs - guest_now;
270
271 /*
272 * Xen has a 'Linux workaround' in do_set_timer_op() which checks for
273 * negative absolute timeout values (caused by integer overflow), and
274 * for values about 13 days in the future (2^50ns) which would be
275 * caused by jiffies overflow. For those cases, Xen sets the timeout
276 * 100ms in the future (not *too* soon, since if a guest really did
277 * set a long timeout on purpose we don't want to keep churning CPU
278 * time by waking it up). Emulate Xen's workaround when starting the
279 * timer in response to __HYPERVISOR_set_timer_op.
280 */
281 if (linux_wa &&
282 unlikely((int64_t)guest_abs < 0 ||
283 (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
284 delta = 100 * NSEC_PER_MSEC;
285 guest_abs = guest_now + delta;
286 }
287
288 /*
289 * Avoid races with the old timer firing. Checking timer_expires
290 * to avoid calling hrtimer_cancel() will only have false positives
291 * so is fine.
292 */
293 if (vcpu->arch.xen.timer_expires)
294 hrtimer_cancel(&vcpu->arch.xen.timer);
295
296 atomic_set(&vcpu->arch.xen.timer_pending, 0);
297 vcpu->arch.xen.timer_expires = guest_abs;
298
299 if (delta <= 0)
300 xen_timer_callback(&vcpu->arch.xen.timer);
301 else
302 hrtimer_start(&vcpu->arch.xen.timer,
303 ktime_add_ns(kernel_now, delta),
304 HRTIMER_MODE_ABS_HARD);
305 }
306
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
2025-01-18 0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
@ 2025-01-20 14:49 ` Vitaly Kuznetsov
2025-01-21 15:44 ` Sean Christopherson
0 siblings, 1 reply; 32+ messages in thread
From: Vitaly Kuznetsov @ 2025-01-20 14:49 UTC (permalink / raw)
To: Sean Christopherson, Sean Christopherson, Paolo Bonzini,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse
Sean Christopherson <seanjc@google.com> writes:
> When updating paravirtual clocks, setup the Hyper-V TSC page before
> Xen PV clocks. This will allow dropping xen_pvclock_tsc_unstable in favor
> of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9eabd70891dd..c68e7f7ba69d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> }
>
> + kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> +
> #ifdef CONFIG_KVM_XEN
> if (vcpu->xen.vcpu_info_cache.active)
> kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
> @@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
> xen_pvclock_tsc_unstable);
> #endif
> - kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> return 0;
> }
"No functional change detected".
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
(What I'm wondering is if (from mostly theoretical PoV) it's OK to pass
*some* of the PV clocks as stable and some as unstable to the same
guest, i.e. if it would make sense to disable Hyper-V TSC page when
KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE too. I don't know if anyone
combines Xen and Hyper-V emulation capabilities for the same guest on
KVM though.)
--
Vitaly
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
2025-01-20 14:49 ` Vitaly Kuznetsov
@ 2025-01-21 15:44 ` Sean Christopherson
2025-01-21 15:59 ` Paul Durrant
0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 15:44 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: Paolo Bonzini, David Woodhouse, Paul Durrant, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse
On Mon, Jan 20, 2025, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > When updating paravirtual clocks, setup the Hyper-V TSC page before
> > Xen PV clocks. This will allow dropping xen_pvclock_tsc_unstable in favor
> > of simply clearing PVCLOCK_TSC_STABLE_BIT in the reference flags.
> >
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/x86.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9eabd70891dd..c68e7f7ba69d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3280,6 +3280,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> > }
> >
> > + kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> > +
> > #ifdef CONFIG_KVM_XEN
> > if (vcpu->xen.vcpu_info_cache.active)
> > kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_info_cache,
> > @@ -3289,7 +3291,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > kvm_setup_guest_pvclock(&hv_clock, v, &vcpu->xen.vcpu_time_info_cache, 0,
> > xen_pvclock_tsc_unstable);
> > #endif
> > - kvm_hv_setup_tsc_page(v->kvm, &hv_clock);
> > return 0;
> > }
>
> "No functional change detected".
>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> (What I'm wondering is if (from mostly theoretical PoV) it's OK to pass
> *some* of the PV clocks as stable and some as unstable to the same
> guest, i.e. if it would make sense to disable Hyper-V TSC page when
> KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE too.
I think it's ok to keep the Hyper-V TSC page in this case. It's not that the Xen
PV clock is truly unstable, it's that some guests get tripped up by the STABLE
flag. A guest that can't handle the STABLE flag has bigger problems than the
existence of a completely unrelated clock that is implied to be stable.
> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
> the same guest on KVM though.)
That someone would have to be quite "brave" :-D
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
2025-01-21 15:44 ` Sean Christopherson
@ 2025-01-21 15:59 ` Paul Durrant
2025-01-21 17:16 ` David Woodhouse
0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 15:59 UTC (permalink / raw)
To: Sean Christopherson, Vitaly Kuznetsov
Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse
On 21/01/2025 15:44, Sean Christopherson wrote:
[snip]
>
> I think it's ok to keep the Hyper-V TSC page in this case. It's not that the Xen
> PV clock is truly unstable, it's that some guests get tripped up by the STABLE
> flag. A guest that can't handle the STABLE flag has bigger problems than the
> existence of a completely unrelated clock that is implied to be stable.
>
Agreed.
>> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
>> the same guest on KVM though.)
>
> That someone would have to be quite "brave" :-D
Maybe :-)
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier
2025-01-18 0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
@ 2025-01-21 16:01 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:01 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> When queueing vCPU PVCLOCK updates in response to SUSPEND or HIBERNATE,
> don't take kvm->lock as doing so can trigger a largely theoretical
> deadlock, it is perfectly safe to iterate over the xarray of vCPUs without
> holding kvm->lock, and kvm->lock doesn't protect kvm_set_guest_paused() in
> any way (pv_time.active and pvclock_set_guest_stopped_request are
> protected by vcpu->mutex, not kvm->lock).
>
> Reported-by: syzbot+352e553a86e0d75f5120@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/677c0f36.050a0220.3b3668.0014.GAE@google.com
> Fixes: 7d62874f69d7 ("kvm: x86: implement KVM PM-notifier")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND
2025-01-18 0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
@ 2025-01-21 16:03 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> Drop KVM's handling of kvm_set_guest_paused() failure when reacting to a
> SUSPEND notification, as kvm_set_guest_paused() only "fails" if the vCPU
> isn't using kvmclock, and KVM's notifier callback pre-checks that kvmclock
> is active. I.e. barring some bizarre edge case that shouldn't be treated
> as an error in the first place, kvm_arch_suspend_notifier() can't fail.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 20 +++++++-------------
> 1 file changed, 7 insertions(+), 13 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update()
2025-01-18 0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
@ 2025-01-21 16:05 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:05 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> Drop the local pvclock_flags in kvm_guest_time_update(), the local variable
> is immediately shoved into the per-vCPU "cache", i.e. the local variable
> serves no purpose.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
2025-01-18 0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
@ 2025-01-21 16:42 ` Paul Durrant
2025-01-21 17:09 ` Sean Christopherson
0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:42 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> Handle "guest stopped" propagation only for kvmclock, as the flag is set
> if and only if kvmclock is "active", i.e. can only be set for Xen PV clock
> if kvmclock *and* Xen PV clock are in-use by the guest, which creates very
> bizarre behavior for the guest.
>
> Simply restrict the flag to kvmclock, e.g. instead of trying to handle
> Xen PV clock, as propagation of PVCLOCK_GUEST_STOPPED was unintentionally
> added during a refactoring, and while Xen proper defines
> XEN_PVCLOCK_GUEST_STOPPED, there's no evidence that Xen guests actually
> support the flag.
Indeed. I can find no consumers of the flag.
>
> Check and clear pvclock_set_guest_stopped_request if and only if kvmclock
> is active to preserve the original behavior, i.e. keep the flag pending
> if kvmclock happens to be disabled when KVM processes the initial request.
>
> Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d8ee37dd2b57..3c4d210e8a9e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>
> - if (vcpu->pvclock_set_guest_stopped_request) {
> - vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> - vcpu->pvclock_set_guest_stopped_request = false;
> - }
> -
> memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
>
> if (force_tsc_unstable)
> @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> if (use_master_clock)
> vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
>
> - if (vcpu->pv_time.active)
> + if (vcpu->pv_time.active) {
> + /*
> + * GUEST_STOPPED is only supported by kvmclock, and KVM's
> + * historic behavior is to only process the request if kvmclock
> + * is active/enabled.
> + */
> + if (vcpu->pvclock_set_guest_stopped_request) {
> + vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> + vcpu->pvclock_set_guest_stopped_request = false;
> + }
> kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> +
> + vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
Is this intentional? The line above your change in
kvm_setup_guest_pvclock() clearly keeps the flag enabled if it already
set and, without this patch, I don't see anything clearing it.
> + }
> +
> #ifdef CONFIG_KVM_XEN
> if (vcpu->xen.vcpu_info_cache.active)
> kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
2025-01-18 0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
@ 2025-01-21 16:54 ` Paul Durrant
2025-01-21 17:11 ` Sean Christopherson
0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:54 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> When updating a specific PV clock, make a full copy of KVM's reference
> copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
> E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
> PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
> clock.
... but the line I queried in the previous patch squashes the flag
before the Xen PV clock is set up, so no bleed?
>
> Using a local copy of the pvclock structure also sets the stage for
> eliminating the per-vCPU copy/cache (only the TSC frequency information
> actually "needs" to be cached/persisted).
>
> Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c4d210e8a9e..5f3ad13a8ac7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> {
> struct kvm_vcpu_arch *vcpu = &v->arch;
> struct pvclock_vcpu_time_info *guest_hv_clock;
> + struct pvclock_vcpu_time_info hv_clock;
> unsigned long flags;
>
> + memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
> +
> read_lock_irqsave(&gpc->lock, flags);
> while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> read_unlock_irqrestore(&gpc->lock, flags);
> @@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> * it is consistent.
> */
>
> - guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> + guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
> smp_wmb();
>
> /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> - vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> + hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>
> - memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> + memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
>
> if (force_tsc_unstable)
> guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
>
> smp_wmb();
>
> - guest_hv_clock->version = ++vcpu->hv_clock.version;
> + guest_hv_clock->version = ++hv_clock.version;
>
> kvm_gpc_mark_dirty_in_slot(gpc);
> read_unlock_irqrestore(&gpc->lock, flags);
>
> - trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> + trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
> }
>
> static int kvm_guest_time_update(struct kvm_vcpu *v)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
2025-01-18 0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
2025-01-19 3:00 ` kernel test robot
@ 2025-01-21 16:58 ` Paul Durrant
2025-01-21 18:45 ` Sean Christopherson
1 sibling, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 16:58 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> reference copy may not be up-to-date, i.e. may yield a false positive of
> sorts. In the unlikely scenario that the guest is starting a Xen timer
> and has used a Xen pvclock in the past, but has since but turned it "off",
> then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> if and only if at least pvclock is enabled.
>
> Furthermore, vcpu->arch.hv_clock is currently used by three different
> pvclocks: kvmclock, Xen, and Xen compat. While it's extremely unlikely a
> guest would ever enable multiple pvclocks, effectively sharing KVM's
> reference clock could yield very weird behavior. Using the guest's active
> Xen pvclock instead of KVM's reference will allow dropping KVM's
> reference copy.
>
> Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 53 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index a909b817b9c0..b82c28223585 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
> return HRTIMER_NORESTART;
> }
>
> +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> + struct pvclock_vcpu_time_info *hv_clock,
> + struct gfn_to_pfn_cache *gpc,
> + unsigned int offset)
> +{
> + struct pvclock_vcpu_time_info *guest_hv_clock;
> + unsigned long flags;
> + int r;
> +
> + read_lock_irqsave(&gpc->lock, flags);
> + while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> + read_unlock_irqrestore(&gpc->lock, flags);
> +
> + r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> + if (r)
> + return r;
> +
> + read_lock_irqsave(&gpc->lock, flags);
> + }
> +
I guess I must be missing something subtle... What is setting
guest_hv_clock to point at something meaningful before this line?
> + memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
> + read_unlock_irqrestore(&gpc->lock, flags);
> +
> + /*
> + * Sanity check TSC shift+multiplier to verify the guest's view of time
> + * is more or less consistent.
> + */
> + if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
> + hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
> + return -EINVAL;
> + return 0;
> +}
> +
> static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
> bool linux_wa)
> {
> + struct kvm_vcpu_xen *xen;
> int64_t kernel_now, delta;
> uint64_t guest_now;
> + int r = -EOPNOTSUPP;
>
> /*
> * The guest provides the requested timeout in absolute nanoseconds
> @@ -173,10 +208,22 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
> * the absolute CLOCK_MONOTONIC time at which the timer should
> * fire.
> */
> - if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock &&
> - static_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> + do {
> + struct pvclock_vcpu_time_info hv_clock;
> uint64_t host_tsc, guest_tsc;
>
> + if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
> + !vcpu->kvm->arch.use_master_clock)
> + break;
> +
> + if (xen->vcpu_info_cache.active)
> + r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
> + offsetof(struct compat_vcpu_info, time));
> + else if (xen->vcpu_time_info_cache.active)
> + r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
> + if (r)
> + break;
> +
> if (!IS_ENABLED(CONFIG_64BIT) ||
> !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
> /*
> @@ -197,9 +244,10 @@ static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
>
> /* Calculate the guest kvmclock as the guest would do it. */
> guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
> - guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock,
> - guest_tsc);
> - } else {
> + guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
> + } while (0);
> +
> + if (r) {
> /*
> * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
> *
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock()
2025-01-18 0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
@ 2025-01-21 17:00 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:00 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> Pass the reference pvclock structure that's used to setup each individual
> pvclock as a parameter to kvm_setup_guest_pvclock() as a preparatory step
> toward removing kvm_vcpu_arch.hv_clock.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock
2025-01-18 0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
@ 2025-01-21 17:03 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> Remove the per-vCPU "cache" of the reference pvclock and instead cache
> only the TSC shift+multiplier. All other fields in pvclock are fully
> recomputed by kvm_guest_time_update(), i.e. aren't actually persisted.
>
> In addition to shaving a few bytes, explicitly tracking the TSC shift/mul
> fields makes it easier to see that those fields are tied to hw_tsc_khz
> (they exist to avoid having to do expensive math in the common case).
> And conversely, not tracking the other fields makes it easier to see that
> things like the version number are pulled from the guest's copy, not from
> KVM's reference.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 3 ++-
> arch/x86/kvm/x86.c | 27 +++++++++++++++------------
> arch/x86/kvm/xen.c | 8 ++++----
> 3 files changed, 21 insertions(+), 17 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update()
2025-01-18 0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
@ 2025-01-21 17:05 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:05 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, David Woodhouse
Cc: kvm, linux-kernel, syzbot+352e553a86e0d75f5120, Paul Durrant,
David Woodhouse, Vitaly Kuznetsov
On 18/01/2025 00:55, Sean Christopherson wrote:
> When updating PV clocks, handle the Xen-specific UNSTABLE_TSC override in
> the main kvm_guest_time_update() by simply clearing PVCLOCK_TSC_STABLE_BIT
> in the flags of the reference pvclock structure. Expand the comment to
> (hopefully) make it obvious that Xen clocks need to be processed after all
> clocks that care about the TSC_STABLE flag.
>
> No functional change intended.
>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 35 +++++++++++++++--------------------
> 1 file changed, 15 insertions(+), 20 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
2025-01-21 16:42 ` Paul Durrant
@ 2025-01-21 17:09 ` Sean Christopherson
2025-01-21 17:15 ` Paul Durrant
0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 17:09 UTC (permalink / raw)
To: paul
Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
Vitaly Kuznetsov
On Tue, Jan 21, 2025, Paul Durrant wrote:
> > ---
> > arch/x86/kvm/x86.c | 20 ++++++++++++++------
> > 1 file changed, 14 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d8ee37dd2b57..3c4d210e8a9e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > - if (vcpu->pvclock_set_guest_stopped_request) {
> > - vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > - vcpu->pvclock_set_guest_stopped_request = false;
> > - }
> > -
> > memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > if (force_tsc_unstable)
> > @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > if (use_master_clock)
> > vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
> > - if (vcpu->pv_time.active)
> > + if (vcpu->pv_time.active) {
> > + /*
> > + * GUEST_STOPPED is only supported by kvmclock, and KVM's
> > + * historic behavior is to only process the request if kvmclock
> > + * is active/enabled.
> > + */
> > + if (vcpu->pvclock_set_guest_stopped_request) {
> > + vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > + vcpu->pvclock_set_guest_stopped_request = false;
> > + }
> > kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> > +
> > + vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
>
> Is this intentional? The line above your change in kvm_setup_guest_pvclock()
> clearly keeps the flag enabled if it already set and, without this patch, I
> don't see anything clearing it.
Oh, I see what you're getting at. Hrm. Yes, clearing the flag is intentional,
otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
only for kvmclock).
Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
applied to the first active clock (kvmclock).
The only way I can think of to fully isolate the changes would be to split this
into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
4a would be quite ugly, because to avoid introducing a functional change, it
would need to be:
if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
vcpu->xen.vcpu_time_info_cache.active) {
vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
vcpu->pvclock_set_guest_stopped_request = false;
}
But it's not the worst intermediate code, so I'm not opposed to going that
route.
> > + }
> > +
> > #ifdef CONFIG_KVM_XEN
> > if (vcpu->xen.vcpu_info_cache.active)
> > kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks
2025-01-21 16:54 ` Paul Durrant
@ 2025-01-21 17:11 ` Sean Christopherson
0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 17:11 UTC (permalink / raw)
To: paul
Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
Vitaly Kuznetsov
On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > When updating a specific PV clock, make a full copy of KVM's reference
> > copy/cache so that PVCLOCK_GUEST_STOPPED doesn't bleed across clocks.
> > E.g. in the unlikely scenario the guest has enabled both kvmclock and Xen
> > PV clock, a dangling GUEST_STOPPED in kvmclock would bleed into Xen PV
> > clock.
>
> ... but the line I queried in the previous patch squashes the flag before
> the Xen PV clock is set up, so no bleed?
Yeah, in practice, no bleed after the previous patch. But very theoretically,
there could be bleed if the guest set PVCLOCK_GUEST_STOPPED in the compat clock
*and* had both compat and non-compat Xen PV clocks active (is that even possible?)
> > Using a local copy of the pvclock structure also sets the stage for
> > eliminating the per-vCPU copy/cache (only the TSC frequency information
> > actually "needs" to be cached/persisted).
> >
> > Fixes: aa096aa0a05f ("KVM: x86/xen: setup pvclock updates")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/x86.c | 13 ++++++++-----
> > 1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3c4d210e8a9e..5f3ad13a8ac7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3123,8 +3123,11 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > {
> > struct kvm_vcpu_arch *vcpu = &v->arch;
> > struct pvclock_vcpu_time_info *guest_hv_clock;
> > + struct pvclock_vcpu_time_info hv_clock;
> > unsigned long flags;
> > + memcpy(&hv_clock, &vcpu->hv_clock, sizeof(hv_clock));
> > +
> > read_lock_irqsave(&gpc->lock, flags);
> > while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> > read_unlock_irqrestore(&gpc->lock, flags);
> > @@ -3144,25 +3147,25 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > * it is consistent.
> > */
> > - guest_hv_clock->version = vcpu->hv_clock.version = (guest_hv_clock->version + 1) | 1;
> > + guest_hv_clock->version = hv_clock.version = (guest_hv_clock->version + 1) | 1;
> > smp_wmb();
> > /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > - vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > + hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > - memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > + memcpy(guest_hv_clock, &hv_clock, sizeof(*guest_hv_clock));
> > if (force_tsc_unstable)
> > guest_hv_clock->flags &= ~PVCLOCK_TSC_STABLE_BIT;
> > smp_wmb();
> > - guest_hv_clock->version = ++vcpu->hv_clock.version;
> > + guest_hv_clock->version = ++hv_clock.version;
> > kvm_gpc_mark_dirty_in_slot(gpc);
> > read_unlock_irqrestore(&gpc->lock, flags);
> > - trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> > + trace_kvm_pvclock_update(v->vcpu_id, &hv_clock);
> > }
> > static int kvm_guest_time_update(struct kvm_vcpu *v)
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
2025-01-21 17:09 ` Sean Christopherson
@ 2025-01-21 17:15 ` Paul Durrant
2025-01-21 18:32 ` Sean Christopherson
0 siblings, 1 reply; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
Vitaly Kuznetsov
On 21/01/2025 17:09, Sean Christopherson wrote:
> On Tue, Jan 21, 2025, Paul Durrant wrote:
>>> ---
>>> arch/x86/kvm/x86.c | 20 ++++++++++++++------
>>> 1 file changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index d8ee37dd2b57..3c4d210e8a9e 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
>>> /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
>>> vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
>>> - if (vcpu->pvclock_set_guest_stopped_request) {
>>> - vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
>>> - vcpu->pvclock_set_guest_stopped_request = false;
>>> - }
>>> -
>>> memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
>>> if (force_tsc_unstable)
>>> @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>> if (use_master_clock)
>>> vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
>>> - if (vcpu->pv_time.active)
>>> + if (vcpu->pv_time.active) {
>>> + /*
>>> + * GUEST_STOPPED is only supported by kvmclock, and KVM's
>>> + * historic behavior is to only process the request if kvmclock
>>> + * is active/enabled.
>>> + */
>>> + if (vcpu->pvclock_set_guest_stopped_request) {
>>> + vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
>>> + vcpu->pvclock_set_guest_stopped_request = false;
>>> + }
>>> kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
>>> +
>>> + vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
>>
>> Is this intentional? The line above your change in kvm_setup_guest_pvclock()
>> clearly keeps the flag enabled if it already set and, without this patch, I
>> don't see anything clearing it.
>
> Oh, I see what you're getting at. Hrm. Yes, clearing the flag is intentional,
> otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
> only for kvmclock).
>
> Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
> break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
> applied to the first active clock (kvmclock).
>
> The only way I can think of to fully isolate the changes would be to split this
> into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
> kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
> ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
>
> 4a would be quite ugly, because to avoid introducing a functional change, it
> would need to be:
>
> if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
> vcpu->xen.vcpu_time_info_cache.active) {
> vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> vcpu->pvclock_set_guest_stopped_request = false;
> }
>
> But it's not the worst intermediate code, so I'm not opposed to going that
> route.
>
What about putting this change after patch 7. Then you could take a
local copy of hv_clock in which you could set PVCLOCK_GUEST_STOPPED and
so avoid bleeding the flag that way?
>>> + }
>>> +
>>> #ifdef CONFIG_KVM_XEN
>>> if (vcpu->xen.vcpu_info_cache.active)
>>> kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_info_cache,
>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
2025-01-21 15:59 ` Paul Durrant
@ 2025-01-21 17:16 ` David Woodhouse
2025-01-21 17:30 ` Paul Durrant
0 siblings, 1 reply; 32+ messages in thread
From: David Woodhouse @ 2025-01-21 17:16 UTC (permalink / raw)
To: paul, Sean Christopherson, Vitaly Kuznetsov
Cc: Paolo Bonzini, kvm, linux-kernel, syzbot+352e553a86e0d75f5120,
Paul Durrant
[-- Attachment #1: Type: text/plain, Size: 970 bytes --]
On Tue, 2025-01-21 at 15:59 +0000, Paul Durrant wrote:
> On 21/01/2025 15:44, Sean Christopherson wrote:
> [snip]
> >
> > I think it's ok to keep the Hyper-V TSC page in this case. It's not that the Xen
> > PV clock is truly unstable, it's that some guests get tripped up by the STABLE
> > flag. A guest that can't handle the STABLE flag has bigger problems than the
> > existence of a completely unrelated clock that is implied to be stable.
> >
>
> Agreed.
>
> > > I don't know if anyone combines Xen and Hyper-V emulation capabilities for
> > > the same guest on KVM though.)
> >
> > That someone would have to be quite "brave" :-D
>
> Maybe :-)
Xen itself does offer some Hyper-V enlightenments, and we might
reasonably expect KVM-based hypervisors to offer the same. We
explicitly do account for the KVM CPUID leaves moving up to let the
Hyper-V ones exist.
I don't recall if Xen's Hyper-V support includes the TSC page though.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update)
2025-01-21 17:16 ` David Woodhouse
@ 2025-01-21 17:30 ` Paul Durrant
0 siblings, 0 replies; 32+ messages in thread
From: Paul Durrant @ 2025-01-21 17:30 UTC (permalink / raw)
To: David Woodhouse, Sean Christopherson, Vitaly Kuznetsov
Cc: Paolo Bonzini, kvm, linux-kernel, syzbot+352e553a86e0d75f5120,
Paul Durrant
On 21/01/2025 17:16, David Woodhouse wrote:
> On Tue, 2025-01-21 at 15:59 +0000, Paul Durrant wrote:
>> On 21/01/2025 15:44, Sean Christopherson wrote:
>> [snip]
>>>
>>> I think it's ok to keep the Hyper-V TSC page in this case. It's not that the Xen
>>> PV clock is truly unstable, it's that some guests get tripped up by the STABLE
>>> flag. A guest that can't handle the STABLE flag has bigger problems than the
>>> existence of a completely unrelated clock that is implied to be stable.
>>>
>>
>> Agreed.
>>
>>>> I don't know if anyone combines Xen and Hyper-V emulation capabilities for
>>>> the same guest on KVM though.)
>>>
>>> That someone would have to be quite "brave" :-D
>>
>> Maybe :-)
>
> Xen itself does offer some Hyper-V enlightenments, and we might
> reasonably expect KVM-based hypervisors to offer the same. We
> explicitly do account for the KVM CPUID leaves moving up to let the
> Hyper-V ones exist.
>
> I don't recall if Xen's Hyper-V support includes the TSC page though.
It does :-)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock
2025-01-21 17:15 ` Paul Durrant
@ 2025-01-21 18:32 ` Sean Christopherson
0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 18:32 UTC (permalink / raw)
To: paul
Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
Vitaly Kuznetsov
On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 21/01/2025 17:09, Sean Christopherson wrote:
> > On Tue, Jan 21, 2025, Paul Durrant wrote:
> > > > ---
> > > > arch/x86/kvm/x86.c | 20 ++++++++++++++------
> > > > 1 file changed, 14 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index d8ee37dd2b57..3c4d210e8a9e 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -3150,11 +3150,6 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > > > /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */
> > > > vcpu->hv_clock.flags |= (guest_hv_clock->flags & PVCLOCK_GUEST_STOPPED);
> > > > - if (vcpu->pvclock_set_guest_stopped_request) {
> > > > - vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > > > - vcpu->pvclock_set_guest_stopped_request = false;
> > > > - }
> > > > -
> > > > memcpy(guest_hv_clock, &vcpu->hv_clock, sizeof(*guest_hv_clock));
> > > > if (force_tsc_unstable)
> > > > @@ -3264,8 +3259,21 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > > > if (use_master_clock)
> > > > vcpu->hv_clock.flags |= PVCLOCK_TSC_STABLE_BIT;
> > > > - if (vcpu->pv_time.active)
> > > > + if (vcpu->pv_time.active) {
> > > > + /*
> > > > + * GUEST_STOPPED is only supported by kvmclock, and KVM's
> > > > + * historic behavior is to only process the request if kvmclock
> > > > + * is active/enabled.
> > > > + */
> > > > + if (vcpu->pvclock_set_guest_stopped_request) {
> > > > + vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > > > + vcpu->pvclock_set_guest_stopped_request = false;
> > > > + }
> > > > kvm_setup_guest_pvclock(v, &vcpu->pv_time, 0, false);
> > > > +
> > > > + vcpu->hv_clock.flags &= ~PVCLOCK_GUEST_STOPPED;
> > >
> > > Is this intentional? The line above your change in kvm_setup_guest_pvclock()
> > > clearly keeps the flag enabled if it already set and, without this patch, I
> > > don't see anything clearing it.
> >
> > Oh, I see what you're getting at. Hrm. Yes, clearing the flag is intentional,
> > otherwise the patch wouldn't do what it claims to do (set PVCLOCK_GUEST_STOPPED
> > only for kvmclock).
> >
> > Swapping the order of this patch and the next patch ("don't bleed ...") doesn't
> > break the cycle because that would result in PVCLOCK_GUEST_STOPPED only being
> > applied to the first active clock (kvmclock).
> >
> > The only way I can think of to fully isolate the changes would be to split this
> > into two patches: (4a) hoist pvclock_set_guest_stopped_request processing into
> > kvm_guest_time_update() and (4b) apply it only to kvmclock, and then make the
> > ordering 4a, 5, 4b, i.e. "hoist", "don't bleed", "only kvmclock".
> >
> > 4a would be quite ugly, because to avoid introducing a functional change, it
> > would need to be:
> >
> > if (vcpu->pv_time.active || vcpu->xen.vcpu_info_cache.active ||
> > vcpu->xen.vcpu_time_info_cache.active) {
> > vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED;
> > vcpu->pvclock_set_guest_stopped_request = false;
> > }
> >
> > But it's not the worst intermediate code, so I'm not opposed to going that
> > route.
> >
>
> What about putting this change after patch 7. Then you could take a local
> copy of hv_clock in which you could set PVCLOCK_GUEST_STOPPED and so avoid
> bleeding the flag that way?
But to preserve the current behavior of setting PVCLOCK_GUEST_STOPPED for all
clocks, processing pvclock_set_guest_stopped_request needs to be moved out of
kvm_setup_guest_pvclock() before said helper can make a copy of the reference.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
2025-01-21 16:58 ` Paul Durrant
@ 2025-01-21 18:45 ` Sean Christopherson
0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-01-21 18:45 UTC (permalink / raw)
To: paul
Cc: Paolo Bonzini, David Woodhouse, kvm, linux-kernel,
syzbot+352e553a86e0d75f5120, Paul Durrant, David Woodhouse,
Vitaly Kuznetsov
On Tue, Jan 21, 2025, Paul Durrant wrote:
> On 18/01/2025 00:55, Sean Christopherson wrote:
> > Use the guest's copy of its pvclock when starting a Xen timer, as KVM's
> > reference copy may not be up-to-date, i.e. may yield a false positive of
> > sorts. In the unlikely scenario that the guest is starting a Xen timer
> > and has used a Xen pvclock in the past, but has since but turned it "off",
> > then vcpu->arch.hv_clock may be stale, as KVM's reference copy is updated
> > if and only if at least pvclock is enabled.
> >
> > Furthermore, vcpu->arch.hv_clock is currently used by three different
> > pvclocks: kvmclock, Xen, and Xen compat. While it's extremely unlikely a
> > guest would ever enable multiple pvclocks, effectively sharing KVM's
> > reference clock could yield very weird behavior. Using the guest's active
> > Xen pvclock instead of KVM's reference will allow dropping KVM's
> > reference copy.
> >
> > Fixes: 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> > Cc: Paul Durrant <pdurrant@amazon.com>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/xen.c | 58 ++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 53 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index a909b817b9c0..b82c28223585 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -150,11 +150,46 @@ static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
> > return HRTIMER_NORESTART;
> > }
> > +static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
> > + struct pvclock_vcpu_time_info *hv_clock,
> > + struct gfn_to_pfn_cache *gpc,
> > + unsigned int offset)
> > +{
> > + struct pvclock_vcpu_time_info *guest_hv_clock;
> > + unsigned long flags;
> > + int r;
> > +
> > + read_lock_irqsave(&gpc->lock, flags);
> > + while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
> > + read_unlock_irqrestore(&gpc->lock, flags);
> > +
> > + r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
> > + if (r)
> > + return r;
> > +
> > + read_lock_irqsave(&gpc->lock, flags);
> > + }
> > +
>
> I guess I must be missing something subtle... What is setting guest_hv_clock
> to point at something meaningful before this line?
Nope, you're not missing anything, this code is completely broken. As pointed
out by the kernel test bot, the caller is also busted, because the "xen" pointer
is never initialied.
struct kvm_vcpu_xen *xen;
...
do {
...
if (xen->vcpu_info_cache.active)
r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
offsetof(struct compat_vcpu_info, time));
else if (xen->vcpu_time_info_cache.active)
r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
if (r)
break;
}
I suspect the selftest passes because the @gpc passed to xen_get_guest_pvclock()
is garbage, which likely results in kvm_gpc_refresh() failing, and so KVM falls
backs to the less precise method:
if (r) {
/*
* Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
*
* Also if the guest PV clock hasn't been set up yet, as is
* likely to be the case during migration when the vCPU has
* not been run yet. It would be possible to calculate the
* scaling factors properly in that case but there's not much
* point in doing so. The get_kvmclock_ns() drift accumulates
* over time, so it's OK to use it at startup. Besides, on
* migration there's going to be a little bit of skew in the
* precise moment at which timers fire anyway. Often they'll
* be in the "past" by the time the VM is running again after
* migration.
*/
guest_now = get_kvmclock_ns(vcpu->kvm);
kernel_now = ktime_get();
}
Ugh. And the reason my build tests didn't catch this is because the only config
I test with KVM_XEN=y also has KASAN=y, which is incompatible with KVM_ERROR=y
(unless the global WERROR=y is enabled).
Time to punt KASAN=y to it's own Kconfig I guess...
I'll verify the happy path is actually being tested before posting v2.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
@ 2025-01-22 15:05 kernel test robot
0 siblings, 0 replies; 32+ messages in thread
From: kernel test robot @ 2025-01-22 15:05 UTC (permalink / raw)
To: oe-kbuild; +Cc: lkp, Dan Carpenter
BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20250118005552.2626804-7-seanjc@google.com>
References: <20250118005552.2626804-7-seanjc@google.com>
TO: Sean Christopherson <seanjc@google.com>
Hi Sean,
kernel test robot noticed the following build warnings:
[auto build test WARNING on eb723766b1030a23c38adf2348b7c3d1409d11f0]
url: https://github.com/intel-lab-lkp/linux/commits/Sean-Christopherson/KVM-x86-Don-t-take-kvm-lock-when-iterating-over-vCPUs-in-suspend-notifier/20250118-085939
base: eb723766b1030a23c38adf2348b7c3d1409d11f0
patch link: https://lore.kernel.org/r/20250118005552.2626804-7-seanjc%40google.com
patch subject: [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer
:::::: branch date: 5 days ago
:::::: commit date: 5 days ago
config: i386-randconfig-141-20250122 (https://download.01.org/0day-ci/archive/20250122/202501222250.KDXKa8hj-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202501222250.KDXKa8hj-lkp@intel.com/
New smatch warnings:
arch/x86/kvm/xen.c:173 xen_get_guest_pvclock() error: uninitialized symbol 'guest_hv_clock'.
arch/x86/kvm/xen.c:219 kvm_xen_start_timer() error: uninitialized symbol 'xen'.
Old smatch warnings:
include/linux/kvm_host.h:998 kvm_get_vcpu_by_id() warn: iterator 'i' not incremented
vim +/guest_hv_clock +173 arch/x86/kvm/xen.c
536395260582be Joao Martins 2022-03-03 152
4d6163d65c0021 Sean Christopherson 2025-01-17 153 static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
4d6163d65c0021 Sean Christopherson 2025-01-17 154 struct pvclock_vcpu_time_info *hv_clock,
4d6163d65c0021 Sean Christopherson 2025-01-17 155 struct gfn_to_pfn_cache *gpc,
4d6163d65c0021 Sean Christopherson 2025-01-17 156 unsigned int offset)
4d6163d65c0021 Sean Christopherson 2025-01-17 157 {
4d6163d65c0021 Sean Christopherson 2025-01-17 158 struct pvclock_vcpu_time_info *guest_hv_clock;
4d6163d65c0021 Sean Christopherson 2025-01-17 159 unsigned long flags;
4d6163d65c0021 Sean Christopherson 2025-01-17 160 int r;
4d6163d65c0021 Sean Christopherson 2025-01-17 161
4d6163d65c0021 Sean Christopherson 2025-01-17 162 read_lock_irqsave(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17 163 while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) {
4d6163d65c0021 Sean Christopherson 2025-01-17 164 read_unlock_irqrestore(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17 165
4d6163d65c0021 Sean Christopherson 2025-01-17 166 r = kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock));
4d6163d65c0021 Sean Christopherson 2025-01-17 167 if (r)
4d6163d65c0021 Sean Christopherson 2025-01-17 168 return r;
4d6163d65c0021 Sean Christopherson 2025-01-17 169
4d6163d65c0021 Sean Christopherson 2025-01-17 170 read_lock_irqsave(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17 171 }
4d6163d65c0021 Sean Christopherson 2025-01-17 172
4d6163d65c0021 Sean Christopherson 2025-01-17 @173 memcpy(hv_clock, guest_hv_clock, sizeof(*hv_clock));
4d6163d65c0021 Sean Christopherson 2025-01-17 174 read_unlock_irqrestore(&gpc->lock, flags);
4d6163d65c0021 Sean Christopherson 2025-01-17 175
4d6163d65c0021 Sean Christopherson 2025-01-17 176 /*
4d6163d65c0021 Sean Christopherson 2025-01-17 177 * Sanity check TSC shift+multiplier to verify the guest's view of time
4d6163d65c0021 Sean Christopherson 2025-01-17 178 * is more or less consistent.
4d6163d65c0021 Sean Christopherson 2025-01-17 179 */
4d6163d65c0021 Sean Christopherson 2025-01-17 180 if (hv_clock->tsc_shift != vcpu->arch.hv_clock.tsc_shift ||
4d6163d65c0021 Sean Christopherson 2025-01-17 181 hv_clock->tsc_to_system_mul != vcpu->arch.hv_clock.tsc_to_system_mul)
4d6163d65c0021 Sean Christopherson 2025-01-17 182 return -EINVAL;
4d6163d65c0021 Sean Christopherson 2025-01-17 183 return 0;
4d6163d65c0021 Sean Christopherson 2025-01-17 184 }
4d6163d65c0021 Sean Christopherson 2025-01-17 185
451a707813aee2 David Woodhouse 2024-02-27 186 static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs,
451a707813aee2 David Woodhouse 2024-02-27 187 bool linux_wa)
536395260582be Joao Martins 2022-03-03 188 {
4d6163d65c0021 Sean Christopherson 2025-01-17 189 struct kvm_vcpu_xen *xen;
451a707813aee2 David Woodhouse 2024-02-27 190 int64_t kernel_now, delta;
451a707813aee2 David Woodhouse 2024-02-27 191 uint64_t guest_now;
4d6163d65c0021 Sean Christopherson 2025-01-17 192 int r = -EOPNOTSUPP;
451a707813aee2 David Woodhouse 2024-02-27 193
451a707813aee2 David Woodhouse 2024-02-27 194 /*
451a707813aee2 David Woodhouse 2024-02-27 195 * The guest provides the requested timeout in absolute nanoseconds
451a707813aee2 David Woodhouse 2024-02-27 196 * of the KVM clock — as *it* sees it, based on the scaled TSC and
451a707813aee2 David Woodhouse 2024-02-27 197 * the pvclock information provided by KVM.
451a707813aee2 David Woodhouse 2024-02-27 198 *
451a707813aee2 David Woodhouse 2024-02-27 199 * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW
451a707813aee2 David Woodhouse 2024-02-27 200 * so use CLOCK_MONOTONIC. In the timescales covered by timers, the
451a707813aee2 David Woodhouse 2024-02-27 201 * difference won't matter much as there is no cumulative effect.
451a707813aee2 David Woodhouse 2024-02-27 202 *
451a707813aee2 David Woodhouse 2024-02-27 203 * Calculate the time for some arbitrary point in time around "now"
451a707813aee2 David Woodhouse 2024-02-27 204 * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the
451a707813aee2 David Woodhouse 2024-02-27 205 * delta between the kvmclock "now" value and the guest's requested
451a707813aee2 David Woodhouse 2024-02-27 206 * timeout, apply the "Linux workaround" described below, and add
451a707813aee2 David Woodhouse 2024-02-27 207 * the resulting delta to the CLOCK_MONOTONIC "now" value, to get
451a707813aee2 David Woodhouse 2024-02-27 208 * the absolute CLOCK_MONOTONIC time at which the timer should
451a707813aee2 David Woodhouse 2024-02-27 209 * fire.
451a707813aee2 David Woodhouse 2024-02-27 210 */
4d6163d65c0021 Sean Christopherson 2025-01-17 211 do {
4d6163d65c0021 Sean Christopherson 2025-01-17 212 struct pvclock_vcpu_time_info hv_clock;
451a707813aee2 David Woodhouse 2024-02-27 213 uint64_t host_tsc, guest_tsc;
451a707813aee2 David Woodhouse 2024-02-27 214
4d6163d65c0021 Sean Christopherson 2025-01-17 215 if (!static_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
4d6163d65c0021 Sean Christopherson 2025-01-17 216 !vcpu->kvm->arch.use_master_clock)
4d6163d65c0021 Sean Christopherson 2025-01-17 217 break;
4d6163d65c0021 Sean Christopherson 2025-01-17 218
4d6163d65c0021 Sean Christopherson 2025-01-17 @219 if (xen->vcpu_info_cache.active)
4d6163d65c0021 Sean Christopherson 2025-01-17 220 r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_info_cache,
4d6163d65c0021 Sean Christopherson 2025-01-17 221 offsetof(struct compat_vcpu_info, time));
4d6163d65c0021 Sean Christopherson 2025-01-17 222 else if (xen->vcpu_time_info_cache.active)
4d6163d65c0021 Sean Christopherson 2025-01-17 223 r = xen_get_guest_pvclock(vcpu, &hv_clock, &xen->vcpu_time_info_cache, 0);
4d6163d65c0021 Sean Christopherson 2025-01-17 224 if (r)
4d6163d65c0021 Sean Christopherson 2025-01-17 225 break;
4d6163d65c0021 Sean Christopherson 2025-01-17 226
451a707813aee2 David Woodhouse 2024-02-27 227 if (!IS_ENABLED(CONFIG_64BIT) ||
451a707813aee2 David Woodhouse 2024-02-27 228 !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) {
451a707813aee2 David Woodhouse 2024-02-27 229 /*
451a707813aee2 David Woodhouse 2024-02-27 230 * Don't fall back to get_kvmclock_ns() because it's
451a707813aee2 David Woodhouse 2024-02-27 231 * broken; it has a systemic error in its results
451a707813aee2 David Woodhouse 2024-02-27 232 * because it scales directly from host TSC to
451a707813aee2 David Woodhouse 2024-02-27 233 * nanoseconds, and doesn't scale first to guest TSC
451a707813aee2 David Woodhouse 2024-02-27 234 * and *then* to nanoseconds as the guest does.
451a707813aee2 David Woodhouse 2024-02-27 235 *
451a707813aee2 David Woodhouse 2024-02-27 236 * There is a small error introduced here because time
451a707813aee2 David Woodhouse 2024-02-27 237 * continues to elapse between the ktime_get() and the
451a707813aee2 David Woodhouse 2024-02-27 238 * subsequent rdtsc(). But not the systemic drift due
451a707813aee2 David Woodhouse 2024-02-27 239 * to get_kvmclock_ns().
451a707813aee2 David Woodhouse 2024-02-27 240 */
451a707813aee2 David Woodhouse 2024-02-27 241 kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */
451a707813aee2 David Woodhouse 2024-02-27 242 host_tsc = rdtsc();
451a707813aee2 David Woodhouse 2024-02-27 243 }
451a707813aee2 David Woodhouse 2024-02-27 244
451a707813aee2 David Woodhouse 2024-02-27 245 /* Calculate the guest kvmclock as the guest would do it. */
451a707813aee2 David Woodhouse 2024-02-27 246 guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
4d6163d65c0021 Sean Christopherson 2025-01-17 247 guest_now = __pvclock_read_cycles(&hv_clock, guest_tsc);
4d6163d65c0021 Sean Christopherson 2025-01-17 248 } while (0);
4d6163d65c0021 Sean Christopherson 2025-01-17 249
4d6163d65c0021 Sean Christopherson 2025-01-17 250 if (r) {
451a707813aee2 David Woodhouse 2024-02-27 251 /*
451a707813aee2 David Woodhouse 2024-02-27 252 * Without CONSTANT_TSC, get_kvmclock_ns() is the only option.
451a707813aee2 David Woodhouse 2024-02-27 253 *
451a707813aee2 David Woodhouse 2024-02-27 254 * Also if the guest PV clock hasn't been set up yet, as is
451a707813aee2 David Woodhouse 2024-02-27 255 * likely to be the case during migration when the vCPU has
451a707813aee2 David Woodhouse 2024-02-27 256 * not been run yet. It would be possible to calculate the
451a707813aee2 David Woodhouse 2024-02-27 257 * scaling factors properly in that case but there's not much
451a707813aee2 David Woodhouse 2024-02-27 258 * point in doing so. The get_kvmclock_ns() drift accumulates
451a707813aee2 David Woodhouse 2024-02-27 259 * over time, so it's OK to use it at startup. Besides, on
451a707813aee2 David Woodhouse 2024-02-27 260 * migration there's going to be a little bit of skew in the
451a707813aee2 David Woodhouse 2024-02-27 261 * precise moment at which timers fire anyway. Often they'll
451a707813aee2 David Woodhouse 2024-02-27 262 * be in the "past" by the time the VM is running again after
451a707813aee2 David Woodhouse 2024-02-27 263 * migration.
451a707813aee2 David Woodhouse 2024-02-27 264 */
451a707813aee2 David Woodhouse 2024-02-27 265 guest_now = get_kvmclock_ns(vcpu->kvm);
451a707813aee2 David Woodhouse 2024-02-27 266 kernel_now = ktime_get();
451a707813aee2 David Woodhouse 2024-02-27 267 }
451a707813aee2 David Woodhouse 2024-02-27 268
451a707813aee2 David Woodhouse 2024-02-27 269 delta = guest_abs - guest_now;
451a707813aee2 David Woodhouse 2024-02-27 270
451a707813aee2 David Woodhouse 2024-02-27 271 /*
451a707813aee2 David Woodhouse 2024-02-27 272 * Xen has a 'Linux workaround' in do_set_timer_op() which checks for
451a707813aee2 David Woodhouse 2024-02-27 273 * negative absolute timeout values (caused by integer overflow), and
451a707813aee2 David Woodhouse 2024-02-27 274 * for values about 13 days in the future (2^50ns) which would be
451a707813aee2 David Woodhouse 2024-02-27 275 * caused by jiffies overflow. For those cases, Xen sets the timeout
451a707813aee2 David Woodhouse 2024-02-27 276 * 100ms in the future (not *too* soon, since if a guest really did
451a707813aee2 David Woodhouse 2024-02-27 277 * set a long timeout on purpose we don't want to keep churning CPU
451a707813aee2 David Woodhouse 2024-02-27 278 * time by waking it up). Emulate Xen's workaround when starting the
451a707813aee2 David Woodhouse 2024-02-27 279 * timer in response to __HYPERVISOR_set_timer_op.
451a707813aee2 David Woodhouse 2024-02-27 280 */
451a707813aee2 David Woodhouse 2024-02-27 281 if (linux_wa &&
451a707813aee2 David Woodhouse 2024-02-27 282 unlikely((int64_t)guest_abs < 0 ||
451a707813aee2 David Woodhouse 2024-02-27 283 (delta > 0 && (uint32_t) (delta >> 50) != 0))) {
451a707813aee2 David Woodhouse 2024-02-27 284 delta = 100 * NSEC_PER_MSEC;
451a707813aee2 David Woodhouse 2024-02-27 285 guest_abs = guest_now + delta;
451a707813aee2 David Woodhouse 2024-02-27 286 }
451a707813aee2 David Woodhouse 2024-02-27 287
77c9b9dea4fb3e David Woodhouse 2023-09-30 288 /*
77c9b9dea4fb3e David Woodhouse 2023-09-30 289 * Avoid races with the old timer firing. Checking timer_expires
77c9b9dea4fb3e David Woodhouse 2023-09-30 290 * to avoid calling hrtimer_cancel() will only have false positives
77c9b9dea4fb3e David Woodhouse 2023-09-30 291 * so is fine.
77c9b9dea4fb3e David Woodhouse 2023-09-30 292 */
77c9b9dea4fb3e David Woodhouse 2023-09-30 293 if (vcpu->arch.xen.timer_expires)
77c9b9dea4fb3e David Woodhouse 2023-09-30 294 hrtimer_cancel(&vcpu->arch.xen.timer);
77c9b9dea4fb3e David Woodhouse 2023-09-30 295
536395260582be Joao Martins 2022-03-03 296 atomic_set(&vcpu->arch.xen.timer_pending, 0);
536395260582be Joao Martins 2022-03-03 297 vcpu->arch.xen.timer_expires = guest_abs;
536395260582be Joao Martins 2022-03-03 298
451a707813aee2 David Woodhouse 2024-02-27 299 if (delta <= 0)
536395260582be Joao Martins 2022-03-03 300 xen_timer_callback(&vcpu->arch.xen.timer);
451a707813aee2 David Woodhouse 2024-02-27 301 else
536395260582be Joao Martins 2022-03-03 302 hrtimer_start(&vcpu->arch.xen.timer,
451a707813aee2 David Woodhouse 2024-02-27 303 ktime_add_ns(kernel_now, delta),
536395260582be Joao Martins 2022-03-03 304 HRTIMER_MODE_ABS_HARD);
536395260582be Joao Martins 2022-03-03 305 }
536395260582be Joao Martins 2022-03-03 306
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-01-22 15:07 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-18 0:55 [PATCH 00/10] KVM: x86: pvclock fixes and cleanups Sean Christopherson
2025-01-18 0:55 ` [PATCH 01/10] KVM: x86: Don't take kvm->lock when iterating over vCPUs in suspend notifier Sean Christopherson
2025-01-21 16:01 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 02/10] KVM: x86: Eliminate "handling" of impossible errors during SUSPEND Sean Christopherson
2025-01-21 16:03 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 03/10] KVM: x86: Drop local pvclock_flags variable in kvm_guest_time_update() Sean Christopherson
2025-01-21 16:05 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 04/10] KVM: x86: Set PVCLOCK_GUEST_STOPPED only for kvmclock, not for Xen PV clock Sean Christopherson
2025-01-21 16:42 ` Paul Durrant
2025-01-21 17:09 ` Sean Christopherson
2025-01-21 17:15 ` Paul Durrant
2025-01-21 18:32 ` Sean Christopherson
2025-01-18 0:55 ` [PATCH 05/10] KVM: x86: Don't bleed PVCLOCK_GUEST_STOPPED across PV clocks Sean Christopherson
2025-01-21 16:54 ` Paul Durrant
2025-01-21 17:11 ` Sean Christopherson
2025-01-18 0:55 ` [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer Sean Christopherson
2025-01-19 3:00 ` kernel test robot
2025-01-21 16:58 ` Paul Durrant
2025-01-21 18:45 ` Sean Christopherson
2025-01-18 0:55 ` [PATCH 07/10] KVM: x86: Pass reference pvclock as a param to kvm_setup_guest_pvclock() Sean Christopherson
2025-01-21 17:00 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 08/10] KVM: x86: Remove per-vCPU "cache" of its reference pvclock Sean Christopherson
2025-01-21 17:03 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 09/10] KVM: x86: Setup Hyper-V TSC page before Xen PV clocks (during clock update) Sean Christopherson
2025-01-20 14:49 ` Vitaly Kuznetsov
2025-01-21 15:44 ` Sean Christopherson
2025-01-21 15:59 ` Paul Durrant
2025-01-21 17:16 ` David Woodhouse
2025-01-21 17:30 ` Paul Durrant
2025-01-18 0:55 ` [PATCH 10/10] KVM: x86: Override TSC_STABLE flag for Xen PV clocks in kvm_guest_time_update() Sean Christopherson
2025-01-21 17:05 ` Paul Durrant
-- strict thread matches above, loose matches on Subject: below --
2025-01-22 15:05 [PATCH 06/10] KVM: x86/xen: Use guest's copy of pvclock when starting timer kernel test robot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.