From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Dongli Zhang <dongli.zhang@oracle.com>,
Joe Jin <joe.jin@oracle.com>,
x86@kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, pbonzini@redhat.com,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically
Date: Mon, 16 Oct 2023 11:49:15 -0700 [thread overview]
Message-ID: <ZS2Fq5dr2CeZaBok@google.com> (raw)
In-Reply-To: <03afed7eb3c1e5f4b2b8ecfd8616ae5c6f1819e9.camel@infradead.org>
On Mon, Oct 16, 2023, David Woodhouse wrote:
> On Mon, 2023-10-16 at 08:47 -0700, Dongli Zhang wrote:
> > Suppose we are discussing a non-permanenet solution, I would suggest:
> >
> > 1. Document something to accept that kvm-clock (or pvclock on KVM, including Xen
> > on KVM) is not good enough in some cases, e.g., vCPU hotplug.
>
> I still don't understand the vCPU hotplug case.
>
> In the case where the TSC is actually sane, why would we need to reset
> the masterclock on vCPU hotplug?
>
> The new vCPU gets its TSC synchronised to the others, and its kvmclock
> parameters (mul/shift/offset based on the guest TSC) can be *precisely*
> the same as the other vCPUs too, can't they? Why reset anything?
Aha! I think I finally figured out why KVM behaves the way it does.
The unnecessary masterclock updates effectively came from:
commit 7f187922ddf6b67f2999a76dcb71663097b75497
Author: Marcelo Tosatti <mtosatti@redhat.com>
Date: Tue Nov 4 21:30:44 2014 -0200
KVM: x86: update masterclock values on TSC writes
When the guest writes to the TSC, the masterclock TSC copy must be
updated as well along with the TSC_OFFSET update, otherwise a negative
tsc_timestamp is calculated at kvm_guest_time_update.
Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
"if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
becomes redundant, so remove the do_request boolean and collapse
everything into a single condition.
Before that, KVM only re-synced the masterclock if it was enabled or disabled,
i.e. KVM behaved as we want it to behave. Note, at the time of the above commit,
VMX synchronized TSC on *guest* writes to MSR_IA32_TSC:
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
That got changed by commit 0c899c25d754 ("KVM: x86: do not attempt TSC synchronization
on guest writes"), but I don't think the guest angle is actually relevant to the
fix. AFAICT, a write from the host would suffer the same problem. But knowing
that KVM synced on guest writes is crucial to understanding the changelog.
In kvm_write_tsc(), except for KVM's wonderful "less than 1 second" hack, KVM
snapshotted the vCPU's current TSC *and* the current time in nanoseconds, where
kvm->arch.cur_tsc_nsec is the current host kernel time in nanoseconds.
ns = get_kernel_ns();
...
if (usdiff < USEC_PER_SEC &&
vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
...
} else {
/*
* We split periods of matched TSC writes into generations.
* For each generation, we track the original measured
* nanosecond time, offset, and write, so if TSCs are in
* sync, we can match exact offset, and if not, we can match
* exact software computation in compute_guest_tsc()
*
* These values are tracked in kvm->arch.cur_xxx variables.
*/
kvm->arch.cur_tsc_generation++;
kvm->arch.cur_tsc_nsec = ns;
kvm->arch.cur_tsc_write = data;
kvm->arch.cur_tsc_offset = offset;
matched = false;
pr_debug("kvm: new tsc generation %llu, clock %llu\n",
kvm->arch.cur_tsc_generation, data);
}
...
/* Keep track of which generation this VCPU has synchronized to */
vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
Note that the above sets matched to false! But because kvm_track_tsc_matching()
looks for matched+1, i.e. doesn't require the first vCPU to match itself, KVM
would immediately compute vcpus_matched as true for VMs with a single vCPU. As
a result, KVM would skip the masterlock update, even though a new TSC generation
was created.
vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
atomic_read(&vcpu->kvm->online_vcpus));
if (vcpus_matched && gtod->clock.vclock_mode == VCLOCK_TSC)
if (!ka->use_master_clock)
do_request = 1;
if (!vcpus_matched && ka->use_master_clock)
do_request = 1;
if (do_request)
kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
On hardware without TSC scaling support, vcpu->tsc_catchup is set to true if the
guest TSC frequency is faster than the host TSC frequency, even if the TSC is
otherwise stable. And for that mode, kvm_guest_time_update(), by way of
compute_guest_tsc(), uses vcpu->arch.this_tsc_nsec, a.k.a. the kernel time at the
last TSC write, to compute the guest TSC relative to kernel time:
static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
{
u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.this_tsc_nsec,
vcpu->arch.virtual_tsc_mult,
vcpu->arch.virtual_tsc_shift);
tsc += vcpu->arch.this_tsc_write;
return tsc;
}
Except the @kernel_ns passed to compute_guest_tsc() isn't the current kernel time,
it's the masterclock snapshot!
spin_lock(&ka->pvclock_gtod_sync_lock);
use_master_clock = ka->use_master_clock;
if (use_master_clock) {
host_tsc = ka->master_cycle_now;
kernel_ns = ka->master_kernel_ns;
}
spin_unlock(&ka->pvclock_gtod_sync_lock);
if (vcpu->tsc_catchup) {
u64 tsc = compute_guest_tsc(v, kernel_ns);
if (tsc > tsc_timestamp) {
adjust_tsc_offset_guest(v, tsc - tsc_timestamp);
tsc_timestamp = tsc;
}
}
And so the "kernel_ns-vcpu->arch.this_tsc_nsec" is *guaranteed* to generate a
negative value, because this_tsc_nsec was captured after ka->master_kernel_ns.
Forcing a masterclock update essentially fudged around that problem, but in a
heavy handed way that introduced undesirable side effects, i.e. unnecessarily
forces a masterclock update when a new vCPU joins the party via hotplug.
Compile tested only, but the below should fix the vCPU hotplug case. Then
someone (not me) just needs to figure out why kvm_xen_shared_info_init() forces
a masterclock update.
I still think we should clean up the periodic sync code, but I don't think we
need to periodically sync the masterclock.
---
arch/x86/kvm/x86.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c54e1133e0d3..f0a607b6fc31 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2510,26 +2510,29 @@ static inline int gtod_is_based_on_tsc(int mode)
}
#endif
-static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
+static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
{
#ifdef CONFIG_X86_64
- bool vcpus_matched;
struct kvm_arch *ka = &vcpu->kvm->arch;
struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
- vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
- atomic_read(&vcpu->kvm->online_vcpus));
+ /*
+ * To use the masterclock, the host clocksource must be based on TSC
+ * and all vCPUs must have matching TSCs. Note, the count for matching
+ * vCPUs doesn't include the reference vCPU, hence "+1".
+ */
+ bool use_master_clock = (ka->nr_vcpus_matched_tsc + 1 ==
+ atomic_read(&vcpu->kvm->online_vcpus)) &&
+ gtod_is_based_on_tsc(gtod->clock.vclock_mode);
/*
- * Once the masterclock is enabled, always perform request in
- * order to update it.
- *
- * In order to enable masterclock, the host clocksource must be TSC
- * and the vcpus need to have matched TSCs. When that happens,
- * perform request to enable masterclock.
+ * Request a masterclock update if the masterclock needs to be toggled
+ * on/off, or when starting a new generation and the masterclock is
+ * enabled (compute_guest_tsc() requires the masterclock snaphot to be
+ * taken _after_ the new generation is created).
*/
- if (ka->use_master_clock ||
- (gtod_is_based_on_tsc(gtod->clock.vclock_mode) && vcpus_matched))
+ if ((ka->use_master_clock && new_generation) ||
+ (ka->use_master_clock != use_master_clock))
kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
@@ -2706,7 +2709,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
- kvm_track_tsc_matching(vcpu);
+ kvm_track_tsc_matching(vcpu, !matched);
}
static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
base-commit: dfdc8b7884b50e3bfa635292973b530a97689f12
--
next prev parent reply other threads:[~2023-10-16 18:49 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-26 23:06 [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically Dongli Zhang
2023-09-27 0:29 ` Joe Jin
2023-09-27 0:36 ` Dongli Zhang
2023-09-28 16:18 ` Sean Christopherson
2023-09-29 20:15 ` Dongli Zhang
2023-10-02 8:33 ` David Woodhouse
2023-10-02 16:37 ` Sean Christopherson
2023-10-02 17:17 ` Dongli Zhang
2023-10-02 18:18 ` Sean Christopherson
2023-10-02 21:06 ` Peter Zijlstra
2023-10-02 21:16 ` Peter Zijlstra
2023-10-02 18:16 ` David Woodhouse
2023-10-03 0:53 ` Sean Christopherson
2023-10-03 1:32 ` Dongli Zhang
2023-10-03 1:49 ` Sean Christopherson
2023-10-03 2:07 ` Dongli Zhang
2023-10-03 21:00 ` Sean Christopherson
2023-10-03 5:54 ` David Woodhouse
2023-10-04 0:04 ` Sean Christopherson
2023-10-04 10:01 ` David Woodhouse
2023-10-04 18:06 ` Sean Christopherson
2023-10-04 19:13 ` Dongli Zhang
2023-10-11 0:20 ` Sean Christopherson
2023-10-11 7:18 ` David Woodhouse
2023-10-13 18:07 ` Sean Christopherson
2023-10-13 18:21 ` David Woodhouse
2023-10-13 19:02 ` Sean Christopherson
2023-10-13 19:12 ` David Woodhouse
2023-10-13 20:03 ` Sean Christopherson
2023-10-13 20:12 ` Dongli Zhang
2023-10-13 23:26 ` Sean Christopherson
2023-10-14 9:49 ` David Woodhouse
2023-10-16 15:47 ` Dongli Zhang
2023-10-16 16:25 ` David Woodhouse
2023-10-16 17:04 ` Dongli Zhang
2023-10-16 18:49 ` Sean Christopherson [this message]
2023-10-16 22:04 ` Dongli Zhang
2023-10-16 22:48 ` Sean Christopherson
2023-10-17 16:18 ` Dongli Zhang
2023-10-03 9:12 ` David Woodhouse
2023-10-04 0:07 ` Sean Christopherson
2023-10-04 8:06 ` David Woodhouse
2023-10-03 14:29 ` David Woodhouse
2023-10-04 0:10 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZS2Fq5dr2CeZaBok@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dongli.zhang@oracle.com \
--cc=dwmw2@infradead.org \
--cc=joe.jin@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.