* [PATCH v2 0/2] KVM: x86: hyper-v: Fix TSC page update after KVM_SET_CLOCK(0) call
@ 2021-03-29 11:47 Vitaly Kuznetsov
2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov
2021-03-29 11:48 ` [PATCH v2 2/2] selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) Vitaly Kuznetsov
0 siblings, 2 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2021-03-29 11:47 UTC (permalink / raw)
To: kvm, Paolo Bonzini
Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti
Changes since v1:
- Fix the issue by casting 'hv_clock->system_time' to s64 in
compute_tsc_page_parameters() instead of clamping its value to zero in
kvm_guest_time_update() [Paolo]
Original description:
I discovered that after KVM_SET_CLOCK(0) TSC page value in the guest can
go through the roof and apparently we have a signedness issue when the
update is performed. Fix the issue and add a selftest.
Vitaly Kuznetsov (2):
KVM: x86: hyper-v: Properly divide maybe-negative
'hv_clock->system_time' in compute_tsc_page_parameters()
selftests: kvm: Check that TSC page value is small after
KVM_SET_CLOCK(0)
arch/x86/kvm/hyperv.c | 9 ++++++---
tools/testing/selftests/kvm/x86_64/hyperv_clock.c | 13 +++++++++++--
2 files changed, 17 insertions(+), 5 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-29 11:47 [PATCH v2 0/2] KVM: x86: hyper-v: Fix TSC page update after KVM_SET_CLOCK(0) call Vitaly Kuznetsov @ 2021-03-29 11:47 ` Vitaly Kuznetsov 2021-03-29 17:24 ` Sean Christopherson 2021-03-30 13:12 ` Marcelo Tosatti 2021-03-29 11:48 ` [PATCH v2 2/2] selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) Vitaly Kuznetsov 1 sibling, 2 replies; 13+ messages in thread From: Vitaly Kuznetsov @ 2021-03-29 11:47 UTC (permalink / raw) To: kvm, Paolo Bonzini Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti When guest time is reset with KVM_SET_CLOCK(0), it is possible for hv_clock->system_time to become a small negative number. This happens because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, kvm_guest_time_update() does hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; And 'master_kernel_ns' represents the last time when masterclock got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a problem, the difference is very small, e.g. I'm observing hv_clock.system_time = -70 ns. The issue comes from the fact that 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in compute_tsc_page_parameters() becomes a very big number. Use div_s64() to get the proper result when dividing maybe-negative 'hv_clock.system_time' by 100. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- arch/x86/kvm/hyperv.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index f98370a39936..0529b892f634 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, hv_clock->tsc_to_system_mul, 100); - tsc_ref->tsc_offset = hv_clock->system_time; - do_div(tsc_ref->tsc_offset, 100); - tsc_ref->tsc_offset -= + /* + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative + * value here, thus div_s64(). + */ + tsc_ref->tsc_offset = div_s64(hv_clock->system_time, 100) - mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64); + return true; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov @ 2021-03-29 17:24 ` Sean Christopherson 2021-03-30 10:21 ` Vitaly Kuznetsov 2021-03-30 14:12 ` Marcelo Tosatti 2021-03-30 13:12 ` Marcelo Tosatti 1 sibling, 2 replies; 13+ messages in thread From: Sean Christopherson @ 2021-03-29 17:24 UTC (permalink / raw) To: Vitaly Kuznetsov Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Marcelo Tosatti On Mon, Mar 29, 2021, Vitaly Kuznetsov wrote: > When guest time is reset with KVM_SET_CLOCK(0), it is possible for > hv_clock->system_time to become a small negative number. This happens > because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based > on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, > kvm_guest_time_update() does > > hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; > > And 'master_kernel_ns' represents the last time when masterclock > got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a > problem, the difference is very small, e.g. I'm observing > hv_clock.system_time = -70 ns. The issue comes from the fact that > 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in > compute_tsc_page_parameters() becomes a very big number. > > Use div_s64() to get the proper result when dividing maybe-negative > 'hv_clock.system_time' by 100. > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > --- > arch/x86/kvm/hyperv.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c > index f98370a39936..0529b892f634 100644 > --- a/arch/x86/kvm/hyperv.c > +++ b/arch/x86/kvm/hyperv.c > @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, > hv_clock->tsc_to_system_mul, > 100); > > - tsc_ref->tsc_offset = hv_clock->system_time; > - do_div(tsc_ref->tsc_offset, 100); > - tsc_ref->tsc_offset -= > + /* > + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative > + * value here, thus div_s64(). > + */ Will anything break if hv_clock.system_time is made a s64? > + tsc_ref->tsc_offset = div_s64(hv_clock->system_time, 100) - > mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64); > + > return true; > } > > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-29 17:24 ` Sean Christopherson @ 2021-03-30 10:21 ` Vitaly Kuznetsov 2021-03-30 14:12 ` Marcelo Tosatti 1 sibling, 0 replies; 13+ messages in thread From: Vitaly Kuznetsov @ 2021-03-30 10:21 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson, Marcelo Tosatti Sean Christopherson <seanjc@google.com> writes: > On Mon, Mar 29, 2021, Vitaly Kuznetsov wrote: >> When guest time is reset with KVM_SET_CLOCK(0), it is possible for >> hv_clock->system_time to become a small negative number. This happens >> because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based >> on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, >> kvm_guest_time_update() does >> >> hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; >> >> And 'master_kernel_ns' represents the last time when masterclock >> got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a >> problem, the difference is very small, e.g. I'm observing >> hv_clock.system_time = -70 ns. The issue comes from the fact that >> 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in >> compute_tsc_page_parameters() becomes a very big number. >> >> Use div_s64() to get the proper result when dividing maybe-negative >> 'hv_clock.system_time' by 100. >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> --- >> arch/x86/kvm/hyperv.c | 9 ++++++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) >> >> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c >> index f98370a39936..0529b892f634 100644 >> --- a/arch/x86/kvm/hyperv.c >> +++ b/arch/x86/kvm/hyperv.c >> @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, >> hv_clock->tsc_to_system_mul, >> 100); >> >> - tsc_ref->tsc_offset = hv_clock->system_time; >> - do_div(tsc_ref->tsc_offset, 100); >> - tsc_ref->tsc_offset -= >> + /* >> + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative >> + * value here, thus div_s64(). >> + */ > > Will anything break if hv_clock.system_time is made a s64? > I think no, but pvclock-abi.h says: " * These structs MUST NOT be changed. * They are the ABI between hypervisor and guest OS. * Both Xen and KVM are using this. " so I was kind of scared to suggest a change... -- Vitaly ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-29 17:24 ` Sean Christopherson 2021-03-30 10:21 ` Vitaly Kuznetsov @ 2021-03-30 14:12 ` Marcelo Tosatti 1 sibling, 0 replies; 13+ messages in thread From: Marcelo Tosatti @ 2021-03-30 14:12 UTC (permalink / raw) To: Sean Christopherson Cc: Vitaly Kuznetsov, kvm, Paolo Bonzini, Wanpeng Li, Jim Mattson On Mon, Mar 29, 2021 at 05:24:13PM +0000, Sean Christopherson wrote: > On Mon, Mar 29, 2021, Vitaly Kuznetsov wrote: > > When guest time is reset with KVM_SET_CLOCK(0), it is possible for > > hv_clock->system_time to become a small negative number. This happens > > because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based > > on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, > > kvm_guest_time_update() does > > > > hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; > > > > And 'master_kernel_ns' represents the last time when masterclock > > got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a > > problem, the difference is very small, e.g. I'm observing > > hv_clock.system_time = -70 ns. The issue comes from the fact that > > 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in > > compute_tsc_page_parameters() becomes a very big number. > > > > Use div_s64() to get the proper result when dividing maybe-negative > > 'hv_clock.system_time' by 100. > > > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > --- > > arch/x86/kvm/hyperv.c | 9 ++++++--- > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c > > index f98370a39936..0529b892f634 100644 > > --- a/arch/x86/kvm/hyperv.c > > +++ b/arch/x86/kvm/hyperv.c > > @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, > > hv_clock->tsc_to_system_mul, > > 100); > > > > - tsc_ref->tsc_offset = hv_clock->system_time; > > - do_div(tsc_ref->tsc_offset, 100); > > - tsc_ref->tsc_offset -= > > + /* > > + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative > > + * value here, thus div_s64(). > > + */ > > Will anything break if hv_clock.system_time is made a s64? IMHO hv_clock.system_time represents an unsigned value: system_time: a host notion of monotonic time, including sleep time at the time this structure was last updated. Unit is nanoseconds. Delta between values is not transmitted through this variable, so unclear what negative values would mean. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov 2021-03-29 17:24 ` Sean Christopherson @ 2021-03-30 13:12 ` Marcelo Tosatti 2021-03-30 14:44 ` Vitaly Kuznetsov 1 sibling, 1 reply; 13+ messages in thread From: Marcelo Tosatti @ 2021-03-30 13:12 UTC (permalink / raw) To: Vitaly Kuznetsov Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson Hi Vitaly, On Mon, Mar 29, 2021 at 01:47:59PM +0200, Vitaly Kuznetsov wrote: > When guest time is reset with KVM_SET_CLOCK(0), it is possible for > hv_clock->system_time to become a small negative number. This happens > because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based > on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, > kvm_guest_time_update() does > > hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; Hum, masterclock update should always precede KVM_SET_CLOCK() call. So when KVM_SET_CLOCK(0) is called, we'd like the guest (or the following): static __always_inline u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc) { u64 delta = tsc - src->tsc_timestamp; u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul, src->tsc_shift); return src->system_time + offset; } To return 0 (which won't happen if pvclock_data->system_time is being initialized to a negative value). KVM_SET_CLOCK(0) kvm_gen_update_masterclock(kvm); now_ns = get_kvmclock_ns(kvm); kvm->arch.kvmclock_offset += user_ns.clock - now_ns = -now_ns; hv_clock.tsc_timestamp = ka->master_cycle_now; hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; Wonder if the negative value happens due to the switch from masterclock=off -> on (get_kvmclock_ns reading kernel time directly). Perhaps this error is being added to clock when migration is performed. But just thinking out loud... > And 'master_kernel_ns' represents the last time when masterclock > got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a > problem, the difference is very small, e.g. I'm observing > hv_clock.system_time = -70 ns. The issue comes from the fact that > 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in > compute_tsc_page_parameters() becomes a very big number. Which it is (a very big number). > Use div_s64() to get the proper result when dividing maybe-negative > 'hv_clock.system_time' by 100. Well hv_clock.system_time is not negative. It is positive. > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > --- > arch/x86/kvm/hyperv.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c > index f98370a39936..0529b892f634 100644 > --- a/arch/x86/kvm/hyperv.c > +++ b/arch/x86/kvm/hyperv.c > @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, > hv_clock->tsc_to_system_mul, > 100); > > - tsc_ref->tsc_offset = hv_clock->system_time; > - do_div(tsc_ref->tsc_offset, 100); > - tsc_ref->tsc_offset -= > + /* > + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative > + * value here, thus div_s64(). > + */ > + tsc_ref->tsc_offset = div_s64(hv_clock->system_time, 100) - > mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64); > + > return true; > } Won't the guest see: 0.1, 0.005, 0.0025, 0.0001, ..., 0, 0.0001, 0.0025, 0.005, 0.1, ... As system_time progresses from negative value to positive value with the above fix? While the fix seems OK in practice, perhaps the negative system_time could be fixed instead. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-30 13:12 ` Marcelo Tosatti @ 2021-03-30 14:44 ` Vitaly Kuznetsov 2021-03-30 15:44 ` Paolo Bonzini 0 siblings, 1 reply; 13+ messages in thread From: Vitaly Kuznetsov @ 2021-03-30 14:44 UTC (permalink / raw) To: Marcelo Tosatti Cc: kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson Marcelo Tosatti <mtosatti@redhat.com> writes: > Hi Vitaly, > > On Mon, Mar 29, 2021 at 01:47:59PM +0200, Vitaly Kuznetsov wrote: >> When guest time is reset with KVM_SET_CLOCK(0), it is possible for >> hv_clock->system_time to become a small negative number. This happens >> because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based >> on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, >> kvm_guest_time_update() does >> >> hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; > > Hum, masterclock update should always precede KVM_SET_CLOCK() call. > > So when KVM_SET_CLOCK(0) is called, we'd like the guest (or the > following): > > static __always_inline > u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc) > { > u64 delta = tsc - src->tsc_timestamp; > u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul, > src->tsc_shift); > return src->system_time + offset; > } > > To return 0 (which won't happen if pvclock_data->system_time is being > initialized to a negative value). > > KVM_SET_CLOCK(0) > > kvm_gen_update_masterclock(kvm); > now_ns = get_kvmclock_ns(kvm); > kvm->arch.kvmclock_offset += user_ns.clock - now_ns = -now_ns; > > hv_clock.tsc_timestamp = ka->master_cycle_now; > hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; > > Wonder if the negative value happens due to the switch from > masterclock=off -> on (get_kvmclock_ns reading kernel time directly). > > Perhaps this error is being added to clock when migration is performed. > The issue is very easy to reproduce if you take the selftest (PATCH2 from this series), it is just a simple KVM_SET_CLOCK(0) call which makes Hyper-V TSC page value very very big. The order of events is: KVM_SET_CLOCK(0) is performed and we set now_ns = get_kvmclock_ns(kvm); kvm->arch.kvmclock_offset += user_ns.clock - now_ns; user_ns.clock is zero here. Note, we use get_kvmclock_ns() which gives the precise kvmclock value for the moment , to simplify things a bit: kvm->arch.kvmclock_offset = kvmclock_offset - master_kernel_ns - kvmclock_offset - pvclock_scale_delta() == -master_kernel_ns - pvclock_scale_delta() pvclock_scale_delta() is time since the begining of the time interval. Now we call kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE) and this leads to kvm_guest_time_update(). In kvm_guest_time_update() we do: if (use_master_clock) { host_tsc = ka->master_cycle_now; kernel_ns = ka->master_kernel_ns; } ... vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset; Note, we don't add the delta since the beginning of the time interval (assuming we are still on the same interval) so combined with the above: vcpu->hv_clock.system_time = master_kernel_ns - master_kernel_ns - pvclock_scale_delta() == -pvclock_scale_delta() and this is certainly a negative number. A very small one, but it's negative. We could've solved the problem by reducing the precision a bit and instead of doing now_ns = get_kvmclock_ns(kvm); in KVM_SET_CLOCK() handling we could do now_ns = ka->master_kernel_ns and that would make vcpu->hv_clock.system_time == 0 after kvm_guest_time_update() but it'd hurt 'normal' use-cases to fix the corner case. > But just thinking out loud... > >> And 'master_kernel_ns' represents the last time when masterclock >> got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a >> problem, the difference is very small, e.g. I'm observing >> hv_clock.system_time = -70 ns. The issue comes from the fact that >> 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in >> compute_tsc_page_parameters() becomes a very big number. > > Which it is (a very big number). > >> Use div_s64() to get the proper result when dividing maybe-negative >> 'hv_clock.system_time' by 100. > > Well hv_clock.system_time is not negative. It is positive. > >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> --- >> arch/x86/kvm/hyperv.c | 9 ++++++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) >> >> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c >> index f98370a39936..0529b892f634 100644 >> --- a/arch/x86/kvm/hyperv.c >> +++ b/arch/x86/kvm/hyperv.c >> @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, >> hv_clock->tsc_to_system_mul, >> 100); >> >> - tsc_ref->tsc_offset = hv_clock->system_time; >> - do_div(tsc_ref->tsc_offset, 100); >> - tsc_ref->tsc_offset -= >> + /* >> + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative >> + * value here, thus div_s64(). >> + */ >> + tsc_ref->tsc_offset = div_s64(hv_clock->system_time, 100) - >> mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64); >> + >> return true; >> } > > Won't the guest see: > > 0.1, 0.005, 0.0025, 0.0001, ..., 0, 0.0001, 0.0025, 0.005, 0.1, ... > > As system_time progresses from negative value to positive value with > the above fix? > > While the fix seems OK in practice, perhaps the negative system_time > could be fixed instead. Well, that was v1 of the patch where I suggested to clamp system_time value to 0 in kvm_guest_time_update() but Paolo talked me into this v2 :-) -- Vitaly ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-30 14:44 ` Vitaly Kuznetsov @ 2021-03-30 15:44 ` Paolo Bonzini 2021-03-31 6:29 ` Vitaly Kuznetsov 0 siblings, 1 reply; 13+ messages in thread From: Paolo Bonzini @ 2021-03-30 15:44 UTC (permalink / raw) To: Vitaly Kuznetsov, Marcelo Tosatti Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson On 30/03/21 16:44, Vitaly Kuznetsov wrote: > We could've solved the problem by reducing the precision a bit and > instead of doing > > now_ns = get_kvmclock_ns(kvm); > > in KVM_SET_CLOCK() handling we could do > > now_ns = ka->master_kernel_ns > > and that would make vcpu->hv_clock.system_time == 0 after > kvm_guest_time_update() but it'd hurt 'normal' use-cases to fix the > corner case. Marcelo is right, and I guess instead of a negative system time you *should* have a slightly larger tsc_timestamp that corresponds to a zero system_time. E.g. instead of -70 ns in the system time you'd have 210 more clock cycles in the tsc_timestamp and 0 in the system time. In the end it's impossible to achieve absolute precision; does the KVM_SET_CLOCK value refer to the nanosecond value before entering the kernel, or after, or what? The difference is much more than these 70 ns. So what you propose above seems to be really the best solution within the constraints of the KVM_SET_CLOCK API (a better API, which Maxim had started working on and I should revisit, would pass a TSC+nanosecond pair). On the other hand you'd have to take into account what happens if the masterclock is not in use, which would make things a bit more complex than what you sketched. If guests probably do not look at the system_time and just add it blindly to the result, then treating system_time as signed as in v2 is the easiest. Paolo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-30 15:44 ` Paolo Bonzini @ 2021-03-31 6:29 ` Vitaly Kuznetsov 2021-03-31 6:52 ` Paolo Bonzini 0 siblings, 1 reply; 13+ messages in thread From: Vitaly Kuznetsov @ 2021-03-31 6:29 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti Paolo Bonzini <pbonzini@redhat.com> writes: > On 30/03/21 16:44, Vitaly Kuznetsov wrote: >> We could've solved the problem by reducing the precision a bit and >> instead of doing >> >> now_ns = get_kvmclock_ns(kvm); >> >> in KVM_SET_CLOCK() handling we could do >> >> now_ns = ka->master_kernel_ns >> >> and that would make vcpu->hv_clock.system_time == 0 after >> kvm_guest_time_update() but it'd hurt 'normal' use-cases to fix the >> corner case. > > Marcelo is right, and I guess instead of a negative system time you > *should* have a slightly larger tsc_timestamp that corresponds to a zero > system_time. E.g. instead of -70 ns in the system time you'd have 210 > more clock cycles in the tsc_timestamp and 0 in the system time. > > In the end it's impossible to achieve absolute precision; does the > KVM_SET_CLOCK value refer to the nanosecond value before entering the > kernel, or after, or what? The difference is much more than these 70 > ns. So what you propose above seems to be really the best solution > within the constraints of the KVM_SET_CLOCK API (a better API, which > Maxim had started working on and I should revisit, would pass a > TSC+nanosecond pair). I'm leaning towards making a v3 and depending on how complex it's going to look like we can decide which way to go. > > On the other hand you'd have to take into account what happens if the > masterclock is not in use, which would make things a bit more complex > than what you sketched. (I really wish we establish a plan to get rid of !masterclock logic some day ... ) > If guests probably do not look at the > system_time and just add it blindly to the result, then treating > system_time as signed as in v2 is the easiest. -- Vitaly ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-31 6:29 ` Vitaly Kuznetsov @ 2021-03-31 6:52 ` Paolo Bonzini 2021-03-31 9:59 ` Vitaly Kuznetsov 0 siblings, 1 reply; 13+ messages in thread From: Paolo Bonzini @ 2021-03-31 6:52 UTC (permalink / raw) To: Vitaly Kuznetsov Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti On 31/03/21 08:29, Vitaly Kuznetsov wrote: > I'm leaning towards making a v3 and depending on how complex it's going > to look like we can decide which way to go. As you prefer, but I would have no problem with committing v2 for now. From the point of view of system_time being a signed delta your v2 is not a regression. Paolo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-31 6:52 ` Paolo Bonzini @ 2021-03-31 9:59 ` Vitaly Kuznetsov 2021-03-31 10:58 ` Paolo Bonzini 0 siblings, 1 reply; 13+ messages in thread From: Vitaly Kuznetsov @ 2021-03-31 9:59 UTC (permalink / raw) To: Paolo Bonzini, Marcelo Tosatti Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson Paolo Bonzini <pbonzini@redhat.com> writes: > On 31/03/21 08:29, Vitaly Kuznetsov wrote: >> I'm leaning towards making a v3 and depending on how complex it's going >> to look like we can decide which way to go. > > As you prefer, but I would have no problem with committing v2 for now. > From the point of view of system_time being a signed delta your v2 is > not a regression. Ok, I convinced myself this is correct: @@ -5744,8 +5745,22 @@ long kvm_arch_vm_ioctl(struct file *filp, * pvclock_update_vm_gtod_copy(). */ kvm_gen_update_masterclock(kvm); - now_ns = get_kvmclock_ns(kvm); - kvm->arch.kvmclock_offset += user_ns.clock - now_ns; + + /* + * This pairs with kvm_guest_time_update(): when masterclock is + * in use, we use master_kernel_ns + kvmclock_offset to set + * unsigned 'system_time' so if we use get_kvmclock_ns() (which + * is slightly ahead) here we risk going negative on unsigned + * 'system_time' when 'user_ns.clock' is very small. + */ + spin_lock(&ka->pvclock_gtod_sync_lock); + if (kvm->arch.use_master_clock) + now_ns = ka->master_kernel_ns; + else + now_ns = get_kvmclock_base_ns(); + ka->kvmclock_offset = user_ns.clock - now_ns; + spin_unlock(&ka->pvclock_gtod_sync_lock); + kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); break; } In !masterclock case kvm_guest_time_update() uses get_kvmclock_base_ns() (and get_kvmclock_ns() is just get_kvmclock_base_ns() + ka->kvmclock_offset) so we're good. Also, it looks like a good idea to put the whole calculation under spinlock here. Personally, I like this a little bit more than treating 'system_time' as signed, what do you guys think? -- Vitaly ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() 2021-03-31 9:59 ` Vitaly Kuznetsov @ 2021-03-31 10:58 ` Paolo Bonzini 0 siblings, 0 replies; 13+ messages in thread From: Paolo Bonzini @ 2021-03-31 10:58 UTC (permalink / raw) To: Vitaly Kuznetsov, Marcelo Tosatti Cc: kvm, Sean Christopherson, Wanpeng Li, Jim Mattson On 31/03/21 11:59, Vitaly Kuznetsov wrote: > Paolo Bonzini <pbonzini@redhat.com> writes: >> On 31/03/21 08:29, Vitaly Kuznetsov wrote: >>> I'm leaning towards making a v3 and depending on how complex it's going >>> to look like we can decide which way to go. > > Ok, I convinced myself this is correct: > > @@ -5744,8 +5745,22 @@ long kvm_arch_vm_ioctl(struct file *filp, > * pvclock_update_vm_gtod_copy(). > */ > kvm_gen_update_masterclock(kvm); > - now_ns = get_kvmclock_ns(kvm); > - kvm->arch.kvmclock_offset += user_ns.clock - now_ns; > + > + /* > + * This pairs with kvm_guest_time_update(): when masterclock is > + * in use, we use master_kernel_ns + kvmclock_offset to set > + * unsigned 'system_time' so if we use get_kvmclock_ns() (which > + * is slightly ahead) here we risk going negative on unsigned > + * 'system_time' when 'user_ns.clock' is very small. > + */ > + spin_lock(&ka->pvclock_gtod_sync_lock); > + if (kvm->arch.use_master_clock) > + now_ns = ka->master_kernel_ns; > + else > + now_ns = get_kvmclock_base_ns(); > + ka->kvmclock_offset = user_ns.clock - now_ns; > + spin_unlock(&ka->pvclock_gtod_sync_lock); > + > kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); > break; > } > > In !masterclock case kvm_guest_time_update() uses get_kvmclock_base_ns() > (and get_kvmclock_ns() is just get_kvmclock_base_ns() + ka->kvmclock_offset) > so we're good. Also, it looks like a good idea to put the whole > calculation under spinlock here. Yes, sounds good. Note that I posted a series that changes pvclock_gtod_sync_lock to use spin_lock_irqsave/spin_unlock_irqrestore, so please base your patch on those and switch to spin_lock_irq. Paolo ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 2/2] selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) 2021-03-29 11:47 [PATCH v2 0/2] KVM: x86: hyper-v: Fix TSC page update after KVM_SET_CLOCK(0) call Vitaly Kuznetsov 2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov @ 2021-03-29 11:48 ` Vitaly Kuznetsov 1 sibling, 0 replies; 13+ messages in thread From: Vitaly Kuznetsov @ 2021-03-29 11:48 UTC (permalink / raw) To: kvm, Paolo Bonzini Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Marcelo Tosatti Add a test for the issue when KVM_SET_CLOCK(0) call could cause TSC page value to go very big because of a signedness issue around hv_clock->system_time. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- tools/testing/selftests/kvm/x86_64/hyperv_clock.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c index ffbc4555c6e2..7f1d2765572c 100644 --- a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c +++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c @@ -80,19 +80,24 @@ static inline void check_tsc_msr_rdtsc(void) GUEST_ASSERT(delta_ns * 100 < (t2 - t1) * 100); } +static inline u64 get_tscpage_ts(struct ms_hyperv_tsc_page *tsc_page) +{ + return mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset; +} + static inline void check_tsc_msr_tsc_page(struct ms_hyperv_tsc_page *tsc_page) { u64 r1, r2, t1, t2; /* Compare TSC page clocksource with HV_X64_MSR_TIME_REF_COUNT */ - t1 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset; + t1 = get_tscpage_ts(tsc_page); r1 = rdmsr(HV_X64_MSR_TIME_REF_COUNT); /* 10 ms tolerance */ GUEST_ASSERT(r1 >= t1 && r1 - t1 < 100000); nop_loop(); - t2 = mul_u64_u64_shr64(rdtsc(), tsc_page->tsc_scale) + tsc_page->tsc_offset; + t2 = get_tscpage_ts(tsc_page); r2 = rdmsr(HV_X64_MSR_TIME_REF_COUNT); GUEST_ASSERT(r2 >= t1 && r2 - t2 < 100000); } @@ -130,7 +135,11 @@ static void guest_main(struct ms_hyperv_tsc_page *tsc_page, vm_paddr_t tsc_page_ tsc_offset = tsc_page->tsc_offset; /* Call KVM_SET_CLOCK from userspace, check that TSC page was updated */ + GUEST_SYNC(7); + /* Sanity check TSC page timestamp, it should be close to 0 */ + GUEST_ASSERT(get_tscpage_ts(tsc_page) < 100000); + GUEST_ASSERT(tsc_page->tsc_offset != tsc_offset); nop_loop(); -- 2.30.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2021-03-31 10:59 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-03-29 11:47 [PATCH v2 0/2] KVM: x86: hyper-v: Fix TSC page update after KVM_SET_CLOCK(0) call Vitaly Kuznetsov 2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov 2021-03-29 17:24 ` Sean Christopherson 2021-03-30 10:21 ` Vitaly Kuznetsov 2021-03-30 14:12 ` Marcelo Tosatti 2021-03-30 13:12 ` Marcelo Tosatti 2021-03-30 14:44 ` Vitaly Kuznetsov 2021-03-30 15:44 ` Paolo Bonzini 2021-03-31 6:29 ` Vitaly Kuznetsov 2021-03-31 6:52 ` Paolo Bonzini 2021-03-31 9:59 ` Vitaly Kuznetsov 2021-03-31 10:58 ` Paolo Bonzini 2021-03-29 11:48 ` [PATCH v2 2/2] selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) Vitaly Kuznetsov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox