From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>
Subject: Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters()
Date: Tue, 30 Mar 2021 16:44:29 +0200 [thread overview]
Message-ID: <87ft0cu2eq.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20210330131236.GA5932@fuller.cnet>
Marcelo Tosatti <mtosatti@redhat.com> writes:
> Hi Vitaly,
>
> On Mon, Mar 29, 2021 at 01:47:59PM +0200, Vitaly Kuznetsov wrote:
>> When guest time is reset with KVM_SET_CLOCK(0), it is possible for
>> hv_clock->system_time to become a small negative number. This happens
>> because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based
>> on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
>> kvm_guest_time_update() does
>>
>> hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;
>
> Hum, masterclock update should always precede KVM_SET_CLOCK() call.
>
> So when KVM_SET_CLOCK(0) is called, we'd like the guest (or the
> following):
>
> static __always_inline
> u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc)
> {
> u64 delta = tsc - src->tsc_timestamp;
> u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
> src->tsc_shift);
> return src->system_time + offset;
> }
>
> To return 0 (which won't happen if pvclock_data->system_time is being
> initialized to a negative value).
>
> KVM_SET_CLOCK(0)
>
> kvm_gen_update_masterclock(kvm);
> now_ns = get_kvmclock_ns(kvm);
> kvm->arch.kvmclock_offset += user_ns.clock - now_ns = -now_ns;
>
> hv_clock.tsc_timestamp = ka->master_cycle_now;
> hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
>
> Wonder if the negative value happens due to the switch from
> masterclock=off -> on (get_kvmclock_ns reading kernel time directly).
>
> Perhaps this error is being added to clock when migration is performed.
>
The issue is very easy to reproduce if you take the selftest (PATCH2
from this series), it is just a simple KVM_SET_CLOCK(0) call which makes
Hyper-V TSC page value very very big.
The order of events is:
KVM_SET_CLOCK(0) is performed and we set
now_ns = get_kvmclock_ns(kvm);
kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
user_ns.clock is zero here. Note, we use get_kvmclock_ns() which gives
the precise kvmclock value for the moment , to simplify things a bit:
kvm->arch.kvmclock_offset = kvmclock_offset - master_kernel_ns -
kvmclock_offset - pvclock_scale_delta() == -master_kernel_ns - pvclock_scale_delta()
pvclock_scale_delta() is time since the begining of the time
interval.
Now we call kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE) and
this leads to kvm_guest_time_update(). In kvm_guest_time_update() we do:
if (use_master_clock) {
host_tsc = ka->master_cycle_now;
kernel_ns = ka->master_kernel_ns;
}
...
vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
Note, we don't add the delta since the beginning of the time interval
(assuming we are still on the same interval) so combined with the
above:
vcpu->hv_clock.system_time = master_kernel_ns - master_kernel_ns -
pvclock_scale_delta() == -pvclock_scale_delta()
and this is certainly a negative number. A very small one, but it's
negative.
We could've solved the problem by reducing the precision a bit and
instead of doing
now_ns = get_kvmclock_ns(kvm);
in KVM_SET_CLOCK() handling we could do
now_ns = ka->master_kernel_ns
and that would make vcpu->hv_clock.system_time == 0 after
kvm_guest_time_update() but it'd hurt 'normal' use-cases to fix the
corner case.
> But just thinking out loud...
>
>> And 'master_kernel_ns' represents the last time when masterclock
>> got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
>> problem, the difference is very small, e.g. I'm observing
>> hv_clock.system_time = -70 ns. The issue comes from the fact that
>> 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
>> compute_tsc_page_parameters() becomes a very big number.
>
> Which it is (a very big number).
>
>> Use div_s64() to get the proper result when dividing maybe-negative
>> 'hv_clock.system_time' by 100.
>
> Well hv_clock.system_time is not negative. It is positive.
>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> arch/x86/kvm/hyperv.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index f98370a39936..0529b892f634 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock,
>> hv_clock->tsc_to_system_mul,
>> 100);
>>
>> - tsc_ref->tsc_offset = hv_clock->system_time;
>> - do_div(tsc_ref->tsc_offset, 100);
>> - tsc_ref->tsc_offset -=
>> + /*
>> + * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative
>> + * value here, thus div_s64().
>> + */
>> + tsc_ref->tsc_offset = div_s64(hv_clock->system_time, 100) -
>> mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64);
>> +
>> return true;
>> }
>
> Won't the guest see:
>
> 0.1, 0.005, 0.0025, 0.0001, ..., 0, 0.0001, 0.0025, 0.005, 0.1, ...
>
> As system_time progresses from negative value to positive value with
> the above fix?
>
> While the fix seems OK in practice, perhaps the negative system_time
> could be fixed instead.
Well, that was v1 of the patch where I suggested to clamp system_time
value to 0 in kvm_guest_time_update() but Paolo talked me into this v2
:-)
--
Vitaly
next prev parent reply other threads:[~2021-03-30 14:45 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-29 11:47 [PATCH v2 0/2] KVM: x86: hyper-v: Fix TSC page update after KVM_SET_CLOCK(0) call Vitaly Kuznetsov
2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov
2021-03-29 17:24 ` Sean Christopherson
2021-03-30 10:21 ` Vitaly Kuznetsov
2021-03-30 14:12 ` Marcelo Tosatti
2021-03-30 13:12 ` Marcelo Tosatti
2021-03-30 14:44 ` Vitaly Kuznetsov [this message]
2021-03-30 15:44 ` Paolo Bonzini
2021-03-31 6:29 ` Vitaly Kuznetsov
2021-03-31 6:52 ` Paolo Bonzini
2021-03-31 9:59 ` Vitaly Kuznetsov
2021-03-31 10:58 ` Paolo Bonzini
2021-03-29 11:48 ` [PATCH v2 2/2] selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) Vitaly Kuznetsov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ft0cu2eq.fsf@vitty.brq.redhat.com \
--to=vkuznets@redhat.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox