Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>
Subject: Re: [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters()
Date: Tue, 30 Mar 2021 16:44:29 +0200	[thread overview]
Message-ID: <87ft0cu2eq.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20210330131236.GA5932@fuller.cnet>

Marcelo Tosatti <mtosatti@redhat.com> writes:

> Hi Vitaly,
>
> On Mon, Mar 29, 2021 at 01:47:59PM +0200, Vitaly Kuznetsov wrote:
>> When guest time is reset with KVM_SET_CLOCK(0), it is possible for
>> hv_clock->system_time to become a small negative number. This happens
>> because in KVM_SET_CLOCK handling we set kvm->arch.kvmclock_offset based
>> on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
>> kvm_guest_time_update() does
>> 
>> hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;
>
> Hum, masterclock update should always precede KVM_SET_CLOCK() call.
>
> So when KVM_SET_CLOCK(0) is called, we'd like the guest (or the
> following):
>
> static __always_inline
> u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc)
> {
>         u64 delta = tsc - src->tsc_timestamp;
>         u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
>                                              src->tsc_shift);
>         return src->system_time + offset;
> }
>
> To return 0 (which won't happen if pvclock_data->system_time is being
> initialized to a negative value).
>
> 	KVM_SET_CLOCK(0)
>
>  	kvm_gen_update_masterclock(kvm);
> 	now_ns = get_kvmclock_ns(kvm);
> 	kvm->arch.kvmclock_offset += user_ns.clock - now_ns = -now_ns; 
>
> 	hv_clock.tsc_timestamp = ka->master_cycle_now;
>         hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
>
> Wonder if the negative value happens due to the switch from
> masterclock=off -> on (get_kvmclock_ns reading kernel time directly).
>
> Perhaps this error is being added to clock when migration is performed.
>

The issue is very easy to reproduce if you take the selftest (PATCH2
from this series), it is just a simple KVM_SET_CLOCK(0) call which makes
Hyper-V TSC page value very very big.

The order of events is:

KVM_SET_CLOCK(0) is performed and we set 

	        now_ns = get_kvmclock_ns(kvm);
                kvm->arch.kvmclock_offset += user_ns.clock - now_ns;

user_ns.clock is zero here. Note, we use get_kvmclock_ns() which gives
the precise kvmclock value for the moment , to simplify things a bit:

 kvm->arch.kvmclock_offset = kvmclock_offset - master_kernel_ns -
 kvmclock_offset - pvclock_scale_delta() == -master_kernel_ns - pvclock_scale_delta()
 
pvclock_scale_delta() is time since the begining of the time
interval.

Now we call kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE) and
this leads to kvm_guest_time_update(). In kvm_guest_time_update() we do:

	if (use_master_clock) {
                host_tsc = ka->master_cycle_now;
                kernel_ns = ka->master_kernel_ns;
        }
        ...
        vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;

Note, we don't add the delta since the beginning of the time interval
(assuming we are still on the same interval) so combined with the
above:

vcpu->hv_clock.system_time = master_kernel_ns - master_kernel_ns -
pvclock_scale_delta() == -pvclock_scale_delta()

and this is certainly a negative number. A very small one, but it's
negative.

We could've solved the problem by reducing the precision a bit and
instead of doing 

 now_ns = get_kvmclock_ns(kvm);

in KVM_SET_CLOCK() handling we could do

 now_ns = ka->master_kernel_ns

and that would make vcpu->hv_clock.system_time == 0 after
kvm_guest_time_update() but it'd hurt 'normal' use-cases to fix the
corner case.

> But just thinking out loud...
>
>> And 'master_kernel_ns' represents the last time when masterclock
>> got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
>> problem, the difference is very small, e.g. I'm observing
>> hv_clock.system_time = -70 ns. The issue comes from the fact that
>> 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
>> compute_tsc_page_parameters() becomes a very big number.
>
> Which it is (a very big number).
>
>> Use div_s64() to get the proper result when dividing maybe-negative
>> 'hv_clock.system_time' by 100.
>
> Well hv_clock.system_time is not negative. It is positive.
>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/hyperv.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index f98370a39936..0529b892f634 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1070,10 +1070,13 @@ static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock,
>>  				hv_clock->tsc_to_system_mul,
>>  				100);
>>  
>> -	tsc_ref->tsc_offset = hv_clock->system_time;
>> -	do_div(tsc_ref->tsc_offset, 100);
>> -	tsc_ref->tsc_offset -=
>> +	/*
>> +	 * Note: 'hv_clock->system_time' despite being 'u64' can hold a negative
>> +	 * value here, thus div_s64().
>> +	 */
>> +	tsc_ref->tsc_offset = div_s64(hv_clock->system_time, 100) -
>>  		mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64);
>> +
>>  	return true;
>>  }
>
> Won't the guest see:
>
> 0.1, 0.005, 0.0025, 0.0001, ..., 0, 0.0001, 0.0025, 0.005, 0.1, ...
>
> As system_time progresses from negative value to positive value with
> the above fix?
>
> While the fix seems OK in practice, perhaps the negative system_time 
> could be fixed instead.

Well, that was v1 of the patch where I suggested to clamp system_time
value to 0 in kvm_guest_time_update() but Paolo talked me into this v2
:-)

-- 
Vitaly

next prev parent reply	other threads:[~2021-03-30 14:45 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29 11:47 [PATCH v2 0/2] KVM: x86: hyper-v: Fix TSC page update after KVM_SET_CLOCK(0) call Vitaly Kuznetsov
2021-03-29 11:47 ` [PATCH v2 1/2] KVM: x86: hyper-v: Properly divide maybe-negative 'hv_clock->system_time' in compute_tsc_page_parameters() Vitaly Kuznetsov
2021-03-29 17:24   ` Sean Christopherson
2021-03-30 10:21     ` Vitaly Kuznetsov
2021-03-30 14:12     ` Marcelo Tosatti
2021-03-30 13:12   ` Marcelo Tosatti
2021-03-30 14:44     ` Vitaly Kuznetsov [this message]
2021-03-30 15:44       ` Paolo Bonzini
2021-03-31  6:29         ` Vitaly Kuznetsov
2021-03-31  6:52           ` Paolo Bonzini
2021-03-31  9:59             ` Vitaly Kuznetsov
2021-03-31 10:58               ` Paolo Bonzini
2021-03-29 11:48 ` [PATCH v2 2/2] selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ft0cu2eq.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.