From mboxrd@z Thu Jan 1 00:00:00 1970
From: Radim Krčmář
Subject: Re: [PATCH v2] KVM: x86: fix KVM_SET_CLOCK relative to setting correct clock value
Date: Fri, 12 May 2017 19:37:12 +0200
Message-ID: <20170512173711.GA13226@potion>
References: <20170502213616.GA24837@amt.cnet>
 <2499ef65-1dfe-8460-ec41-661b05cc5023@redhat.com>
 <20170503134341.GB10468@amt.cnet>
 <20170510180430.GA2240@potion>
 <20170511153903.GC2308@amt.cnet>
 <20170512141322.GC2173@potion>
 <20170512153101.GA1848@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: Paolo Bonzini, kvm-devel
To: Marcelo Tosatti
Content-Disposition: inline
In-Reply-To: <20170512153101.GA1848@amt.cnet>
Sender: kvm-owner@vger.kernel.org

2017-05-12 12:31-0300, Marcelo Tosatti:
> On Fri, May 12, 2017 at 04:13:23PM +0200, Radim Krčmář wrote:
> > 2017-05-11 12:39-0300, Marcelo Tosatti:
> > > On Wed, May 10, 2017 at 08:04:31PM +0200, Radim Krčmář wrote:
> > > > 2017-05-03 10:43-0300, Marcelo Tosatti:
> > > > and the important fix for kvm master clock is the move of
> > > > kvm_gen_update_masterclock() before we read the time.
> > > >
> > > > The rest is just a minor optimization that also ignores time since
> > > > master_kernel_ns() and therefore pins user_ns.clock to a slightly
> > > > earlier time.
> > > >
> > > > But all attention was given to the "minor optimization" -- have I missed
> > > > something about the direct use of ka->master_kernel_ns?
> > >
> > > I haven't attempted to optimize anything. Not sure what you mean.
> >
> > I mean, why doesn't the patch look like this?
> d
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 464da936c53d..8db1d09e59d7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4175,9 +4175,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >  			goto out;
> >
> >  		r = 0;
> > +		kvm_gen_update_masterclock(kvm);
> >  		now_ns = get_kvmclock_ns(kvm);
> >  		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
> > -		kvm_gen_update_masterclock(kvm);
> > +		kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE);
> >  		break;
>
> now_ns = ka->master_kernel_ns + kvmclock_offset_prev + (grdtsc() - ka->master_cycle_now)
> kvmclock_offset += user_ns.clock - (ka->master_kernel_ns + kvmclock_offset_prev + (grdtsc() - ka->master_cycle_now)
> kvmclock_offset = kvmclock_offset_prev + user_ns.clock - (ka->master_kernel_ns + kvmclock_offset_prev + (grdtsc() - ka->master_cycle_now)
>
> In case of VM was just initialized before migration, kvmclock_offset_prev is -ktime_get_boot_ns()
>
> kvmclock_offset = -ktime_get_boot_ns() + user_ns.clock - (ka->master_kernel_ns -ktime_get_boot_ns() + grdtsc() - ka->master_cycle_now))
>
> But master_kernel_ns = ktime_get_boot_ns() + delta-between-vm-init-and-KVM_SET_CLOCK (AKA delta)
> (the same one from VM init)
>
> kvmclock_offset = -ktime_get_boot_ns() + user_ns.clock - (ktime_get_boot_ns() + delta + -ktime_get_boot_ns() + grdtsc() - ka->master_cycle_now))
>
> kvmclock_offset = -ktime_get_boot_ns() + user_ns.clock - delta - grdtsc() + ka->master_cycle_now
>
> But we don't want grdtsc() - ka->master_cycle_now in there.
>
> Note: grdtsc() == guest read tsc.
>
> Now with
>
> +		kvm->arch.kvmclock_offset = user_ns.clock -
> +						ka->master_kernel_ns;
>
> What happens is that guest clock starts counting, via kernel timekeeper,
> at the moment kvm_get_time_and_clockread() runs.
> If you add grdtsc() -
> ka->master_cycle_now in there, you are mindfully counting clock twice
> (first: kernel timekeeper, second: the TSC between the (grdtsc() -
> ka->master_cycle_now) in question.
>
> + kvm->arch.kvmclock_offset = -ktime_get_boot_ns() +user_ns.clock -delta
>
> Note that (grdtsc() - ka->master_cycle_now) is susceptible to scheduling
> etc.
>
> Makes sense?

Yes. The simpler code starts the kvmclock a bit later, but both are
correct -- anything within KVM_SET_CLOCK runtime is.

If we care about accuracy, then we should let userspace provide a
(kernel timestamp, kvm timestamp) pair, so the value of kvmclock can
really be controlled. Adding ugly optimizations to work around
shortcomings of the API is going the wrong way ...
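
To make the difference concrete, here is a toy model of the offset
computations discussed above. It is only a sketch under simplifying
assumptions: every name in it (toy_vm, toy_kvmclock, set_clock_via_now,
set_clock_via_master, set_clock_via_pair) is made up for illustration and is
not KVM code, all times are plain nanosecond integers, and elapsed_ns stands
in for the time between the master clock snapshot and the clock read (the
grdtsc() - ka->master_cycle_now term). The pair-based setter is a guess at
what a (kernel timestamp, kvm timestamp) interface could compute, not an
existing API.

```c
#include <stdint.h>

/* Toy model only: these types and helpers are hypothetical, not KVM code. */
struct toy_vm {
	int64_t master_kernel_ns; /* host boot-time ns at the master clock snapshot */
	int64_t kvmclock_offset;  /* guest clock = host boot-time ns + offset */
};

/* Guest clock value at the moment the host boot-time clock reads host_ns. */
static int64_t toy_kvmclock(const struct toy_vm *vm, int64_t host_ns)
{
	return host_ns + vm->kvmclock_offset;
}

/*
 * Variant from the simpler diff: read the current guest clock (snapshot plus
 * elapsed_ns), then shift the offset by the difference to user_ns.
 * user_ns ends up pinned to host time master_kernel_ns + elapsed_ns.
 */
static void set_clock_via_now(struct toy_vm *vm, int64_t user_ns,
			      int64_t elapsed_ns)
{
	int64_t now_ns = toy_kvmclock(vm, vm->master_kernel_ns + elapsed_ns);

	vm->kvmclock_offset += user_ns - now_ns;
}

/*
 * Variant that uses the snapshot directly: user_ns is pinned to host time
 * master_kernel_ns, so elapsed_ns never enters the offset.
 */
static void set_clock_via_master(struct toy_vm *vm, int64_t user_ns)
{
	vm->kvmclock_offset = user_ns - vm->master_kernel_ns;
}

/*
 * Hypothetical pair-based variant: userspace names the host timestamp that
 * kvm_ts belongs to, so the result does not depend on when the call runs.
 */
static void set_clock_via_pair(struct toy_vm *vm, int64_t kernel_ts,
			       int64_t kvm_ts)
{
	vm->kvmclock_offset = kvm_ts - kernel_ts;
}
```

With master_kernel_ns = 5000, a previous offset of -4000, user_ns = 900000
and elapsed_ns = 30, the first variant reads 900000 at host time 5030 and the
second reads 900000 at host time 5000, so from then on the two guest clocks
differ by exactly elapsed_ns. The model takes no side on which of the two is
preferable.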