From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: [PATCH v2] KVM: x86: fix KVM_SET_CLOCK relative to setting correct clock value
Date: Mon, 15 May 2017 18:06:15 -0300
Message-ID: <20170515210615.GA4764@amt.cnet>
References: <20170502213616.GA24837@amt.cnet>
 <2499ef65-1dfe-8460-ec41-661b05cc5023@redhat.com>
 <20170503134341.GB10468@amt.cnet>
 <20170510180430.GA2240@potion>
 <20170511153903.GC2308@amt.cnet>
 <20170512141322.GC2173@potion>
 <20170512153101.GA1848@amt.cnet>
 <20170512173711.GA13226@potion>
 <20170513034648.GA20368@amt.cnet>
 <20170515161956.GA3224@potion>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: Paolo Bonzini , kvm-devel
To: Radim Krčmář
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:49976 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751413AbdEOVGk (ORCPT ); Mon, 15 May 2017 17:06:40 -0400
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id A5B04C0C6C20
 for ; Mon, 15 May 2017 21:06:39 +0000 (UTC)
Content-Disposition: inline
In-Reply-To: <20170515161956.GA3224@potion>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, May 15, 2017 at 06:19:57PM +0200, Radim Krčmář wrote:
> 2017-05-13 00:46-0300, Marcelo Tosatti:
> > On Fri, May 12, 2017 at 07:37:12PM +0200, Radim Krčmář wrote:
> > > 2017-05-12 12:31-0300, Marcelo Tosatti:
> > > > Now with
> > > >
> > > > + kvm->arch.kvmclock_offset = user_ns.clock -
> > > > + ka->master_kernel_ns;
> > > >
> > > > What happens is that guest clock starts counting, via kernel timekeeper,
> > > > at the moment kvm_get_time_and_clockread() runs.
> > > > If you add grdtsc() -
> > > > ka->master_cycle_now in there, you are mindfully counting clock twice
> > > > (first: kernel timekeeper, second: the TSC between the (grdtsc() -
> > > > ka->master_cycle_now) in question.
> > > >
> > > > + kvm->arch.kvmclock_offset = -ktime_get_boot_ns() +user_ns.clock -delta
> > > >
> > > > Note that (grdtsc() - ka->master_cycle_now) is susceptible to scheduling
> > > > etc.
> > > >
> > > > Makes sense?
> > >
> > > Yes. The simpler code starts the kvmclock a bit later, but both are
> > > correct -- anything within KVM_SET_CLOCK runtime is.
> >
> > No the simpler code is not correct. You count time with two clocks for a
> > small period of time.
>
> A clock that counts kernel-nanoseconds is instantly replaced by a clock
> that counts masterclock-nanoseconds, not incorrect by itself.
>
> The simpler code will get the same kvmclock_offset as your code where
> kvm_get_time_and_clockread() is called a bit later.
> The distribution of resulting kvm_offsets will differ, but they must
> both be correct or both incorrect, because they are already off-mark.
>
> > KVM_SET_CLOCK means: set the guest clock to the passed value and start
> > counting it from there.
>
> Which is exactly what both versions do.
>
> > With the simple fix, KVM_SET_CLOCK does: set the guest clock to the
> > passed value, but also add the delta between kvm_get_time_and_clockread()
> > and get_kvmclock_ns().
> >
> > Which is variable, due to scheduling.
>
> Yes.
>
> > So it is just wrong.
>
> It makes the matter slightly worse by adding some execution time, but
> the whole interface is "just wrong" even without that: we already have
> the variability of the time between userspace's decision on
> user_ns.clock value and kvm_get_time_and_clockread().
>
> >> If we care about accuracy, then we should let userspace provide a
> >> (kernel timestamp, kvm timestamp) pair, so the value of kvmclock can
> >> really be controlled.
> >
> > I suppose something else has to be done: the control of the clock,
> > from whatever userspace is using to measure passage of time,
> > to TSC, has to be done in kernel.
>
> We agree, just worded it differently.
>
> > But lets see if that is really necessary when the QEMU
> > PTP/CLOCK_MONOTONIC delta stuff is done (working on it).
>
> Right.
>
> > In the meantime, do you have anything against this patch? I depend on
> > it for the work above.
>
> The reasoning provided with the patch does not explain why
>
> * kvmclock_offset must be adjusted so that
> * user_ns.clock = master_kernel_ns + kvmclock_offset
>
> Please explain why it "must". I assert that it does not have to be.
>
> If we agree that it is not necessary, then it is an optimization and I'd
> like numbers to show that we are getting something that balances the
> obfuscation; and why do we want it if we don't care about the better
> solution (discussed above).
>
> > I depend on
> > it for the work above.
>
> Describing how other code couldn't work without this is great reason,
> but again, please be specific -- what difference it make?
>
> >> Adding ugly optimizations to work around shortcomings of the API is
> >> going the wrong way ...
> >
> > What optimization you refer to?
>
> I refer to everything on top of the second hunk I posted.
>
> Thanks.

Actually you are right, your patch is fine because the length of time
between kvm_get_time_and_clockread() and get_kvmclock_ns(kvm) is
compensated by

- grdtsc() + ka->master_cycle_now = - ( +grdtsc() - ka->master_cycle_now)

which is the length of time between kvm_get_time_and_clockread() and
get_kvmclock_ns(kvm).

It's much cleaner indeed. Can you please apply it?

Thanks.
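
For illustration, the compensation can be checked with a small user-space
model. This is only a sketch, not the kernel code: struct masterclock,
model_kvmclock_ns(), offset_simple() and offset_snapshot() are hypothetical
names, the simple hunk is assumed to have the form "offset += user clock -
current kvmclock reading", and one TSC cycle is assumed to equal one
nanosecond. Under those assumptions the simpler update yields the same
offset as taking the masterclock snapshot at the moment the TSC is read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the masterclock snapshot (not the kernel struct).
 * Assumption: 1 TSC cycle == 1 ns, so no cycle-to-ns scaling is modelled. */
struct masterclock {
    int64_t  master_kernel_ns;  /* kernel time when the snapshot was taken */
    uint64_t master_cycle_now;  /* TSC value when the snapshot was taken */
};

/* Model of a kvmclock read: snapshot time + TSC delta since snapshot + offset. */
int64_t model_kvmclock_ns(const struct masterclock *mc, int64_t offset,
                          uint64_t tsc)
{
    return mc->master_kernel_ns + (int64_t)(tsc - mc->master_cycle_now) + offset;
}

/* Assumed shape of the simpler update: adjust the offset by the difference
 * between the requested clock and the current reading at TSC value tsc_read.
 * The TSC delta added by the read is subtracted again here. */
int64_t offset_simple(const struct masterclock *mc, int64_t old_offset,
                      int64_t user_clock, uint64_t tsc_read)
{
    return old_offset + user_clock - model_kvmclock_ns(mc, old_offset, tsc_read);
}

/* Explicit variant: offset chosen so that
 * user_clock == master_kernel_ns + offset for the given snapshot. */
int64_t offset_snapshot(const struct masterclock *mc, int64_t user_clock)
{
    return user_clock - mc->master_kernel_ns;
}

/* Returns 1 if both variants agree when the explicit variant's snapshot is
 * taken delta_cycles after the original one (i.e. "a bit later"). */
int offsets_agree(const struct masterclock *mc, int64_t old_offset,
                  int64_t user_clock, uint64_t delta_cycles)
{
    uint64_t tsc_read = mc->master_cycle_now + delta_cycles;
    struct masterclock later = {
        .master_kernel_ns = mc->master_kernel_ns + (int64_t)delta_cycles,
        .master_cycle_now = tsc_read,
    };

    return offset_simple(mc, old_offset, user_clock, tsc_read) ==
           offset_snapshot(&later, user_clock);
}
```

In this model the old offset and the TSC delta both cancel, so the guest
clock reads exactly the requested value at the instant the TSC is sampled,
however long the gap since the snapshot was.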