From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: kvmclock doesn't work, help?
Date: Mon, 14 Dec 2015 20:38:42 -0200
Message-ID: <20151214223842.GA26372@amt.cnet>
References: <20151210213212.GA4836@amt.cnet> <566EC7AF.3090508@redhat.com> <20151214220027.GA24973@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini, kvm list, Radim Krcmar, X86 ML
To: Andy Lutomirski
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:54016 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934912AbbLQP5n (ORCPT ); Thu, 17 Dec 2015 10:57:43 -0500
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, Dec 14, 2015 at 02:31:10PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 14, 2015 at 2:00 PM, Marcelo Tosatti wrote:
> > On Mon, Dec 14, 2015 at 02:44:15PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 11/12/2015 22:57, Andy Lutomirski wrote:
> >> > I'm still not seeing the issue.
> >> >
> >> > The formula is:
> >> >
> >> > (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> >> > pvti->tsc_shift) + pvti->system_time
> >> >
> >> > Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> >> > after suspend/resume, you would also need to update system_time.
> >> >
> >> > I don't see what this has to do with suspend/resume or with whether
> >> > the effective scale factor is greater than or less than one. The only
> >> > suspend/resume interaction I can see is that, if the host allows the
> >> > guest-observed TSC value to jump (which is arguably a bug, what that's
> >> > not important here), it needs to update pvti before resuming the
> >> > guest.
> >>
> >> Which is not an issue, since freezing obviously gets all CPUs out of
> >> guest mode.
> >>
> >> Marcelo, can you provide an example with made-up values for tsc and pvti?
> >
> > I meant "systemtime" at ^^^^^.
> >
> > guest visible clock = systemtime (updated at time 0, guest initialization) + scaled tsc reads=LARGE VALUE.
> >                       ^^^^^^^^^^
> > guest reads clock to memory at location A = scaled tsc read.
> > -> suspend resume event
> > guest visible clock = systemtime (updated at time AFTER SUSPEND) + scaled tsc reads=0.
> > guest reads clock to memory at location B.
> >
> > So before the suspend/resume event, the clock is the RAW TSC values
> > (scaled by kvmclock, but the frequency of the RAW TSC).
> >
> > After suspend/resume event, the clock is updated from the host
> > via get_kernel_ns(), which reads the corrected NTP frequency TSC.
> >
> > So you switch the timebase, from a clock running at a given frequency,
> > to a clock running at another frequency (effective frequency).
> >
> > Example:
> >
> >         RAW TSC         NTP corrected TSC
> > t0      10              10
> > t1      20              19.99
> > t2      30              29.98
> > t3      40              39.97
> > t4      50              49.96
> >
> > ...
> >
> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> > you can see what will happen.
>
> Sure, but why would you ever switch from one to the other?

Because that's what happens when you ask kvmclock to update from
system time (which is a reliable clock, resistant to suspend/resume
issues).

> I'm still not seeing the scenario under which this discontinuity is
> visible to anything other than the kvmclock code itself.

Host userspace can see it if it uses both the TSC and clock_gettime()
and expects them to run hand in hand.

> The only things that need to be monotonic are the output from
> vread_pvclock and the in-kernel equivalent, I think.
>
> --Andy

clock_gettime() should be monotonic as well.
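
To put made-up numbers on the switch described above, here is a minimal
sketch in Python. It is an illustration only, not kernel code: the Pvti
class and the pvclock_read() helper are hypothetical names, the formula
is the simplified one quoted earlier in this thread, the scale factor is
assumed to be exactly 1 (mul=1, shift=0), and the table values are
multiplied by 100 so everything stays in integers.

```python
from dataclasses import dataclass


@dataclass
class Pvti:
    """Hypothetical stand-in for the pvti fields named in the thread."""
    tsc_timestamp: int
    tsc_to_system_mul: int
    tsc_shift: int
    system_time: int


def pvclock_read(tsc: int, pvti: Pvti) -> int:
    """Simplified formula as quoted in the thread (not the kernel's code)."""
    delta = tsc - pvti.tsc_timestamp
    return ((delta * pvti.tsc_to_system_mul) >> pvti.tsc_shift) + pvti.system_time


# Table values scaled by 100: raw TSC at t4 is 5000, NTP-corrected is 4996.
RAW_TSC_T0 = 1000
RAW_TSC_T4 = 5000
NTP_CORRECTED_T4 = 4996

# pvti as set up at t0 (assumption: both clocks agree at t0).
pvti_initial = Pvti(tsc_timestamp=RAW_TSC_T0, tsc_to_system_mul=1,
                    tsc_shift=0, system_time=RAW_TSC_T0)

# pvti as refreshed at t4 from the NTP-corrected host clock.
pvti_refreshed = Pvti(tsc_timestamp=RAW_TSC_T4, tsc_to_system_mul=1,
                      tsc_shift=0, system_time=NTP_CORRECTED_T4)

# Same TSC value, read just before and just after the refresh.
before = pvclock_read(RAW_TSC_T4, pvti_initial)
after = pvclock_read(RAW_TSC_T4, pvti_refreshed)
step = after - before

print(before, after, step)
```

With these assumed values the read before the refresh follows the raw
TSC rate and the read after it starts from the NTP-corrected base, so
the guest-visible clock steps backwards at the same TSC value.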