From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: kvmclock doesn't work, help?
Date: Mon, 14 Dec 2015 20:38:42 -0200
Message-ID: <20151214223842.GA26372@amt.cnet>
References: <CALCETrVZwDddGcW8axAb4PP+YZyfz5TGR9xYwZXv3d_aghLBtA@mail.gmail.com>
 <20151210213212.GA4836@amt.cnet>
 <CALCETrVhHP6p-XRKhzUQX4QY3ymupriarr3joUCgjQgYa-49Bg@mail.gmail.com>
 <566EC7AF.3090508@redhat.com>
 <20151214220027.GA24973@amt.cnet>
 <CALCETrULJW9BpB+VQOFvLYOYrA0xBWwgzim3kRB+FzZe6Voa+g@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm list <kvm@vger.kernel.org>,
	Radim Krcmar <rkrcmar@redhat.com>, X86 ML <x86@kernel.org>
To: Andy Lutomirski <luto@amacapital.net>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:54016 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934912AbbLQP5n (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 17 Dec 2015 10:57:43 -0500
Content-Disposition: inline
In-Reply-To: <CALCETrULJW9BpB+VQOFvLYOYrA0xBWwgzim3kRB+FzZe6Voa+g@mail.gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Mon, Dec 14, 2015 at 02:31:10PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 14, 2015 at 2:00 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Mon, Dec 14, 2015 at 02:44:15PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 11/12/2015 22:57, Andy Lutomirski wrote:
> >> > I'm still not seeing the issue.
> >> >
> >> > The formula is:
> >> >
> >> > (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> >> > pvti->tsc_shift) + pvti->system_time
> >> >
> >> > Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> >> > after suspend/resume, you would also need to update system_time.
> >> >
> >> > I don't see what this has to do with suspend/resume or with whether
> >> > the effective scale factor is greater than or less than one.  The only
> >> > suspend/resume interaction I can see is that, if the host allows the
> >> > guest-observed TSC value to jump (which is arguably a bug, but that's
> >> > not important here), it needs to update pvti before resuming the
> >> > guest.
> >>
> >> Which is not an issue, since freezing obviously gets all CPUs out of
> >> guest mode.
> >>
> >> Marcelo, can you provide an example with made-up values for tsc and pvti?
> >
> > I meant "systemtime" at ^^^^^.
> >
> > guest visible clock = systemtime (updated at time 0, guest initialization) + scaled tsc reads=LARGE VALUE.
> >                       ^^^^^^^^^^
> > guest reads clock to memory at location A = scaled tsc read.
> > -> suspend resume event
> > guest visible clock = systemtime (updated at time AFTER SUSPEND) + scaled tsc reads=0.
> > guest reads clock to memory at location B.
> >
> > So before the suspend/resume event, the clock is the RAW TSC values
> > (scaled by kvmclock, but at the frequency of the RAW TSC).
> >
> > After suspend/resume event, the clock is updated from the host
> > via get_kernel_ns(), which reads the corrected NTP frequency TSC.
> >
> > So you switch the timebase, from a clock running at a given frequency,
> > to a clock running at another frequency (effective frequency).
> >
> > Example:
> >
> >         RAW TSC                 NTP corrected TSC
> > t0      10                      10
> > t1      20                      19.99
> > t2      30                      29.98
> > t3      40                      39.97
> > t4      50                      49.96
> >
> > ...
> >
> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> > you can see what will happen.
> 
> Sure, but why would you ever switch from one to the other? 

Because that's what happens when you ask kvmclock to update from system
time (which is a reliable clock, resistant to suspend/resume issues).
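
A minimal sketch of that update, using the formula quoted above exactly
as written. The numbers are made up for illustration: a 1:1 scale
(mul=1, shift=0) and a host clock that NTP has slowed by 0.1%, the same
ratio as in the table. The Pvti class and guest_clock() are hypothetical
names for this sketch, not the kernel's code.

```python
from dataclasses import dataclass


@dataclass
class Pvti:
    tsc_timestamp: int      # guest TSC value when the host last updated pvti
    system_time: int        # host system time (ns) at that same moment
    tsc_to_system_mul: int  # multiplier, used as in the quoted formula
    tsc_shift: int          # right shift applied after the multiply


def guest_clock(tsc: int, pvti: Pvti) -> int:
    """Guest-visible clock, following the formula quoted in this thread."""
    delta = tsc - pvti.tsc_timestamp
    return ((delta * pvti.tsc_to_system_mul) >> pvti.tsc_shift) + pvti.system_time


# Before the update: pvti was filled in at time 0, so the guest clock is
# just the scaled raw TSC.
before = Pvti(tsc_timestamp=0, system_time=0, tsc_to_system_mul=1, tsc_shift=0)
tsc_now = 1_000_000
raw_reading = guest_clock(tsc_now, before)

# Update from host system time: the NTP-corrected host clock (assumed
# 0.1% slower than the raw TSC) has only advanced 999_000 ns.
host_ns = 999_000
after = Pvti(tsc_timestamp=tsc_now, system_time=host_ns,
             tsc_to_system_mul=1, tsc_shift=0)
new_reading = guest_clock(tsc_now, after)

# Negative step: the guest-visible clock went backwards at the same TSC value.
step = new_reading - raw_reading
```

With these assumed values the reading at the same TSC value drops from
1000000 to 999000 across the update.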

>  I'm still not seeing the scenario under which this discontinuity is
> visible to anything other than the kvmclock code itself.

Host userspace can see it if it uses the TSC and clock_gettime()
and expects them to run hand in hand.

> The only things that need to be monotonic are the output from
> vread_pvclock and the in-kernel equivalent, I think.
> 
> --Andy

clock_gettime() should be monotonic as well.
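
To make the monotonicity point concrete, here is a small check using the
values from the table quoted above (stored in hundredths so the
arithmetic stays exact). It is only an illustration: step_on_switch()
and is_monotonic() are hypothetical helpers, and the reader sequence is
an assumed scenario in which the switch happens at t4.

```python
# Table values from this thread, in hundredths (10 -> 1000, 19.99 -> 1999).
raw_tsc = [1000, 2000, 3000, 4000, 5000]
ntp_corrected = [1000, 1999, 2998, 3997, 4996]


def step_on_switch(i: int) -> int:
    """Jump seen by a reader moved from the raw clock to the
    NTP-corrected clock at instant t_i."""
    return ntp_corrected[i] - raw_tsc[i]


def is_monotonic(samples: list[int]) -> bool:
    """True if no sample is smaller than the one before it."""
    return all(b >= a for a, b in zip(samples, samples[1:]))


# The later the switch, the larger the backwards step.
steps = [step_on_switch(i) for i in range(len(raw_tsc))]

# Assumed scenario: a reader samples the raw clock at t3 and t4, then the
# clock is switched at t4 and the reader samples again.
readings = [raw_tsc[3], raw_tsc[4], ntp_corrected[4]]
went_backwards = not is_monotonic(readings)
```

Each column is monotonic on its own; only the sequence that crosses
from one to the other is not.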