All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm list <kvm@vger.kernel.org>, Radim Krcmar <rkrcmar@redhat.com>,
	X86 ML <x86@kernel.org>
Subject: Re: kvmclock doesn't work, help?
Date: Thu, 17 Dec 2015 17:08:51 -0200	[thread overview]
Message-ID: <20151217190850.GA13981@amt.cnet> (raw)
In-Reply-To: <CALCETrVfG5YtfP+8G-NLqkyZByndT11MwkmgZ1X9u-h_syw9PQ@mail.gmail.com>

On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> >>
> >> >>
> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >>> >         RAW TSC                 NTP corrected TSC
> >> >>> > t0      10                      10
> >> >>> > t1      20                      19.99
> >> >>> > t2      30                      29.98
> >> >>> > t3      40                      39.97
> >> >>> > t4      50                      49.96

(1)

> >> >>> >
> >> >>> > ...
> >> >>> >
> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >>> > you can see what will happen.
> >> >>>
> >> >>> Sure, but why would you ever switch from one to the other?
> >> >>
> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> than the guest kvmclock.
> >> >
> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >
> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> > backwards.
> >> >
> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> > of kvmclock, and this shouldn't matter.
> >> >
> >> > I still feel like I'm missing something very basic here.
> >> >
> >>
> >> OK, I think I get it.
> >>
> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> correction to the guest.  If it did, indeed, propagate the correction
> >> then, after resume, the host's new system_time would match the guest's
> >> idea of it (after accounting for the guest's long nap), and I don't
> >> think there would be a problem.
> >> That being said, I can't find the code in the masterclock stuff that
> >> would actually do this.
> >
> > Guest clock is maintained by guest timekeeping code, which does:
> >
> > timer_interrupt()
> >         offset = read clocksource since last timer interrupt
> >         accumulate_to_systemclock(offset)
> >
> > The frequency correction of NTP in the host can be applied to
> > kvmclock, which will be visible to the guest
> > at "read clocksource since last timer interrupt"
> > (kvmclock_clocksource_read function).
> 
> pvclock_clocksource_read?  That seems to do the same thing as all the
> other clocksource access functions.
> 
> >
> > This does not mean that the NTP correction in the host is propagated
> > to the guests system clock directly.
> >
> > (For example, the guest can run NTP which is free to do further
> > adjustments at "accumulate_to_systemclock(offset)" time).
> 
> Of course.  But I expected that, in the absence of NTP on the guest,
> that the guest would track the host's *corrected* time.
> 
> >
> >> If, on the other hand, the host's NTP correction is not supposed to
> >> propagate to the guest,
> >
> > This is optional. There is a module option to control this, in fact.
> >
> > Its nice to have, because then you can execute a guest without NTP
> > (say without network connection), and have a kvmclock (kvmclock is a
> > clocksource, not a guest system clock) which is NTP corrected.
> 
> Can you point to how this works?  I found kvm_guest_time_update, whch
> is called under circumstances that I haven't untangled.  I can't
> really tell what it's trying to do.

Documentation/virtual/kvm/timekeeping.txt.

> In any case, this still seems much more convoluted than it has to be.
> In the case in which the host has a stable TSC (tsc is selected in the
> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
> the time on the last few generations of CPUs, then the core
> timekeeping code is already exposing a linear function that's supposed
> to be used for monotonic, cpu-local access to a corrected nanosecond
> counter.  It's even in pretty much exactly the right form to pass
> through to the guest via pvclock in the gtod data.  Why doesn't KVM
> pass it through verbatim, updated in real time?  Is there some legacy
> reason that KVM must apply its own corrections and has to jump through
> hoops to pause vcpus when updating those vcpu's copies of the pvclock
> data?

Read the comment on x86.c which starts with
" *
 * Assuming a stable TSC across physical CPUS, and a stable TSC
 * across virtual CPUs, the following condition is possible.
 * Each numbered line represents an event visible to both
 * CPUs at the next numbered event.
"

> >> then shouldn't KVM just update system_time on
> >> resume to whatever the guest would think it had (which I think would
> >> be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
> >> shifted by some per-guest constant offset).
> >>
> >> --Andy
> >
> > Sure, you could add a correction to compensate and make sure
> > the guest clock does not see time backwards.
> >
> 
> Could you help do that?  You understand the code far better than I do.

Sure, you have to save the guests view of time (system_time + scaled tsc
read) when suspending, and add an offset to get_kernel_ns() to 
compensate the effect of (1) when resuming.

Does that make sense? 

> As it stands, it simply doesn't work on any system that suspends and
> resumes (unless maybe the system has the upcoming Intel ART feature,
> and I have no clue when that'll show up).
> 
> --Andy

  reply	other threads:[~2015-12-18  0:33 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-09 21:10 kvmclock doesn't work, help? Andy Lutomirski
2015-12-09 21:16 ` Paolo Bonzini
2015-12-09 21:49   ` Andy Lutomirski
2015-12-09 22:12     ` Paolo Bonzini
2015-12-09 22:27       ` Andy Lutomirski
2015-12-09 22:42         ` Paolo Bonzini
2015-12-09 22:43         ` Andy Lutomirski
2015-12-10 21:33         ` Marcelo Tosatti
2015-12-10 21:32 ` Marcelo Tosatti
2015-12-11 21:57   ` Andy Lutomirski
2015-12-11 23:48     ` Marcelo Tosatti
2015-12-14 18:07       ` Andy Lutomirski
2015-12-14 21:47         ` Marcelo Tosatti
2015-12-14 13:44     ` Paolo Bonzini
2015-12-14 22:00       ` Marcelo Tosatti
2015-12-14 22:31         ` Andy Lutomirski
2015-12-14 22:38           ` Marcelo Tosatti
2015-12-15  8:42           ` Paolo Bonzini
2015-12-16 17:48             ` Andy Lutomirski
2015-12-16 18:17               ` Andy Lutomirski
2015-12-16 21:57                 ` Marcelo Tosatti
2015-12-17 16:33                   ` Andy Lutomirski
2015-12-17 19:08                     ` Marcelo Tosatti [this message]
2015-12-18  1:12                       ` Andy Lutomirski
2015-12-18 11:47                         ` Marcelo Tosatti
2015-12-18 19:27                           ` Andy Lutomirski
2015-12-18 19:45                             ` Marcelo Tosatti
2015-12-18 20:25                               ` Andy Lutomirski
2015-12-18 21:49                                 ` Marcelo Tosatti
2015-12-21 22:49                                   ` Andy Lutomirski
2015-12-23 19:27                                     ` Marcelo Tosatti
2015-12-23 23:09                                       ` Andy Lutomirski
2015-12-19  1:16                                 ` John Stultz
2015-12-10 21:36 ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151217190850.GA13981@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.