From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Hornyack <peterhornyack@google.com>,
Owen Hofmann <osh@google.com>, KVM General <kvm@vger.kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: What time is it kvm-clock?
Date: Thu, 25 Feb 2016 09:12:39 -0300
Message-ID: <20160225121237.GA31115@amt.cnet>
In-Reply-To: <CALCETrU708tVWmX4nb7+XBcPELEuKcqd2oLvk2R0RzXL7Sb=5Q@mail.gmail.com>
On Wed, Feb 24, 2016 at 05:19:38PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 24, 2016 at 3:35 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Feb 24, 2016 at 09:35:44AM -0800, Peter Hornyack wrote:
> >> On Tue, Feb 23, 2016 at 7:57 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >> > On Tue, Feb 23, 2016 at 06:31:59PM -0800, Owen Hofmann wrote:
> >> >> Specifically, what underlying source of time should be exposed through
> >> >> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
> >> >> page? Recently a couple of threads on kvm-list, along with attempts
> >> >> to produce reliable behavior from kvm-clock on our systems have
> >> >> highlighted a tension between the current implementation of kvm-clock
> >> >> and potentially diverging goals for paravirt time. Here are a few:
> >> >>
> >> >> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
> >> >> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
> >> >> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
> >> >>
> >> >> This question is mostly in regards to kvm-clock in masterclock mode
> >> >> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
> >> >> expose a source of time that is more 'true' than the underlying TSC?
> >> >> For example, by passing through NTP correction from the host. For the
> >> >> current implementation, the answer seems to be... why not both? Once
> >> >> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
> >> >> multiplied by the frequency specified by kvm. On the other hand,
> >> >> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
> >> >> are measured against corrected time from the host. A guest reading its
> >> >> pvclock gets a very different result from a host KVM_GET_CLOCK if the
> >> >> guest has run long enough for TSC to diverge from NTP time. A VMM
> >> >> using these ioctls to save and restore clock state can produce wild
> >> >> time jumps from the guest's perspective.
> >> >>
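To put a rough number on the divergence described above: a clock that
runs at the raw TSC rate and one that follows NTP-corrected time drift
apart linearly with the frequency error. The sketch below is only
arithmetic; the function name and the 50 ppm figure are assumptions
chosen for illustration, not measurements from this thread.

```c
#include <stdint.h>

/*
 * Divergence, in nanoseconds, between an uncorrected clock and an
 * NTP-corrected one after elapsed_s seconds, given a constant
 * frequency error in parts per billion. One ppb over one second is
 * exactly one nanosecond, so the product is already in ns.
 * (Hypothetical helper, for illustration only.)
 */
static int64_t divergence_ns(int64_t elapsed_s, int64_t freq_error_ppb)
{
    return elapsed_s * freq_error_ppb;
}
```

With an assumed 50 ppm (50000 ppb) error, divergence_ns(86400, 50000)
gives 4320000000 ns, i.e. about 4.3 seconds per day of uptime, which is
the size of jump a save/restore through KVM_GET_CLOCK / KVM_SET_CLOCK
could expose under that assumption.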
> >> >> The patches in (2) address this mismatch by plumbing updates to clock
> >> >> frequency through kvm-clock to the guest. This seems like an important
> >> >> design choice for kvm-clock, and IMO deserves at least a clear
> >> >> statement of the goals for this interface, if not some more
> >> >> discussion.
> >> >
> >> > Design goals of what interface? KVM_GET_CLOCK / KVM_SET_CLOCK?
> >> >
> >> > The interfaces have been introduced to fix a bug.
> >> >
> >> >> The (later) thread in (3) claims that synchronizing with
> >> >> host time is *not* a goal of kvm-clock.
> >> >
> >> > It is not.
> >> >
> >> >> To me, kvm-clock and the HyperV TSC page are extremely effective as
> >> >> simply a more enlightened path to the host TSC. Maintaining a
> >> >> high-performance path to the TSC in the face of updates is tricky -
> >> >> see the extended comment in pvclock_update_vm_gtod_copy, or the
> >> >> discussion on the patchset in (2). Is the cost of auditing that the
> >> >> path from host gettimeofday update -> kvm -> guest pvclock -> guest
> >> >> gettimeofday both tracks host time correctly and does not produce any
> >> >> backwards warps worth the added value, if it exists? As an
> >> >> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
> >> >> function of the last update to kvm-clock or the reference TSC page,
> >> >> respectively, sounds very straightforward.
> >> >>
> >> >> (Outside of masterclock mode, the requirement that the client
> >> >> synchronizes across cpus for monotonicity smooths over a lot of
> >> >> complexity - periodically updating kvm-clock to the current time is
> >> >> simple and works.)
> >> >>
> >> >> Regardless of my opinion, I think that a clear statement of the design
> >> >> goals for kvm-clock (and kvm's implementation of the reference TSC
> >> >> page) would be valuable.
> >> >
> >> > Documentation/virtual/kvm/timekeeping.txt
> >> >
> >>
> >> Hi Marcelo,
> >>
> >> While I appreciate all of the detail in timekeeping.txt, it is not a
> >> very good reference for what kvm-clock is or how it works. kvm-clock
> >> is only mentioned three times in different places throughout that
> >> document, and nowhere is there a very clear statement of what
> >> kvm-clock is supposed to do or how it does it.
> >>
> >> For somebody that does not already have a deep understanding of the
> >> core masterclock code, trying to understand how kvm-clock works is a
> >
> > There is no "deep understanding". There is one comment there about
> > why you can't update systemtimestamp + tsc_offset (you have to read
> > the kvmclock clock read function to understand this sentence) in
> > parallel in multiple VCPUs, and that's all masterclock is about.
> >
> > It's called "master" because there must be only one system_timestamp
> > and not multiple (therefore that's the "master" copy of system_time).
> >
> >> real challenge.
> >>
> >> Thanks,
> >> Peter
> >
> > Design goals: provide a reliable clocksource device to Linux guests
> > so they are able to cope with virtualization problems, namely:
> >
> > 1. Migration to hosts with different TSC frequency.
> > 2. Support for hosts with TSCs that are not stable (whose
> > counting frequency changes across processor frequency changes).
> >
> > How: Expose a clockdevice which counts at 1GHz to guests.
>
> This still doesn't define how closely it is intended to track 1 GHz or
> whether NTP slew is applied.
>
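For reference, the "counts at 1GHz" behaviour comes from the guest
scaling TSC deltas into nanoseconds. A minimal sketch of that read
path follows, assuming a simplified record whose field names mirror
the kernel's pvclock_vcpu_time_info. The struct and function names are
illustrative, and the version-counter retry loop and flags handling
are left out.

```c
#include <stdint.h>

/*
 * Simplified model of the per-vCPU time record the host publishes.
 * Field names follow pvclock_vcpu_time_info, but this is a sketch,
 * not the ABI layout.
 */
struct pv_time_sketch {
    uint64_t tsc_timestamp;     /* guest TSC at the last host update */
    uint64_t system_time;       /* nanoseconds at that same instant */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Convert a raw TSC reading into nanoseconds using the record above. */
static uint64_t pv_read_ns(const struct pv_time_sketch *t, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - t->tsc_timestamp;

    if (t->tsc_shift >= 0)
        delta <<= t->tsc_shift;
    else
        delta >>= -t->tsc_shift;

    /*
     * (delta * mul) >> 32, split into high and low halves so the
     * intermediate product of the low half fits in 64 bits. The high
     * half can still overflow for very large deltas; a real
     * implementation uses a wider multiply.
     */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    uint64_t scaled = hi * t->tsc_to_system_mul +
                      ((lo * t->tsc_to_system_mul) >> 32);

    return t->system_time + scaled;
}
```

With tsc_to_system_mul = 0x80000000 (0.5 in 32.32 fixed point) and
tsc_shift = 0, a 2 GHz TSC advances the result by 1 ns every two
cycles. Nothing in this path applies NTP slew on its own; how closely
the result tracks 1 GHz depends on the multiplier the host publishes.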
> > Evolution of masterclock scheme (bugs uncovered):
> >
> > Problem: time backwards as seen by guests.
> > Solution: Fix in guest with pvclock global variable (cmpxchg).
>
> I thought that was only for non-masterclock.
>
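The "pvclock global variable (cmpxchg)" fix can be sketched as below:
every reader publishes its result into one shared value and never
returns less than what another CPU has already returned. This is a
sketch using C11 atomics with hypothetical names, not the kernel's
code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Largest timestamp handed out so far, shared by all CPUs (hypothetical name). */
static _Atomic uint64_t last_value_sketch;

/*
 * Return a value that never goes backwards across callers: either
 * "now", if it is newer than everything seen so far, or the newest
 * value some other caller already published.
 */
static uint64_t monotonic_clamp(uint64_t now)
{
    uint64_t last = atomic_load(&last_value_sketch);

    do {
        if (now <= last)
            return last;
        /* On failure the CAS reloads "last", and the check repeats. */
    } while (!atomic_compare_exchange_weak(&last_value_sketch, &last, now));

    return now;
}
```

The cost is a shared cache line touched on every read, which is part
of why it matters whether this path is needed when the stable-TSC flag
is set.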
> >
> > Problem: gettimeofday() performance
> > Solution: Use masterclock scheme (update pvclock areas in sync to avoid
> > time backwards event being visible to guests; it's well documented in
> > x86.c. If something is unclear, please try to understand the code / ask,
> > and you/we improve the documentation there).
>
> The actual masterclock host code is long and very difficult to follow.
>
> In 4.5-rc, the vDSO guest code is IMO short and reasonably clear.
>
> >
> > Problem: get_kernel_ns VS TSC clock get out of sync and
> > Hyper-V complains about the difference.
> >
> > Solution: expose the NTP TSC frequency so that guests
> > apply NTP frequency correction to their kvmclock reads on TSC as well.
> >
>
> I don't understand what you mean.
>
> > ---
> >
> > About future: agree with Andy that kvmclock should be removed.
> > So there is a pending work item there: "verify TSC clocksource
> > is fine for exposing to guests, think about the implications for
> > management software".
> > I can write down a list of items that have been fixed
> > for kvmclock and would have to be checked for tsc clocksource.
> >
> > Anyone willing to take that task ?
> >
>
> How?
>
> On very very new hosts (those that support TSC_ADJUST and tsc
> scaling), this should be possible.

Exactly, TSC scaling.

> The host would ideally tell the
> guest what frequency of clock it intends to provide (ideally 1 GHz
> exactly) and the guest would use it. I'm not sure this hardware
> exists yet.
>
> If you enable TSC scaling like this, you may need to supply an ART
> (always running timer) adjustment to the guest in case you intend to
> pass any ART consumers through to the guest. Of course, no one
> outside Intel has *that* hardware either (AFAIK -- maybe there are
> some prototypes floating around).
>
> > ---
> >
> > About complaint that "it's not well designed whether NTP correction
> > should be applied or not". There are two different things:
> >
> > 1) Host clock and guest clocks synchronized.
> > KVM is not responsible for that, and it can't be, because
> > Linux exposes a clock which is created in software
> > and fixed by NTP.
>
> I don't understand what you mean.
>
> Of course the guest can run its own NTP daemon or similar adjtimex
> caller and cause the guest to stop tracking the host. But if the host
> passed CLOCK_MONOTONIC through, then the guest would, by default,
> treat kvm-clock as an exactly 1GHz source and would then expose a
> disciplined NTP-tracking CLOCK_MONOTONIC through to its user apps even
> without an NTP client on the guest.
>
> If integration with the POSIX clock core were provided, the guest
> would learn to consume the host's CLOCK_REALTIME as well, as long as
> the host uses the tsc as its clocksource.
>
> >
> > 2) NTP frequency correction being applied to kvmclock.
> >
> > This only means that the frequency of the pvclock reads
> > in the guest is NTP corrected.
>
> If the host applied NTP frequency correction to the guest, then I
> would be happy. Some folks might want this to be optional.
>
> The guest can do additional correction on top if it wants regardless.
>
> --Andy
Paolo's track-TSC-offset-multiplier-from-kvmclock-updates should make
enabling masterclock for suspend/resume much simpler.