public inbox for kvm@vger.kernel.org
From: Randy Dunlap <randy.dunlap@oracle.com>
To: Glauber Costa <glommer@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, avi@redhat.com
Subject: Re: [PATCH 5/5] add documentation about kvmclock
Date: Thu, 15 Apr 2010 12:28:36 -0700
Message-ID: <20100415122836.27f1e255.randy.dunlap@oracle.com>
In-Reply-To: <1271356648-5108-6-git-send-email-glommer@redhat.com>

On Thu, 15 Apr 2010 14:37:28 -0400 Glauber Costa wrote:

> This patch adds a new file, kvm/kvmclock.txt, describing
> the mechanism we use in kvmclock.
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  Documentation/kvm/kvmclock.txt |  138 ++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 138 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/kvm/kvmclock.txt
> 
> diff --git a/Documentation/kvm/kvmclock.txt b/Documentation/kvm/kvmclock.txt
> new file mode 100644
> index 0000000..21008bb
> --- /dev/null
> +++ b/Documentation/kvm/kvmclock.txt
> @@ -0,0 +1,138 @@
> +KVM Paravirtual Clocksource driver
> +Glauber Costa, Red Hat Inc.
> +==================================
> +
> +1. General Description
> +=======================
> +
...
> +
> +2. kvmclock basics 
> +===========================
> +
> +When supported by the hypervisor, guests can register a memory page
> +to contain kvmclock data. This page has to be present in guest's address space
> +throughout its whole life. The hypervisor continues to write to it until it is
> +explicitly disabled or the guest is turned off.
> +
> +2.1 kvmclock availability
> +-------------------------
> +
> +Guests that want to take advantage of kvmclock should first check its
> +availability through cpuid.
> +
> +kvm features are presented to the guest in leaf 0x40000001. Bit 3 indicates
> +the present of kvmclock. Bit 0 indicates that kvmclock is present, but the

       presence
But it's confusing: is it bit 3 or bit 0?  As written, they seem to indicate the same thing.

> +old MSR set must be used. See section 2.3 for details.

"old MSR set":  what does this mean?

> +
> +2.2 kvmclock functionality
> +--------------------------
> +
> +Two MSRs are provided by the hypervisor, controlling kvmclock operation:
> +
> + * MSR_KVM_WALL_CLOCK, value 0x4b564d00 and
> + * MSR_KVM_SYSTEM_TIME, value 0x4b564d01.
> +
> +The first one is only used in rare situations, like boot-time and a
> +suspend-resume cycle. Data is disposable, and after used, the guest
> +may use it for something else. This is hardly a hot path for anything.
> +The Hypervisor fills in the address provided through this MSR with the
> +following structure:
> +
> +struct pvclock_wall_clock {
> +        u32   version;
> +        u32   sec;
> +        u32   nsec;
> +} __attribute__((__packed__));
> +
> +Guest should only trust data to be valid when version haven't changed before

                                                         has not

> +and after reads of sec and nsec. Besides not changing, it has to be an even
> +number. Hypervisor may write an odd number to version field to indicate that
> +an update is in progress.
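
The read protocol described here amounts to a seqlock-style retry loop, and
a short example might make it easier to follow. A minimal sketch in C,
assuming the guest already has the structure mapped. read_wall_clock() and
the barrier() macro are illustrative names, not part of the patch, uint32_t
stands in for u32, and a real guest would probably want a proper read memory
barrier rather than a plain compiler barrier:

```c
#include <stdint.h>

/* Layout as given in the patch, with uint32_t standing in for u32. */
struct pvclock_wall_clock {
        uint32_t version;
        uint32_t sec;
        uint32_t nsec;
} __attribute__((__packed__));

/* Assumption: compiler barrier only; real guest code would use rmb(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical helper: copy sec/nsec out of the shared structure,
 * retrying while the version is odd (update in progress) or changes
 * between the first and the second read.
 */
static void read_wall_clock(const volatile struct pvclock_wall_clock *wc,
                            uint32_t *sec, uint32_t *nsec)
{
        uint32_t before, after;

        do {
                before = wc->version;
                barrier();
                *sec = wc->sec;
                *nsec = wc->nsec;
                barrier();
                after = wc->version;
        } while ((before & 1) || before != after);
}
```
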
> +
> +MSR_KVM_SYSTEM_TIME, on the other hand, has persistent data, and is
> +constantly updated by the hypervisor with time information. The data
> +written in this MSR contains two pieces of information: the address in which
> +the guests expects time data to be present 4-byte aligned or'ed with an
> +enabled bit. If one wants to shutdown kvmclock, it just needs to write
> +anything that has 0 as its last bit.
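
The MSR value encoding could also be spelled out. A small sketch of how I
read the paragraph above, assuming the enable bit is bit 0 and that the
4-byte alignment leaves the low bits of the address free. The helper names
are made up for illustration:

```c
#include <stdint.h>

#define KVMCLOCK_ENABLE_BIT 1ULL   /* assumed: bit 0 of the MSR value */

/*
 * Hypothetical helper: build the value to write to MSR_KVM_SYSTEM_TIME
 * in order to enable kvmclock at addr. Returns 0 (enable bit clear)
 * when addr is not 4-byte aligned.
 */
static uint64_t kvmclock_msr_enable_value(uint64_t addr)
{
        if (addr & 0x3)
                return 0;
        return addr | KVMCLOCK_ENABLE_BIT;
}

/* Any value with bit 0 clear shuts kvmclock down; 0 is the simplest. */
static uint64_t kvmclock_msr_disable_value(void)
{
        return 0;
}
```
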
> +
> +Time information presented by the hypervisor follows the structure:
> +
> +struct pvclock_vcpu_time_info {
> +        u32   version;
> +        u32   pad0;
> +        u64   tsc_timestamp;
> +        u64   system_time;
> +        u32   tsc_to_system_mul;
> +        s8    tsc_shift;
> +        u8    pad[3];
> +} __attribute__((__packed__)); 
> +
> +The version field plays the same role as with the one in struct
> +pvclock_wall_clock. The other fields, are:
> +
> + a. tsc_timestamp: the guest-visible tsc (result of rdtsc + tsc_offset) of
> +    this cpu at the moment we recorded system_time. Note that some time is

            CPU (please)

> +    inevitably spent between system_time and tsc_timestamp measurements.
> +    Guests can subtract this quantity from the current value of tsc to obtain
> +    a delta to be added to system_time

                           to system_time.

> +
> + b. system_time: this is the most recent host-time we could be provided with.
> +    host gets it through ktime_get_ts, using whichever clocksource is
> +    registered at the moment

                         moment.

> +
> + c. tsc_to_system_mul: this is the number that tsc delta has to be multiplied
> +    by in order to obtain time in nanoseconds. Hypervisor is free to change
> +    this value in face of events like cpu frequency change, pcpu migration,

                                         CPU

> +    etc.
> + 
> + d. tsc_shift: guests must shift 

missing text??

> +
> +With this information available, guest calculates current time as:
> +
> +  T = kt + to_nsec(tsc - tsc_0)
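
It would help to tie the formula back to the structure fields. A sketch of
what I assume is meant: kt is system_time, tsc_0 is tsc_timestamp, and
to_nsec() applies tsc_shift and then tsc_to_system_mul as a 32-bit
fixed-point fraction. The tsc_shift semantics are an assumption, since the
text for item d is missing. I am taking the convention used by the existing
Linux pvclock code, where a negative value means a right shift. Function
names are illustrative, and the version check is left to the caller:

```c
#include <stdint.h>

/* Layout as given in the patch, with stdint types standing in. */
struct pvclock_vcpu_time_info {
        uint32_t version;
        uint32_t pad0;
        uint64_t tsc_timestamp;
        uint64_t system_time;
        uint32_t tsc_to_system_mul;
        int8_t   tsc_shift;
        uint8_t  pad[3];
} __attribute__((__packed__));

/*
 * Assumed to_nsec(): shift the tsc delta, then multiply by mul / 2^32.
 * The multiply is split in two halves so that no 128-bit type is needed.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
        if (shift < 0)
                delta >>= -shift;
        else
                delta <<= shift;

        return (delta >> 32) * mul +
               (((delta & 0xffffffffULL) * mul) >> 32);
}

/* T = kt + to_nsec(tsc - tsc_0), with tsc supplied by the caller. */
static uint64_t kvmclock_now_ns(const struct pvclock_vcpu_time_info *ti,
                                uint64_t tsc)
{
        return ti->system_time +
               scale_delta(tsc - ti->tsc_timestamp,
                           ti->tsc_to_system_mul, ti->tsc_shift);
}
```
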
> +
> +2.3 Compatibility MSRs
> +----------------------
> +
> +Guests running on top of older hypervisors may have to use a different set of
> +MSRs. This is because originally, kvmclock MSRs were exported within a
> +reserved range by accident. Guests should check cpuid leaf 0x40000001 for the
> +presence of kvmclock. If bit 3 is disabled, but bit 0 is enabled, guests can
> +have access to kvmclock functionality through
> +
> + * MSR_KVM_WALL_CLOCK_OLD, value 0x11 and
> + * MSR_KVM_SYSTEM_TIME_OLD, value 0x12.
> +
> +Note, however, that this is deprecated.
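
If I am reading sections 2.1 and 2.3 together correctly, the intended
selection logic is roughly the following. This is only a sketch of my
understanding: the struct and function names are invented, and the caller is
assumed to have already read the feature word from cpuid leaf 0x40000001:

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR numbers as listed in the patch. */
#define MSR_KVM_WALL_CLOCK        0x4b564d00
#define MSR_KVM_SYSTEM_TIME       0x4b564d01
#define MSR_KVM_WALL_CLOCK_OLD    0x11
#define MSR_KVM_SYSTEM_TIME_OLD   0x12

/* Hypothetical container for the MSR pair the guest should use. */
struct kvmclock_msrs {
        uint32_t wall_clock;
        uint32_t system_time;
};

/*
 * Pick the MSR set from the feature bits: bit 3 selects the new MSRs,
 * bit 0 alone selects the deprecated ones. Returns false when kvmclock
 * is not advertised at all.
 */
static bool kvmclock_pick_msrs(uint32_t features, struct kvmclock_msrs *out)
{
        if (features & (1u << 3)) {
                out->wall_clock = MSR_KVM_WALL_CLOCK;
                out->system_time = MSR_KVM_SYSTEM_TIME;
                return true;
        }
        if (features & (1u << 0)) {
                out->wall_clock = MSR_KVM_WALL_CLOCK_OLD;
                out->system_time = MSR_KVM_SYSTEM_TIME_OLD;
                return true;
        }
        return false;
}
```

If that matches the intent, stating it this explicitly in section 2.1 would
answer the bit 3 versus bit 0 question.
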
> +
> +3. Migration
> +============
> +
> +Two ioctls are provided to aid the task of migration: 
> +
> + * KVM_GET_CLOCK and
> + * KVM_SET_CLOCK
> +
> +Their aim is to control an offset that can be summed to system_time, in order
> +to guarantee monotonicity on the time over guest migration. Source host
> +executes KVM_GET_CLOCK, obtaining the last valid timestamp in this host, while
> +destination sets it with KVM_SET_CLOCK. It's the destination responsibility to
> +never return time that is less than that.


---
~Randy
