From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: kurt.hackel@oracle.com,
	"Xen-Devel (E-mail)" <xen-devel@lists.xensource.com>,
	Keir Fraser <keir.fraser@eu.citrix.com>,
	Jan Beulich <JBeulich@novell.com>
Subject: Re: rdtscP and xen (and maybe the app-tsc answer I've been looking for)
Date: Fri, 18 Sep 2009 15:55:08 -0700
Message-ID: <4AB40FCC.5090805@goop.org>
In-Reply-To: <c05faa5f-f9a9-4093-9e15-5648f5e4ff77@default>

On 09/18/09 13:27, Dan Magenheimer wrote:
> If guest vm.cfg has vrdtscp=0 (default):
>   rdtscp is emulated and returns nsec since guest
>   boot (same as emulated rdtsc), value returned
>   for TSC_AUX is -1
>
> If guest vm.cfg has vrdtscp=1:
>   If underlying hardware has rdtscp support:
>     rdtscp is directly executed by hardware,
>     value returned for TSC_AUX is non-zero
>     (see below)
>   Else: (no hardware rdtscp support)
>     rdtscp is emulated and returns nsec since
>     guest boot, value returned for TSC_AUX is 0
>   

Why do you need to distinguish between the two emulated rdtscp cases? 
Special-casing a version of '0' is awkward because it would arise
naturally from version wraparound (after 2^31 time parameter updates,
but still).

If the hardware doesn't support rdtscp, how should an app know whether
or not to use it?  Should it just try running rdtscp and be prepared to
handle a SIGILL?
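
One alternative to catching SIGILL is to ask CPUID first.  The sketch
below assumes a GCC or Clang toolchain (for <cpuid.h>); the helper names
are made up for illustration.  RDTSCP is advertised in CPUID leaf
0x80000001, EDX bit 27.  Whether Xen would expose that bit to a guest
whose rdtscp is emulated is not settled in this thread, so treat this
as a sketch only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang wrapper around CPUID */
#endif

/* CPUID leaf 0x80000001, EDX bit 27 advertises RDTSCP. */
#define CPUID_EXT_FEATURES 0x80000001u
#define EDX_RDTSCP_BIT     (1u << 27)

/* Pure helper (hypothetical name): test the RDTSCP bit in an EDX value. */
static bool edx_has_rdtscp(uint32_t edx)
{
    return (edx & EDX_RDTSCP_BIT) != 0;
}

/* Returns true if CPUID advertises RDTSCP; always false on non-x86 builds. */
static bool cpu_advertises_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(CPUID_EXT_FEATURES, &eax, &ebx, &ecx, &edx))
        return false;           /* extended leaf not available */
    return edx_has_rdtscp(edx);
#else
    return false;
#endif
}
```

An app would call cpu_advertises_rdtscp() once at startup and fall back
to some other time source if it returns false.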

> How it works from the app point-of-view:
>
> Guest app must have some capability of getting 64-bit
> pvclock parameters directly from Xen without OS changes,
> e.g. emulated userland wrmsr, userland hypercall,
> or userland mapped shared page.  (This will be done
> rarely so need not be fast! But it does create
> a new userland<->Xen ABI that must be kept compatible.)
>
> On first rdtscp, app records returned TSC_AUX value,
> verifies that it is neither 0 nor -1,
> fetches pvclock parameters from Xen, executes
> another rdtscp.  If TSC_AUX matches previous value,
> app applies pvclock algorithm to tsc value to
> obtain nsec since guest boot.  If TSC_AUX is
> zero or -1, tsc value IS nsec since guest boot.
> If TSC_AUX differs from last recorded value,
> fetch pvclock parameters from Xen again.
>
> On subsequent rdtscp's, app compares
> returned TSC_AUX against the previous one,
> and fetches pvclock parameters from Xen only
> if it differs (which should be rare).
>   

Presumably the pvclock parameters would contain the same version number,
which must match the TSC_AUX value; if not, the app keeps iterating
(rdtscp, get-timing-parameters) until they do.
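
To make the loop concrete, here is a minimal sketch of the app side.
The scaling step is the usual pvclock computation (system_time plus the
scaled tsc delta).  Everything else is an assumption for illustration:
read_rdtscp() and fetch_pv_params() are simulated stand-ins (a real app
would execute rdtscp and use whatever userland<->Xen interface ends up
existing), and the sketch assumes the parameter block carries the same
version number that appears in TSC_AUX:

```c
#include <assert.h>
#include <stdint.h>

/* Field names follow the pvclock time info; layout here is illustrative. */
struct pv_params {
    uint32_t version;
    uint64_t tsc_timestamp;      /* tsc value when system_time was sampled */
    uint64_t system_time;        /* nsec since guest boot at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;          /* pre-shift applied to the tsc delta */
};

/* --- Simulated stand-ins (HYPOTHETICAL, for demonstration only) --- */
static uint32_t sim_version = 7;
static uint64_t sim_tsc = 1000;

static uint64_t read_rdtscp(uint32_t *aux)
{
    *aux = sim_version;          /* real code: rdtscp, aux = TSC_AUX */
    return sim_tsc;
}

static void fetch_pv_params(struct pv_params *out)
{
    out->version = sim_version;
    out->tsc_timestamp = 0;
    out->system_time = 500;
    out->tsc_to_system_mul = 0x80000000u;  /* 0.5 nsec per tick */
    out->tsc_shift = 0;
}

/* (delta << shift) * mul >> 32, split to avoid a 128-bit multiply. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

static uint64_t pvclock_to_ns(const struct pv_params *p, uint64_t tsc)
{
    return p->system_time +
           scale_delta(tsc - p->tsc_timestamp, p->tsc_to_system_mul,
                       p->tsc_shift);
}

static struct pv_params cached;
static int cached_valid;

/* Iterate (rdtscp, get-timing-parameters) until the versions match. */
static uint64_t app_nsec_since_boot(void)
{
    for (;;) {
        uint32_t aux;
        uint64_t tsc = read_rdtscp(&aux);

        if (aux == 0 || aux == UINT32_MAX)
            return tsc;          /* emulated: value is already nsec */
        if (cached_valid && cached.version == aux)
            return pvclock_to_ns(&cached, tsc);
        fetch_pv_params(&cached);  /* stale or missing: refetch, retry */
        cached_valid = 1;
    }
}
```

The common path is one rdtscp plus a compare; the fetch only happens
when TSC_AUX changes.  The sketch ignores the "update in progress"
state that the real pvclock version field can signal.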

> What Xen needs to do:
>
> Xen must record the setting for each guest's vrdtscp
> config variable and ensure that it persists across
> save/restore and migration.  If the guest has
> vrdtscp=1, a vrdtscp "version" number is also
> part of the guest's state and must persist
> across save/restore/migration.
>
> Xen must know whether or not it is running on a
> machine where TSC is reliable.  If TSC is NOT
> reliable AND rdtscp is supported by hardware,
> Xen must ensure that TSC_AUX is -1 on all pcpu's
> that are running a guest with vrdtscp=0, and 0
> on all pcpu's that are running a guest where
> vrdtscp=1 (and must enable CR4.TSD on those
> pcpus if it wasn't already).

If the tsc is not reliable but Xen has accurate tsc parameter info, then
the algorithm above will still work efficiently.

>   If TSC is NOT
> reliable AND rdtscp is NOT supported by hardware,
> Xen must emulate rdtscp (e.g.
> return Xen system time) and emulate the
> same behavior for TSC_AUX.  If TSC IS reliable,
> Xen sets TSC_AUX to the guest's vrdtscp version
> number on all pcpu's that are running the guest.
> Finally, when a guest transitions from one
> "TSC domain" to another (restore/migrate/NUMA)
> it increments the vrdtscp version number.
>   

Well, it just needs to increment it whenever Xen knows the tsc has
changed, as the current pvclock code does.  That could be more frequent
than restore/migrate if the tsc changes on power events.
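
As a sketch of the Xen side, under one reading of the rules quoted
above: pick the TSC_AUX value per guest, and bump the version whenever
the tsc parameters change.  The struct and function names are invented
for illustration and are not Xen code.  Skipping 0 and -1 on wraparound
is an assumption added here to sidestep the special-case problem raised
earlier in this message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSC_AUX_EMULATED_DEFAULT UINT32_MAX  /* "-1": guest has vrdtscp=0 */
#define TSC_AUX_EMULATED_OPTIN   0u          /* vrdtscp=1, tsc not reliable */

/* HYPOTHETICAL per-guest state; must survive save/restore/migration. */
struct guest_tsc_state {
    bool     vrdtscp;   /* the guest's vrdtscp config setting */
    uint32_t version;   /* vrdtscp "version" number */
};

/* Value Xen would load into (or emulate for) TSC_AUX for this guest. */
static uint32_t tsc_aux_for_guest(const struct guest_tsc_state *g,
                                  bool tsc_reliable)
{
    if (!g->vrdtscp)
        return TSC_AUX_EMULATED_DEFAULT;
    if (!tsc_reliable)
        return TSC_AUX_EMULATED_OPTIN;
    return g->version;
}

/* Call whenever Xen knows the guest's tsc parameters have changed. */
static void bump_tsc_version(struct guest_tsc_state *g)
{
    do {
        g->version++;
    } while (g->version == 0 || g->version == UINT32_MAX);
}
```

With the reserved values skipped, an app never sees a live version that
collides with either of the two "emulated" markers.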

> The only problem I can see is that when
> vrdtscp==1, other apps that are running on that guest
> that use rdtsc (no p) directly (i.e. haven't been
> modified to use pv-rdtscp) will continue to
> have the same kinds of failure on save/restore/
> migration.  But this is true of all the solutions
> proposed so far: Xen can only turn on emulation
> guest-wide, not per-app.
>   

Linux already reserves rdtscp for use as part of vsyscall, where TSC_AUX
contains the NUMA node and the CPU number, so there should be no "naked"
users of rdtscp.
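
For reference, a sketch of how that existing Linux encoding is unpacked.
It assumes the x86-64 layout of the time, with the CPU number in the low
12 bits and the NUMA node above it; check the kernel source before
relying on the exact split.  The names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

struct cpu_node {
    uint32_t cpu;
    uint32_t node;
};

/* Assumed Linux encoding: TSC_AUX = (node << 12) | cpu. */
static struct cpu_node decode_linux_tsc_aux(uint32_t aux)
{
    struct cpu_node r;

    r.cpu  = aux & 0xfffu;
    r.node = aux >> 12;
    return r;
}
```

Any scheme that puts a version number in TSC_AUX has to coexist with
this use, since both want the same register.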

> Also even on machines where TSC is reliable,
> there is a small chance that consecutive
> TSC values read will be from different
> processors and so TSC might appear to go
> backwards by some small amount.  So apps
> must still put raw TSC values through
> a "monotonicity filter".  (Xen already
> does this for emulated reads of TSC.)
>   

Why?  I thought "reliable" tscs were supposed to be synced between cores?
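
For reference, the kind of "monotonicity filter" being described could
look roughly like the sketch below: remember the largest value handed
out so far and never return anything smaller.  It assumes C11 atomics
and a single process-wide filter; the names are illustrative and this
is not what Xen's emulation path actually does:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Largest value returned so far, shared by all threads. */
static _Atomic uint64_t last_value;

/* Returns raw, or the previous maximum if raw would go backwards. */
static uint64_t monotonic_filter(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);

    while (raw > last) {
        /* On failure, last is reloaded with the current maximum. */
        if (atomic_compare_exchange_weak(&last_value, &last, raw))
            return raw;
    }
    return last;
}
```

The cost is a shared cache line touched on every read, which is part of
why it matters whether "reliable" tscs really need it.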

    J
