From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: "Xen-Devel (E-mail)" <xen-devel@lists.xensource.com>,
	Keir Fraser <keir.fraser@eu.citrix.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: rdtsc: correctness vs performance on Xen (and KVM?)
Date: Mon, 31 Aug 2009 17:22:20 -0700
Message-ID: <4A9C693C.3020704@goop.org>
In-Reply-To: <830e5c23-96f5-4e79-9f11-3884735e1c33@default>

On 08/31/09 16:52, Dan Magenheimer wrote:
> work both on Xen and bare metal, and works properly
> across: vcpu-to-pcpu rescheduling even on NUMA
> machines; system sleep/hibernation; and 
> save/restore/migration between machines with
> dissimilar clock rates. 

But it will only do this when running under Xen.  If running on bare
metal, there will be nothing providing the correction info to the app,
and it will be no better than using raw rdtsc with all its limitations. 
In practice this means that the app will have to have some other code
path anyway.

>  Implementation requires
> changes in Xen and "the app" but no OS changes
> thus making it still viable on legacy OS's
> and possibly(?) HVM domains.  Note that
> only apps that need to sample time on the
> order of >5-100K/core/second would use this;
> for other apps, rdtsc emulation overhead
> is probably negligible (<0.2%).
>
> 0)  Xen implements rdtsc emulation by default
> 1)  Guest OS is launched with pvtsc=1 in vm.cfg
> 2)  App running on guest OS sets up a SIGILL handler
> 3)  App executes a special rdmsr instruction or
>     hypercall.
>   

No way to do direct hypercalls from usermode, so it would need to be an
illegal instruction (like cpuid).

But really it should be a system-wide kernel setting, set via sysctl or
something.

> 4a) If SIGILL results, not running on Xen at all,
>     or on old Xen; app uses rdtsc at own risk. Done.
> 4b) Else, rdmsr/hypercall returns virtual address of
>     special pvclock page ("pvclock_va").
>   
This can't be done without changing the kernel; Xen can't just start
sticking stuff into usermode mappings (how does Xen even know where a
given OS's usermode is?).

And again, usermode can't do hypercalls and I don't think we should
start making fake rdmsrs start working in usermode.
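
For illustration only, the detection pattern in steps 2-4a (install a
SIGILL handler, try the special instruction, fall back if it traps) might
look roughly like the sketch below.  The names probe_runs_without_sigill,
fake_unsupported and fake_supported are made up for this example, and the
two stand-in probes merely simulate the two outcomes; no real Xen
instruction or MSR number is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: jump back to the probe instead of returning to the
 * faulting instruction, which would simply fault again. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Run fn() with a temporary SIGILL handler installed.
 * Returns 1 if fn() completed, 0 if it raised SIGILL, -1 on setup error. */
static int probe_runs_without_sigill(void (*fn)(void))
{
    struct sigaction sa, old;
    int ok;

    assert(fn != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* Second argument 1: save the signal mask so siglongjmp restores it. */
    if (sigsetjmp(probe_env, 1) == 0) {
        fn();
        ok = 1;   /* no trap: the probed operation is available */
    } else {
        ok = 0;   /* trapped: step 4a, fall back to raw rdtsc */
    }

    sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
    return ok;
}

/* Hypothetical stand-ins for the "special instruction" of step 3. */
static void fake_unsupported(void) { raise(SIGILL); }
static void fake_supported(void) { }
```

Passing the probe as a function pointer keeps the signal plumbing separate
from whatever instruction ends up being probed.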

> 5)  App executes another special rdmsr instruction/
>     hypercall to disable rdtsc emulation.  This
>     affects ALL execution for all processes in this VM.
>   

Once enabled, it should just stay enabled.  System-wide is very coarse
anyway (since there's no guarantee that all apps will use the mechanism).

> 6)  Xen maintains mapping of pvclock_va to a
>     different physical page for each processor
>     and transparently handles TLB misses for
>     pvclock_va
>   

If you mean that a given VA has a per-cpu mapping, it requires percpu
pagetables.  That's not possible in Linux with PV pagetables (since two
tasks/threads on different cpus sharing the same mm will use the same
pagetable).

> 7)  App uses (unemulated) rdtsc and applies
>     pvclock algorithm (using values in memory
>     at pvclock_va) resulting in pvtsc, which
>     is nanoseconds since VM start.  App can
>     further apply local algorithms to enforce
>     monotonicity or frequency scaling as desired.
>
> Comments appreciated.  I realize that this is hacky
> and ugly... better alternatives gladly solicited.
>   

In general even Linux's specialised APIs are entirely unused (sendfile,
vmsplice, etc).  Something as esoteric as this will be pretty much unused.

This can be entirely done within the vsyscall mechanism without any app
changes.  There's no reason not to.
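
As a rough sketch of what such a vsyscall path (or the app in step 7)
would compute: the pvclock algorithm scales the TSC delta since the
hypervisor's last update and adds it to the recorded system time, retrying
if the version field changed in the meantime.  The struct layout below is
assumed to mirror Xen's public vcpu_time_info; the function names and the
fake_tsc stand-in are hypothetical, and a real reader would execute rdtsc
on the current cpu instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, modelled on Xen's vcpu_time_info. */
struct pvclock_time_info {
    uint32_t version;            /* odd while an update is in progress */
    uint32_t pad0;
    uint64_t tsc_timestamp;      /* TSC value at the last update */
    uint64_t system_time;        /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul;  /* 0.32 fixed-point ns-per-tick factor */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
    uint8_t  pad1[3];
};

/* ns = ((delta << shift) * mul) >> 32, computed without a 128-bit type. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    uint64_t hi, lo;

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    hi = delta >> 32;
    lo = delta & 0xffffffffu;
    return hi * mul + ((lo * mul) >> 32);
}

/* Read a consistent snapshot and convert the current TSC to nanoseconds. */
static uint64_t pvclock_ns(const volatile struct pvclock_time_info *ti,
                           uint64_t (*read_tsc)(void))
{
    uint32_t version, mul;
    uint64_t tsc_stamp, sys_time, tsc;
    int8_t shift;

    assert(ti != NULL && read_tsc != NULL);
    do {
        version = ti->version;
        atomic_thread_fence(memory_order_acquire);
        tsc_stamp = ti->tsc_timestamp;
        sys_time  = ti->system_time;
        mul       = ti->tsc_to_system_mul;
        shift     = ti->tsc_shift;
        tsc       = read_tsc();
        atomic_thread_fence(memory_order_acquire);
        /* Retry if an update was in progress or completed meanwhile. */
    } while ((version & 1) || version != ti->version);

    return sys_time + scale_delta(tsc - tsc_stamp, mul, shift);
}

/* Hypothetical stand-in for rdtsc so the sketch runs anywhere. */
static uint64_t fake_tsc(void) { return 3000; }
```

Enforcing monotonicity across cpus, as mentioned in step 7, would need an
extra last-value check layered on top of this.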

> P.S. While it would be nice if we could just tell
> apps to use a fast vgettimeofday equivalent, this
> does not exist today and, even if it did, would not
> be widely available for years in the kernel running under
> most enterprise app deployments (and, even then,
> only on 64-bit Linux.)
>   

These rationales are very unconvincing:

Making vsyscall work on 32-bit is just a matter of doing it; apparently
nobody has put the effort into it, but there's no fundamental reason why
it wouldn't work.  Besides, who runs enterprise apps on 32-bit these
days?  Anything requiring even moderate amounts of memory is better run
on 64-bit.

Your mechanism will require kernel changes anyway, so there's no getting
around that.

Once vsyscall does Xen/KVM properly, then every app will automatically
do the right thing without modification.  There's no need for
specialized APIs that nobody will end up using anyway.  It only makes
sense to go to this kind of effort if it ends up making a plain "rdtsc"
have the properties you want it to have.
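
Concretely, an unmodified app in that world just calls the standard clock
interface and lets libc and the kernel pick the fast path.  A minimal
sketch, assuming a POSIX system that provides CLOCK_MONOTONIC;
monotonic_ns is a name made up for this example, and whether the call
avoids a real syscall depends on the kernel and the clocksource in use:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Return the monotonic clock in nanoseconds via the standard API. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;  /* silence unused warning when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

An app that timestamps events this way needs no knowledge of Xen, KVM or
the TSC at all.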

    J
