* rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen)
@ 2009-09-09 18:05 Dan Magenheimer
2009-09-09 18:59 ` Jeremy Fitzhardinge
2009-09-10 10:04 ` Jan Beulich
0 siblings, 2 replies; 5+ messages in thread
From: Dan Magenheimer @ 2009-09-09 18:05 UTC (permalink / raw)
To: Xen-Devel (E-mail); +Cc: Jeremy Fitzhardinge, Keir Fraser, Jan Beulich
(Although Jeremy and others are still discussing how to
implement vsyscall+pvclock for upstream Linux, I am still
looking for a way to allow apps to use rdtsc without
suffering the performance loss from rdtsc emulation
so I've begun a new thread.)
To recap: In order to properly implement the required
semantics of the rdtsc instruction in a virtual environment,
the current Xen method of allowing the rdtsc instruction
to execute natively is insufficient and may lead to
random failure, possibly resulting in data loss.
Upstream Xen now has a boot option to force rdtsc
to be emulated for both hvm and pv guests. Soon
this will be controlled by a per-guest vm.cfg option.
The default will likely be emulation.
However, some apps do tens-to-hundreds of thousands
of rdtsc's per core per second. On my dual-core Conroe
box, an rdtsc instruction takes about 22ns in hardware
and about 360ns to emulate. So emulation may slow
performance in the worst case by as much as 5-10%.
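The 5-10% figure can be reproduced with simple arithmetic on the two latencies above. The following is only a rough sketch: the function name and the sample rates are illustrative assumptions, and it ignores second-order effects such as cache or pipeline disruption caused by the trap.

```python
# Back-of-the-envelope estimate of CPU time lost to rdtsc emulation.
# The 22ns and 360ns figures are the measurements quoted above; the
# function name and the sample rates are illustrative assumptions.

NATIVE_NS = 22      # rdtsc executed directly in hardware
EMULATED_NS = 360   # rdtsc trapped and emulated by Xen


def emulation_overhead(rdtsc_per_sec, native_ns=NATIVE_NS, emulated_ns=EMULATED_NS):
    """Return the fraction of one core's time added by emulating rdtsc."""
    extra_ns_per_call = emulated_ns - native_ns
    return rdtsc_per_sec * extra_ns_per_call / 1e9


if __name__ == "__main__":
    for rate in (10_000, 150_000, 300_000):
        print(f"{rate:>7} rdtsc/s per core -> {emulation_overhead(rate):.1%} extra")
```

With these numbers, roughly 150,000 to 300,000 rdtsc instructions per second per core correspond to about 5% to 10% of a core.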
Vsyscall+pvclock in upstream 64-bit Linux may be
the right answer at some point in the future. BUT
(IMPORTANT NEW POINT!!!) the pvclock algorithm requires
an rdtsc instruction, and there is no way to
emulate some guest rdtsc instructions (e.g. only
those in apps) and not others (e.g. only those in
the kernel). Thus, for guests that have rdtsc emulation
enabled, vsyscall+pvclock will be SLOWER than emulation,
so it is still not a palatable alternative.
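To see why pvclock cannot avoid rdtsc, here is a simplified model of the read path. The field names follow Xen's public vcpu_time_info structure; everything else (the class, the function name, and the read_tsc callable standing in for the rdtsc instruction) is an assumption made for illustration, not the kernel's actual code.

```python
# Simplified model of a pvclock time read. read_tsc is a hypothetical
# callable standing in for the rdtsc instruction.
from dataclasses import dataclass


@dataclass
class VcpuTimeInfo:
    version: int            # odd while the hypervisor is updating the record
    tsc_timestamp: int      # TSC value when system_time was sampled
    system_time: int        # nanoseconds at tsc_timestamp
    tsc_to_system_mul: int  # ticks-to-ns multiplier, scaled by 2**32
    tsc_shift: int          # shift applied before the multiply (may be negative)


def pvclock_read_ns(info, read_tsc):
    """Return the current time in ns; every call needs one read_tsc()."""
    while True:
        version = info.version
        if version & 1:
            continue  # update in progress; real code spins until it finishes
        delta = read_tsc() - info.tsc_timestamp  # this is the rdtsc
        if info.tsc_shift >= 0:
            delta <<= info.tsc_shift
        else:
            delta >>= -info.tsc_shift
        ns = info.system_time + ((delta * info.tsc_to_system_mul) >> 32)
        if info.version == version:
            return ns  # record did not change while we were reading it
```

If that single read_tsc() traps into the hypervisor, the whole read costs at least one emulated rdtsc plus the surrounding arithmetic.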
I'm looking for something that provides correctness
TODAY with less of a performance hit AND does not
require guest operating systems to change. (App
changes and Xen changes are allowed.)
Previous attempts have run into insurmountable x86
architecture barriers (see the previous thread).
But it recently occurred to me to compare the
performance of a hypercall vs rdtsc emulation.
The results are promising, at least on 64-bit guests:
  rdtsc native:                        22ns
  rdtsc emulated:                     360ns
  nearly-NULL hypercall (32b guest):  260ns
  nearly-NULL hypercall (64b guest):  125ns
(Note that these measurements are of normal kernel-land
hypercalls.) Currently all hypercalls from userland
are illegal, but this need not be the case for ALL
hypercalls. Is it possible for Xen to implement an
"rdtsc hypercall" that is executable from userland,
without requiring OS changes? Early discussions look
promising.
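The measurements above can be compared directly. This small sketch computes how much of the emulated-rdtsc cost each mechanism would save; the dictionary keys and the helper name are illustrative, and a real rdtsc hypercall would presumably cost somewhat more than a nearly-NULL one, so the savings are best read as upper bounds.

```python
# Per-call costs quoted above, in nanoseconds. Keys and helper name are
# illustrative assumptions.
COST_NS = {
    "rdtsc native": 22,
    "rdtsc emulated": 360,
    "hypercall (32b guest)": 260,
    "hypercall (64b guest)": 125,
}


def saving_vs_emulation(mechanism):
    """Fraction of the emulated-rdtsc cost saved by using `mechanism`."""
    return 1 - COST_NS[mechanism] / COST_NS["rdtsc emulated"]


if __name__ == "__main__":
    for name, ns in COST_NS.items():
        print(f"{name:<22} {ns:>4}ns  saves {saving_vs_emulation(name):.0%}")
```

On these figures a 64-bit hypercall saves roughly two thirds of the emulation cost, while a 32-bit hypercall saves a little over a quarter.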
Certainly, it makes sense to implement a normal
kernel-callable rdtsc hypercall so that
vsyscall+pvclock can execute more quickly.
I'll be taking a look at that, but I'd be grateful
for assistance in architecting a userland hypercall
mechanism that will work for "hyper-rdtsc".
(While implementing a userland "hyper-rdtsc" is
highest priority, I'd also be interested in whether
the mechanism can be more generic. I'd like
to explore the use of tmem from apps; Ian
Pratt has suggested that userland hypercalls
might be interesting for blktap; and there are
probably other OS-independent ideas to explore,
assuming security issues can be handled.)
Thanks,
Dan
* Re: rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen)
2009-09-09 18:05 rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen) Dan Magenheimer
@ 2009-09-09 18:59 ` Jeremy Fitzhardinge
2009-09-10 9:59 ` Jan Beulich
2009-09-10 10:04 ` Jan Beulich
1 sibling, 1 reply; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2009-09-09 18:59 UTC (permalink / raw)
To: Dan Magenheimer; +Cc: Xen-Devel (E-mail), Keir Fraser, Jan Beulich
On 09/09/09 11:05, Dan Magenheimer wrote:
> BUT
> (IMPORTANT NEW POINT!!!) the pvclock algorithm requires
> an rdtsc instruction, and there is no way to
> emulate some guest rdtsc instructions (e.g. only
> those in apps) and not others (e.g. only those in
> the kernel). Thus, for guests that have rdtsc emulation
> enabled, vsyscall+pvclock will be SLOWER than emulation,
> thus meaning it is still not a palatable alternative.
>
You could enable/disable rdtsc emulation on each context switch according
to the app's desires/requirements. It would require an extra hypercall
per context switch, but it could be batched with the others, resulting
in little marginal cost. It would, however, leave the kernel's use of
rdtsc in a confused state.
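As a toy illustration of the batching idea, the sketch below builds the list of operations a guest kernel might submit in a single hypervisor entry on a context switch. All operation names and the Task attribute are hypothetical, and skipping the toggle when the setting is unchanged is an extra assumption, not something proposed above.

```python
# Toy model of toggling rdtsc emulation at context-switch time. The
# operation names are hypothetical; only the idea of piggy-backing on an
# existing batch of hypercalls comes from the message above.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    emulate_rdtsc: bool  # hypothetical per-app preference


def context_switch_batch(prev, nxt):
    """Build the operations issued in one batched hypervisor entry."""
    batch = [("switch_page_tables", nxt.name), ("load_kernel_stack", nxt.name)]
    if prev.emulate_rdtsc != nxt.emulate_rdtsc:
        # Assumption: only send the toggle when the setting actually changes.
        batch.append(("set_rdtsc_emulation", nxt.emulate_rdtsc))
    return batch


if __name__ == "__main__":
    db = Task("database", emulate_rdtsc=True)
    bench = Task("benchmark", emulate_rdtsc=False)
    print(context_switch_batch(db, bench))
```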
J
* Re: rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen)
2009-09-09 18:59 ` Jeremy Fitzhardinge
@ 2009-09-10 9:59 ` Jan Beulich
0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2009-09-10 9:59 UTC (permalink / raw)
To: Jeremy Fitzhardinge, Dan Magenheimer; +Cc: Xen-Devel (E-mail), Keir Fraser
>>> Jeremy Fitzhardinge <jeremy@goop.org> 09.09.09 20:59 >>>
>You could enable/disable rdtsc emulation on each context switch according
>to the app's desires/requirements. It would require an extra hypercall
>per context switch, but it could be batched with the others, resulting
>in little marginal cost. It would, however, leave the kernel's use of
>rdtsc in a confused state.
Not necessarily: There could be two flags, one saying app rdtsc needs to
be emulated, and a second one for the kernel's. Unless a pv kernel
wants this, its (emulated) reads could still return the real (hardware)
value rather than the calculated, 1GHz-based one.
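A minimal sketch of the two-flag idea, assuming every rdtsc traps and the hypervisor then chooses what value to return. The function name, parameter names, and defaults are hypothetical, and the 1GHz scaling is reduced to a plain ticks-to-nanoseconds conversion with no offset handling.

```python
# Toy model: every rdtsc traps, and two flags decide whether the caller
# sees the raw hardware TSC or a value rescaled to a 1GHz timebase.
# All names here are hypothetical.

def emulated_rdtsc(hw_tsc, tsc_khz, from_kernel,
                   emulate_user=True, emulate_kernel=False):
    """Return the value a trapped rdtsc would hand back to the guest."""
    wants_scaled = emulate_kernel if from_kernel else emulate_user
    if not wants_scaled:
        return hw_tsc  # pass the real hardware value through
    # Rescale hardware ticks to nanoseconds, i.e. a 1GHz-based counter.
    return hw_tsc * 1_000_000 // tsc_khz


if __name__ == "__main__":
    # 2.4GHz part, 2,400,000 ticks elapsed (one millisecond).
    print(emulated_rdtsc(2_400_000, 2_400_000, from_kernel=False))
    print(emulated_rdtsc(2_400_000, 2_400_000, from_kernel=True))
```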
Jan
* Re: rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen)
2009-09-09 18:05 rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen) Dan Magenheimer
2009-09-09 18:59 ` Jeremy Fitzhardinge
@ 2009-09-10 10:04 ` Jan Beulich
2009-09-10 10:14 ` Ian Pratt
1 sibling, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2009-09-10 10:04 UTC (permalink / raw)
To: Dan Magenheimer; +Cc: Jeremy Fitzhardinge, Xen-Devel (E-mail), Keir Fraser
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 09.09.09 20:05 >>>
>(Note these measurements are normal kernel-land
>hypercalls.) Currently all hypercalls from userland
>are illegal, but this need not be the case for ALL
>hypercalls. Is it possible
>for Xen to implement a "rdtsc hypercall" that
>is executable from userland, without requiring
>OS changes? Early discussions look promising.
While possible, I'd suspect that the good performance you see for
64-bits wouldn't hold: You can't (without potential for ambiguity)
re-use syscall for this purpose, and the alternative ways (interrupt
or call gate) are likely to be more in the performance range of what
you measured for 32-bits, which doesn't seem that much better
than emulated rdtsc.
Jan
* RE: Re: rdtsc hypercall, from userland?!? (was: rdtsc: correctness vs performance on Xen)
2009-09-10 10:04 ` Jan Beulich
@ 2009-09-10 10:14 ` Ian Pratt
0 siblings, 0 replies; 5+ messages in thread
From: Ian Pratt @ 2009-09-10 10:14 UTC (permalink / raw)
To: Jan Beulich, Dan Magenheimer
Cc: Ian Pratt, Jeremy Fitzhardinge, Xen-Devel (E-mail), Keir Fraser
> While possible, I'd suspect that the good performance you see for
> 64-bits wouldn't hold: You can't (without potential for ambiguity)
> re-use syscall for this purpose.
I'll bet all current 64b PV OSes use EAX as a simple system call number,
so it's probably possible to do something hacky with negative values,
after suitable auditing of current PV OSes and other common OSes. Not
pretty, but I wouldn't throw the scheme out if an audit confirms the
behaviour.
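One way to picture the negative-value scheme is as a dispatcher that inspects the syscall number before forwarding it. This is only a toy model: the constant HYPER_RDTSC, the returned strings, and the choice to pass unknown negative numbers through to the guest kernel are all assumptions made for illustration.

```python
# Toy dispatcher for the "negative syscall number" idea. HYPER_RDTSC and
# the returned labels are hypothetical.
HYPER_RDTSC = -1  # hypothetical number reserved for a userland hypercall


def to_signed32(eax):
    """Interpret the low 32 bits of a register value as a signed integer."""
    eax &= 0xFFFFFFFF
    return eax - (1 << 32) if eax & 0x80000000 else eax


def route_syscall(eax):
    """Decide who handles a syscall instruction issued from userland."""
    number = to_signed32(eax)
    if number >= 0:
        return "guest kernel"      # ordinary system call, forwarded as today
    if number == HYPER_RDTSC:
        return "xen: hyper-rdtsc"  # handled without entering the guest kernel
    return "guest kernel"          # unknown negative value: leave untouched


if __name__ == "__main__":
    print(route_syscall(39))
    print(route_syscall(0xFFFFFFFF))
```

The audit mentioned above matters because the scheme is only safe if no existing guest OS already gives negative syscall numbers a meaning of its own.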
Ian