From: Avi Kivity <avi@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>,
Dan Magenheimer <dan.magenheimer@oracle.com>,
Xen-devel <xen-devel@lists.xensource.com>,
kurt.hackel@oracle.com, the arch/x86 maintainers <x86@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Glauber de Oliveira Costa <gcosta@redhat.com>,
Keir Fraser <keir.fraser@eu.citrix.com>,
Zach Brown <zach.brown@oracle.com>,
Chris Mason <chris.mason@oracle.com>
Subject: Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
Date: Wed, 07 Oct 2009 12:25:32 +0200 [thread overview]
Message-ID: <4ACC6C9C.7080707@redhat.com> (raw)
In-Reply-To: <4ACB9074.1000804@goop.org>
On 10/06/2009 08:46 PM, Jeremy Fitzhardinge wrote:
>
>> Instead of using vgetcpu() and rdtsc() independently, you can use
>> rdtscp to read both atomically. This removes the need for the preempt
>> notifier.
>>
> rdtscp first appeared on Intel with Nehalem, so we need to support older
> Intel chips.
>
We can support them by falling back to the kernel. I'm a bit worried
about the kernel playing with the hypervisor's version field. It's
better to introduce yet a new version for the kernel, and check both.
> You could use rdscp to get (tsc,cpu) atomically, but that's not
> sufficient to be able to get a consistent snapshot of (tsc, time_info)
> because it doesn't give you the pvclock_vcpu_time_info version number.
> If TSC_AUX contained that too, it might be possible. Alternatively you
> could compare the tsc with pvclock.tsc_timestamp, but unfortunately the
> ABI doesn't specify that tsc_timestamp is updated in any particular
> order compared to the rest of the fields, so you still can't use that to
> get a consistent snapshot (we can revise the ABI, of course).
>
> So either way it doesn't avoid the need to iterate. vgetcpu will use
> rdtscp if available, but I agree it is unfortunate we need to do a
> redundant rdtsc in that case.
>
>
def try_pvclock_vtime():
tsc, p0 = rdtscp()
v0 = pvclock[p0].version
tsc, p = rdtscp()
t = pvclock_time(pvclock[p], tsc)
if p != p0 or pvclock[p].version != v0:
raise Exception("Processor or timebased change under our feet")
return t
def pvclock_time():
while True:
try:
return try_pvlock_time()
except:
pass
So, two rdtscps and two compares.
>>> + for (cpu = 0; cpu< nr_cpu_ids; cpu++)
>>> + pvclock_vsyscall_time_info[cpu].version = ~0;
>>> +
>>> + __set_fixmap(FIX_PVCLOCK_TIME_INFO,
>>> __pa(pvclock_vsyscall_time_info),
>>> + PAGE_KERNEL_VSYSCALL);
>>> +
>>> + preempt_notifier_init(&pvclock_vsyscall_notifier,
>>> +&pvclock_vsyscall_preempt_ops);
>>> + preempt_notifier_register(&pvclock_vsyscall_notifier);
>>> +
>>>
>> preempt notifiers are per-thread, not global, and will upset the cycle
>> counters.
>>
> Ah, so I need to register it on every new thread? That's a bit awkward.
>
It's used to manage processor registers, much like the fpu. If a thread
uses a register that's not saved and restored by the normal context
switch code, it can register a preempt notifier to do that instead.
> This is intended to satisfy the cycle-counters who want to do
> gettimeofday a million times a second, where I guess the tradeoff of
> avoiding a pile of syscalls is worth a bit of context-switch overhead.
>
It's sufficient to increment a version counter on thread migration, no
need to do it on context switch.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
WARNING: multiple messages have this Message-ID (diff)
From: Avi Kivity <avi@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>,
kurt.hackel@oracle.com, the arch/x86 maintainers <x86@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Glauber de Oliveira Costa <gcosta@redhat.com>,
Xen-devel <xen-devel@lists.xensource.com>,
Keir Fraser <keir.fraser@eu.citrix.com>,
Zach Brown <zach.brown@oracle.com>,
Chris Mason <chris.mason@oracle.com>
Subject: Re: Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
Date: Wed, 07 Oct 2009 12:25:32 +0200 [thread overview]
Message-ID: <4ACC6C9C.7080707@redhat.com> (raw)
In-Reply-To: <4ACB9074.1000804@goop.org>
On 10/06/2009 08:46 PM, Jeremy Fitzhardinge wrote:
>
>> Instead of using vgetcpu() and rdtsc() independently, you can use
>> rdtscp to read both atomically. This removes the need for the preempt
>> notifier.
>>
> rdtscp first appeared on Intel with Nehalem, so we need to support older
> Intel chips.
>
We can support them by falling back to the kernel. I'm a bit worried
about the kernel playing with the hypervisor's version field. It's
better to introduce yet a new version for the kernel, and check both.
> You could use rdscp to get (tsc,cpu) atomically, but that's not
> sufficient to be able to get a consistent snapshot of (tsc, time_info)
> because it doesn't give you the pvclock_vcpu_time_info version number.
> If TSC_AUX contained that too, it might be possible. Alternatively you
> could compare the tsc with pvclock.tsc_timestamp, but unfortunately the
> ABI doesn't specify that tsc_timestamp is updated in any particular
> order compared to the rest of the fields, so you still can't use that to
> get a consistent snapshot (we can revise the ABI, of course).
>
> So either way it doesn't avoid the need to iterate. vgetcpu will use
> rdtscp if available, but I agree it is unfortunate we need to do a
> redundant rdtsc in that case.
>
>
def try_pvclock_vtime():
tsc, p0 = rdtscp()
v0 = pvclock[p0].version
tsc, p = rdtscp()
t = pvclock_time(pvclock[p], tsc)
if p != p0 or pvclock[p].version != v0:
raise Exception("Processor or timebased change under our feet")
return t
def pvclock_time():
while True:
try:
return try_pvlock_time()
except:
pass
So, two rdtscps and two compares.
>>> + for (cpu = 0; cpu< nr_cpu_ids; cpu++)
>>> + pvclock_vsyscall_time_info[cpu].version = ~0;
>>> +
>>> + __set_fixmap(FIX_PVCLOCK_TIME_INFO,
>>> __pa(pvclock_vsyscall_time_info),
>>> + PAGE_KERNEL_VSYSCALL);
>>> +
>>> + preempt_notifier_init(&pvclock_vsyscall_notifier,
>>> +&pvclock_vsyscall_preempt_ops);
>>> + preempt_notifier_register(&pvclock_vsyscall_notifier);
>>> +
>>>
>> preempt notifiers are per-thread, not global, and will upset the cycle
>> counters.
>>
> Ah, so I need to register it on every new thread? That's a bit awkward.
>
It's used to manage processor registers, much like the fpu. If a thread
uses a register that's not saved and restored by the normal context
switch code, it can register a preempt notifier to do that instead.
> This is intended to satisfy the cycle-counters who want to do
> gettimeofday a million times a second, where I guess the tradeoff of
> avoiding a pile of syscalls is worth a bit of context-switch overhead.
>
It's sufficient to increment a version counter on thread migration, no
need to do it on context switch.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
next prev parent reply other threads:[~2009-10-07 10:27 UTC|newest]
Thread overview: 119+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-06 0:50 [PATCH RFC] Extending pvclock down to usermode for vsyscall Jeremy Fitzhardinge
2009-10-06 0:50 ` Jeremy Fitzhardinge
2009-10-06 0:50 ` [PATCH 1/5] x86/pvclock: make sure rdtsc doesn't speculate out of region Jeremy Fitzhardinge
2009-10-06 0:50 ` Jeremy Fitzhardinge
2009-10-06 0:50 ` [PATCH 2/5] x86/pvclock: no need to use strong read barriers in pvclock_get_time_values Jeremy Fitzhardinge
2009-10-06 0:50 ` Jeremy Fitzhardinge
2009-10-06 0:50 ` [PATCH 3/5] x86/pvclock: add vsyscall implementation Jeremy Fitzhardinge
2009-10-06 0:50 ` Jeremy Fitzhardinge
2009-10-06 9:04 ` Avi Kivity
2009-10-06 9:04 ` Avi Kivity
2009-10-06 14:19 ` Dan Magenheimer
2009-10-06 14:19 ` Dan Magenheimer
2009-10-06 15:11 ` Avi Kivity
2009-10-06 15:11 ` Avi Kivity
2009-10-06 18:46 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-06 18:46 ` Jeremy Fitzhardinge
2009-10-07 10:25 ` Avi Kivity [this message]
2009-10-07 10:25 ` Avi Kivity
2009-10-07 19:29 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-07 19:29 ` Jeremy Fitzhardinge
2009-10-07 20:09 ` [Xen-devel] " Avi Kivity
2009-10-07 20:09 ` Avi Kivity
2009-10-07 21:19 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-07 21:19 ` Jeremy Fitzhardinge
2009-10-07 21:37 ` [Xen-devel] " Avi Kivity
2009-10-07 21:37 ` Avi Kivity
2009-10-07 21:51 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-07 21:51 ` Jeremy Fitzhardinge
2009-10-07 21:53 ` [Xen-devel] " Avi Kivity
2009-10-07 21:53 ` Avi Kivity
2009-10-07 20:48 ` [Xen-devel] " Dan Magenheimer
2009-10-07 20:48 ` Dan Magenheimer
2009-10-07 21:08 ` [Xen-devel] " Avi Kivity
2009-10-07 21:08 ` Avi Kivity
2009-10-07 22:36 ` [Xen-devel] " Dan Magenheimer
2009-10-07 22:36 ` Dan Magenheimer
2009-10-10 0:24 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-10 0:24 ` Jeremy Fitzhardinge
2009-10-10 18:10 ` [Xen-devel] " Avi Kivity
2009-10-10 18:10 ` Avi Kivity
2009-10-12 18:20 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-12 18:20 ` Jeremy Fitzhardinge
2009-10-12 18:29 ` [Xen-devel] " Avi Kivity
2009-10-12 18:29 ` Avi Kivity
2009-10-12 19:13 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-12 19:13 ` Jeremy Fitzhardinge
2009-10-13 6:39 ` [Xen-devel] " Avi Kivity
2009-10-13 6:39 ` Avi Kivity
2009-10-13 20:00 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-13 20:00 ` Jeremy Fitzhardinge
2009-10-14 12:32 ` [Xen-devel] " Avi Kivity
2009-10-14 12:32 ` Avi Kivity
2009-10-15 19:17 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-15 19:17 ` Jeremy Fitzhardinge
2009-10-27 17:29 ` [Xen-devel] " Dan Magenheimer
2009-10-27 17:29 ` Dan Magenheimer
2009-10-27 18:20 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-27 18:20 ` Jeremy Fitzhardinge
2009-10-28 5:52 ` [Xen-devel] " Avi Kivity
2009-10-28 5:52 ` Avi Kivity
2009-10-28 9:29 ` [Xen-devel] " Glauber Costa
2009-10-28 9:34 ` Avi Kivity
2009-10-28 9:34 ` Avi Kivity
2009-10-28 17:47 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-28 17:47 ` Jeremy Fitzhardinge
2009-10-29 12:13 ` [Xen-devel] " Avi Kivity
2009-10-29 12:13 ` Avi Kivity
2009-10-29 13:03 ` [Xen-devel] " Chris Mason
2009-10-29 13:03 ` Chris Mason
2009-10-29 14:46 ` [Xen-devel] " Dan Magenheimer
2009-10-29 14:46 ` Dan Magenheimer
2009-10-29 15:07 ` [Xen-devel] " Avi Kivity
2009-10-29 15:07 ` Avi Kivity
2009-10-29 15:55 ` [Xen-devel] " Dan Magenheimer
2009-10-29 15:55 ` Dan Magenheimer
2009-10-29 16:15 ` [Xen-devel] " Dan Magenheimer
2009-10-29 16:15 ` Dan Magenheimer
2009-11-01 9:28 ` [Xen-devel] " Avi Kivity
2009-11-01 9:28 ` Avi Kivity
2009-11-02 15:28 ` [Xen-devel] " Dan Magenheimer
2009-11-02 15:28 ` Dan Magenheimer
2009-11-02 15:41 ` [Xen-devel] " Avi Kivity
2009-11-02 15:41 ` Avi Kivity
2009-11-01 9:32 ` [Xen-devel] " Avi Kivity
2009-11-01 9:32 ` Avi Kivity
2009-11-02 15:46 ` [Xen-devel] " Dan Magenheimer
2009-11-02 15:46 ` Dan Magenheimer
2009-11-03 5:12 ` [Xen-devel] " Avi Kivity
2009-11-03 5:12 ` Avi Kivity
2009-11-04 20:30 ` [Xen-devel] " Dan Magenheimer
2009-11-04 20:30 ` Dan Magenheimer
2009-11-05 6:47 ` [Xen-devel] " Avi Kivity
2009-11-05 6:47 ` Avi Kivity
2009-11-05 14:52 ` [Xen-devel] " Dan Magenheimer
2009-11-05 14:52 ` Dan Magenheimer
2009-11-05 15:07 ` [Xen-devel] " Keir Fraser
2009-11-05 15:07 ` Keir Fraser
2009-11-04 21:19 ` [Xen-devel] " john stultz
2009-11-04 21:19 ` john stultz
2009-11-04 21:28 ` Dan Magenheimer
2009-11-04 21:28 ` Dan Magenheimer
2009-11-05 0:02 ` [Xen-devel] " john stultz
2009-11-05 0:02 ` john stultz
2009-11-05 0:45 ` [Xen-devel] " Dan Magenheimer
2009-11-05 0:45 ` Dan Magenheimer
2009-10-30 23:30 ` pvclock implementation in pv_ops kernel: why not __native_read_tsc()? Dan Magenheimer
2009-10-31 1:17 ` Jeremy Fitzhardinge
2009-10-06 0:50 ` [PATCH 4/5] x86/fixmap: add a predicate for usermode fixmaps Jeremy Fitzhardinge
2009-10-06 0:50 ` Jeremy Fitzhardinge
2009-10-06 10:23 ` [Xen-devel] " Jan Beulich
2009-10-06 10:23 ` Jan Beulich
2009-10-06 18:47 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-06 18:47 ` Jeremy Fitzhardinge
2009-10-06 0:50 ` [PATCH 5/5] xen/time: add pvclock_clocksource_vread support Jeremy Fitzhardinge
2009-10-06 0:50 ` Jeremy Fitzhardinge
2009-10-06 10:28 ` [Xen-devel] " Jan Beulich
2009-10-06 10:28 ` Jan Beulich
2009-10-06 18:48 ` [Xen-devel] " Jeremy Fitzhardinge
2009-10-06 18:48 ` Jeremy Fitzhardinge
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ACC6C9C.7080707@redhat.com \
--to=avi@redhat.com \
--cc=chris.mason@oracle.com \
--cc=dan.magenheimer@oracle.com \
--cc=gcosta@redhat.com \
--cc=jeremy.fitzhardinge@citrix.com \
--cc=jeremy@goop.org \
--cc=keir.fraser@eu.citrix.com \
--cc=kurt.hackel@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xensource.com \
--cc=zach.brown@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.