Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Vrabel <dvrabel@cantab.net>
To: Andy Lutomirski <luto@amacapital.net>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kvm list <kvm@vger.kernel.org>
Subject: Re: [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
Date: Thu, 08 Jan 2015 12:51:27 +0000	[thread overview]
Message-ID: <54AE7D4F.2040300@cantab.net> (raw)
In-Reply-To: <8d09c16eb39cbe264417cc66c4aca730af10b70b.1419295081.git.luto@amacapital.net>

On 23/12/2014 00:39, Andy Lutomirski wrote:
> The pvclock vdso code was too abstracted to understand easily and
> excessively paranoid.  Simplify it for a huge speedup.
>
> This opens the door for additional simplifications, as the vdso no
> longer accesses the pvti for any vcpu other than vcpu 0.
>
> Before, vclock_gettime using kvm-clock took about 64ns on my machine.
> With this change, it takes 19ns, which is almost as fast as the pure TSC
> implementation.

Xen guests don't use any of this at the moment, and I don't think this 
change would prevent us from using it in the future, so:

Acked-by: David Vrabel <david.vrabel@citrix.com>

But see some additional comments below.

> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -78,47 +78,59 @@ static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
>
>   static notrace cycle_t vread_pvclock(int *mode)
>   {
> -	const struct pvclock_vsyscall_time_info *pvti;
> +	const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;

Xen updates pvti when scheduling a VCPU.  Using 0 here requires that 
VCPU 0 has been recently scheduled by Xen.  Perhaps using the current 
CPU here would be better?  It doesn't matter if the task is subsequently 
moved to a different CPU before using pvti.

> +	 * Note: The kernel and hypervisor must guarantee that cpu ID
> +	 * number maps 1:1 to per-CPU pvclock time info.
> +	 *
> +	 * Because the hypervisor is entirely unaware of guest userspace
> +	 * preemption, it cannot guarantee that per-CPU pvclock time
> +	 * info is updated if the underlying CPU changes or that that
> +	 * version is increased whenever underlying CPU changes.
> +	 *
> +	 * On KVM, we are guaranteed that pvti updates for any vCPU are
> +	 * atomic as seen by *all* vCPUs.  This is an even stronger
> +	 * guarantee than we get with a normal seqlock.
>   	 *
> +	 * On Xen, we don't appear to have that guarantee, but Xen still
> +	 * supplies a valid seqlock using the version field.
> +
> +	 * We only do pvclock vdso timing at all if
> +	 * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
> +	 * mean that all vCPUs have matching pvti and that the TSC is
> +	 * synced, so we can just look at vCPU 0's pvti.

I think this is a much stronger requirement than you actually need.

You only require:

- the system time (pvti->system_time) for all pvti's is synchronized; and
- TSC is synchronized; and
- the pvti has been updated sufficiently recently (so the error in the 
result is within acceptable margins).

Can you add documentation to arch/x86/include/asm/pvclock-abi.h to 
describe what properties PVCLOCK_TSC_STABLE_BIT guarantees?

David

WARNING: multiple messages have this Message-ID (diff)

From: David Vrabel <dvrabel@cantab.net>
To: Andy Lutomirski <luto@amacapital.net>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kvm list <kvm@vger.kernel.org>
Subject: Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
Date: Thu, 08 Jan 2015 12:51:27 +0000	[thread overview]
Message-ID: <54AE7D4F.2040300@cantab.net> (raw)
In-Reply-To: <8d09c16eb39cbe264417cc66c4aca730af10b70b.1419295081.git.luto@amacapital.net>

On 23/12/2014 00:39, Andy Lutomirski wrote:
> The pvclock vdso code was too abstracted to understand easily and
> excessively paranoid.  Simplify it for a huge speedup.
>
> This opens the door for additional simplifications, as the vdso no
> longer accesses the pvti for any vcpu other than vcpu 0.
>
> Before, vclock_gettime using kvm-clock took about 64ns on my machine.
> With this change, it takes 19ns, which is almost as fast as the pure TSC
> implementation.

Xen guests don't use any of this at the moment, and I don't think this 
change would prevent us from using it in the future, so:

Acked-by: David Vrabel <david.vrabel@citrix.com>

But see some additional comments below.

> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -78,47 +78,59 @@ static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
>
>   static notrace cycle_t vread_pvclock(int *mode)
>   {
> -	const struct pvclock_vsyscall_time_info *pvti;
> +	const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;

Xen updates pvti when scheduling a VCPU.  Using 0 here requires that 
VCPU 0 has been recently scheduled by Xen.  Perhaps using the current 
CPU here would be better?  It doesn't matter if the task is subsequently 
moved to a different CPU before using pvti.

> +	 * Note: The kernel and hypervisor must guarantee that cpu ID
> +	 * number maps 1:1 to per-CPU pvclock time info.
> +	 *
> +	 * Because the hypervisor is entirely unaware of guest userspace
> +	 * preemption, it cannot guarantee that per-CPU pvclock time
> +	 * info is updated if the underlying CPU changes or that that
> +	 * version is increased whenever underlying CPU changes.
> +	 *
> +	 * On KVM, we are guaranteed that pvti updates for any vCPU are
> +	 * atomic as seen by *all* vCPUs.  This is an even stronger
> +	 * guarantee than we get with a normal seqlock.
>   	 *
> +	 * On Xen, we don't appear to have that guarantee, but Xen still
> +	 * supplies a valid seqlock using the version field.
> +
> +	 * We only do pvclock vdso timing at all if
> +	 * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
> +	 * mean that all vCPUs have matching pvti and that the TSC is
> +	 * synced, so we can just look at vCPU 0's pvti.

I think this is a much stronger requirement than you actually need.

You only require:

- the system time (pvti->system_time) for all pvti's is synchronized; and
- TSC is synchronized; and
- the pvti has been updated sufficiently recently (so the error in the 
result is within acceptable margins).

Can you add documentation to arch/x86/include/asm/pvclock-abi.h to 
describe what properties PVCLOCK_TSC_STABLE_BIT guarantees?

David

next prev parent reply	other threads:[~2015-01-08 12:51 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-23  0:39 [RFC 0/2] x86, vdso, pvclock: Cleanups and speedups Andy Lutomirski
2014-12-23  0:39 ` [RFC 1/2] x86, vdso: Use asm volatile in __getcpu Andy Lutomirski
2014-12-23  0:39 ` Andy Lutomirski
2014-12-23  0:39 ` [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader Andy Lutomirski
2014-12-23  0:39 ` Andy Lutomirski
2014-12-23 10:28   ` [Xen-devel] " David Vrabel
2014-12-23 10:28   ` David Vrabel
2014-12-23 15:14   ` Boris Ostrovsky
2014-12-23 15:14   ` [Xen-devel] " Boris Ostrovsky
2014-12-23 15:14     ` Paolo Bonzini
2014-12-23 15:14     ` [Xen-devel] " Paolo Bonzini
2014-12-23 15:25       ` Boris Ostrovsky
2014-12-23 15:25       ` [Xen-devel] " Boris Ostrovsky
2014-12-24 21:30   ` David Matlack
2014-12-24 21:30   ` David Matlack
2014-12-24 21:43     ` Andy Lutomirski
2014-12-24 21:43     ` Andy Lutomirski
2015-01-05 15:25   ` Marcelo Tosatti
2015-01-05 15:25   ` Marcelo Tosatti
2015-01-05 18:56     ` Andy Lutomirski
2015-01-05 19:17       ` Marcelo Tosatti
2015-01-05 22:38         ` Andy Lutomirski
2015-01-05 22:48           ` Marcelo Tosatti
2015-01-05 22:53             ` Andy Lutomirski
2015-01-05 22:53             ` Andy Lutomirski
2015-01-06  8:42             ` Paolo Bonzini
2015-01-06  8:42               ` Paolo Bonzini
2015-01-06 12:01               ` Paolo Bonzini
2015-01-06 16:56                 ` Andy Lutomirski
2015-01-06 16:56                 ` Andy Lutomirski
2015-01-06 18:13                   ` Marcelo Tosatti
2015-01-06 18:26                     ` Andy Lutomirski
2015-01-06 18:26                     ` Andy Lutomirski
2015-01-06 18:45                       ` Marcelo Tosatti
2015-01-06 18:45                       ` Marcelo Tosatti
2015-01-06 19:49                         ` Andy Lutomirski
2015-01-06 19:49                         ` Andy Lutomirski
2015-01-06 20:20                           ` Marcelo Tosatti
2015-01-06 20:20                           ` Marcelo Tosatti
2015-01-06 21:54                             ` Andy Lutomirski
2015-01-06 21:54                             ` Andy Lutomirski
2015-01-08 22:31                           ` Marcelo Tosatti
2015-01-08 22:31                           ` Marcelo Tosatti
2015-01-08 22:43                             ` Andy Lutomirski
2015-01-08 22:43                             ` Andy Lutomirski
2015-02-26 22:46                               ` Andy Lutomirski
2015-02-26 22:46                               ` Andy Lutomirski
2015-01-07  5:41                       ` Paolo Bonzini
2015-01-07  5:41                       ` Paolo Bonzini
2015-01-06 18:13                   ` Marcelo Tosatti
2015-01-07  5:38                   ` Paolo Bonzini
2015-01-07  7:18                     ` Andy Lutomirski
2015-01-07  7:18                     ` Andy Lutomirski
2015-01-07  9:00                       ` Paolo Bonzini
2015-01-07  9:00                       ` Paolo Bonzini
2015-01-07 14:45                       ` Marcelo Tosatti
2015-01-07 14:45                       ` Marcelo Tosatti
2015-01-07  5:38                   ` Paolo Bonzini
2015-01-05 22:48           ` Marcelo Tosatti
2015-01-05 22:38         ` Andy Lutomirski
2015-01-06  8:39         ` Paolo Bonzini
2015-01-06  8:39         ` Paolo Bonzini
2015-01-05 19:17       ` Marcelo Tosatti
2015-01-05 22:23       ` Paolo Bonzini
2015-01-05 22:23       ` Paolo Bonzini
2015-01-06 14:35       ` Konrad Rzeszutek Wilk
2015-01-06 14:35         ` [Xen-devel] " Konrad Rzeszutek Wilk
2015-01-05 18:56     ` Andy Lutomirski
2015-01-08 12:51   ` David Vrabel [this message]
2015-01-08 12:51     ` [Xen-devel] " David Vrabel
2014-12-23  7:21 ` [RFC 0/2] x86, vdso, pvclock: Cleanups and speedups Paolo Bonzini
2014-12-23  7:21 ` Paolo Bonzini
2014-12-23  8:16   ` Andy Lutomirski
2014-12-23  8:16   ` Andy Lutomirski
2014-12-23  8:30     ` Paolo Bonzini
2014-12-23  8:30     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54AE7D4F.2040300@cantab.net \
    --to=dvrabel@cantab.net \
    --cc=gleb@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.