All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Jan Beulich <JBeulich@novell.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
	"Xen-Devel (E-mail)" <xen-devel@lists.xensource.com>,
	Keir Fraser <keir.fraser@eu.citrix.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: RE: rdtsc: correctness vs performance on Xen 	 (and KVM?)
Date: Thu, 03 Sep 2009 10:29:40 -0700	[thread overview]
Message-ID: <4A9FFD04.4020908@goop.org> (raw)
In-Reply-To: <4A9F9917020000780001320C@vpn.id2.novell.com>

On 09/03/09 01:23, Jan Beulich wrote:
>>   1. Add a hypercall to set the desired location of the clock
>>      correction info rather than putting it in the shared-info area
>>      (akin to vcpu placement).  KVM already has this; they write the
>>      address to a magic MSR.
>>     
> But this is already subject to placement, as it's part of the vcpu_info
> structure. While of course you don't want to make the whole vcpu_info
> visible to guests, it would seem awkward to further segregate the
> shared_info pieces. I'd rather consider adding a second (optional) copy
> of it, since the updating of this is rather little overhead in Xen,

Hm, I guess that's possible.  Though once you've added a new "other time
struct" pointer, it would be easier to just make Xen update that pointer
rather than update two.  I don't think a guest is going to know/care
about having two versions of the info (except that it opens the
possibility of getting confused by looking at the wrong one).

I'd propose that there'd be just one, and the non-valid pvclock
structure have its version set to 0xffffffff, since a guest should never
see a version in that state.

>  but
> using this in the kernel time handling code would eliminate the
> potential for accessing all the vcpu_info fields using percpu_read().
>   

I don't think that's a big concern.  The kernel's pvclock handing is
common between Xen and KVM now, and it just gets a pointer to the
structure; it never accesses it as a percpu variable.

>>   2. Pack all the clock structures into a single page, indexed by vcpu
>>      number
>>     
> That adds a scalability issue, albeit a relatively light one: You shouldn't
> anymore assume there's a limit on the number of vCPU-s. 
>   

Well, that's up to the kernel rather than Xen.  If there a lot of CPUs
it can span multiple pages.  There's no need to make them physically
contiguous, since the kernel never needs to treat them as an array and
we can map disjoint pages contiguously into userspace (it might take a
chunk of fixmap slots).

I guess one concern is that it ends up exposing the scheduling info
about all the VCPUs to all usermode.  I doubt that's a problem in
itself, but who knows if it could be used as part of a larger attack.

>>   5. On context switch, the kernel would increment the version of the
>>      *old* vcpu clock structure, so that when the usermode code
>>      re-checks the version at the end of its time calculation, it can
>>      tell that it has a stale vcpu and it needs to iterate with a new
>>      vcpu+clock structure
>>     
> I don't think you can re-use the hypervisor updated version field here,
> unless you add a protocol on how the two updaters avoid collision.
> struct vcpu_time_info has a padding field, which might be designated
> as guest-kernel-version.
>   

There's no padding.  It would be an extension of the pvclock ABI, which
KVM also implements, so we'd need to make sure they can cope too.

We only need to worry about Xen preempting a kernel update rather than
the other way around.  I think it ends up being very simple:

void ctxtsw_update_pvclock(struct pvclock_vcpu_time_info *pvclock)
{
	BUG_ON(preemptible());	/* Switching VCPUs would be a disaster */

	/*
	 * We just need to update version; if Xen did it behind our back, then
	 * that's OK with us.  We should never see an update-in-progress because Xen
	 * will always completely update the pvclock structure before rescheduling the
	 * VCPU, so version should always be even.  We don't care if Xen updates the
	 * timing parameters here because we're not in the middle of a clock read.
	 * Usermode might be in the middle of a read, but all it needs to see is version
	 * changing to a new even number, even if this add gets preempted by Xen in
	 * the middle.  There are no cross-PCPU writes going on, so we don't need to
	 * worry about bus-level atomicity.
	 */
	pvclock->version += 2;
}

Looks like this would work for KVM too.

    J

  reply	other threads:[~2009-09-03 17:29 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-25 21:54 write_tsc in a PV domain? Dan Magenheimer
2009-08-25 22:28 ` Jeremy Fitzhardinge
2009-08-25 23:09   ` Dan Magenheimer
2009-08-26  6:23     ` Keir Fraser
2009-08-26 15:42       ` Dan Magenheimer
2009-08-26 15:58         ` Keir Fraser
2009-08-26 19:45         ` Jeremy Fitzhardinge
2009-08-26 20:23           ` Dan Magenheimer
2009-08-26 22:30             ` Jeremy Fitzhardinge
2009-08-26 23:10               ` Dan Magenheimer
2009-08-27  8:39                 ` Chris Lalancette
2009-08-27 13:00                   ` Dan Magenheimer
2009-08-27 13:17                     ` Chris Lalancette
2009-08-27  8:48               ` Alan Cox
2009-08-27 19:10                 ` Jeremy Fitzhardinge
2009-08-28  3:29                   ` Dan Magenheimer
2009-08-28  9:49                     ` Alan Cox
2009-08-28 15:16                       ` Dan Magenheimer
2009-08-28 15:30                         ` Alan Cox
2009-08-28 17:49                           ` rdtsc: correctness vs performance on Xen (and KVM?) Dan Magenheimer
2009-08-31 23:52                             ` Dan Magenheimer
2009-09-01  0:22                               ` Jeremy Fitzhardinge
2009-09-01 13:54                                 ` Dan Magenheimer
2009-09-01 14:34                                   ` Keir Fraser
2009-09-01 14:53                                     ` Dan Magenheimer
2009-09-01 15:08                                       ` Keir Fraser
2009-09-01 15:26                                         ` Dan Magenheimer
2009-09-01 15:32                                           ` Jan Beulich
2009-09-01 15:56                                             ` Dan Magenheimer
2009-09-01 16:04                                               ` Jan Beulich
2009-09-01 16:41                                                 ` Dan Magenheimer
2009-09-02  7:05                                                   ` Jan Beulich
2009-09-01 21:25                                                 ` Keir Fraser
2009-09-01 22:08                                                   ` Dan Magenheimer
2009-09-01 22:21                                                     ` Jeremy Fitzhardinge
2009-09-01 22:41                                                       ` Dan Magenheimer
2009-09-01 23:26                                                         ` Jeremy Fitzhardinge
2009-09-02  7:20                                                           ` Keir Fraser
2009-09-02 21:44                                                             ` Jeremy Fitzhardinge
2009-09-02 21:50                                                               ` Keir Fraser
2009-09-02 22:05                                                                 ` Jeremy Fitzhardinge
2009-09-03  8:23                                                                   ` Jan Beulich
2009-09-03 17:29                                                                     ` Jeremy Fitzhardinge [this message]
2009-09-04  7:19                                                                       ` Jan Beulich
2009-09-04 15:44                                                                         ` Jeremy Fitzhardinge
2009-09-03 14:22                                                                   ` Dan Magenheimer
2009-09-02  7:16                                                     ` Jan Beulich
2009-09-02  7:01                                                   ` Jan Beulich
2009-09-01 16:06                                               ` Keir Fraser
2009-09-01 16:55                                                 ` Dan Magenheimer
2009-09-01 15:43                                           ` Keir Fraser
2009-08-28 17:49                           ` write_tsc in a PV domain? Dan Magenheimer
2009-08-28 17:02                     ` Jeremy Fitzhardinge
2009-08-28 17:49                       ` Dan Magenheimer
2009-08-28 23:01                         ` Jeremy Fitzhardinge
2009-08-29 17:51                           ` Dan Magenheimer
2009-08-31 18:11                             ` Dan Magenheimer
2009-08-31 19:06                               ` Keir Fraser
2009-08-31 21:06                                 ` Dan Magenheimer
2009-09-01  7:16                                   ` Keir Fraser
2009-08-31 19:18                               ` Jeremy Fitzhardinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A9FFD04.4020908@goop.org \
    --to=jeremy@goop.org \
    --cc=JBeulich@novell.com \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=dan.magenheimer@oracle.com \
    --cc=keir.fraser@eu.citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.