All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: kurt.hackel@oracle.com,
	"Xen-Devel (E-mail)" <xen-devel@lists.xensource.com>,
	Keir Fraser <keir.fraser@eu.citrix.com>,
	Jan Beulich <JBeulich@novell.com>
Subject: Re: rdtscP and xen (and maybe the app-tsc answer I've been looking for)
Date: Mon, 21 Sep 2009 17:42:01 -0700	[thread overview]
Message-ID: <4AB81D59.3030203@goop.org> (raw)
In-Reply-To: <8f2bf802-e080-40dc-8315-f2eea33c7926@default>

On 09/21/09 17:11, Dan Magenheimer wrote:
>>> Yes, I neglected an important pre-condition.  ASSUME the first
>>> rdtscp on pcpu-A gets a version mismatch so that it must fetch
>>> the parameters again.  Then: the vcpu switches pcpu TWICE
>>> from pcpu-A to pcpu-B and back to pcpu-A and does rdtscp
>>> each time on pcpu-A but reads one or more pvclock parameters
>>> (that are too big to be encoded in TSC_AUX) on pcpu-B.
>>>
>>> I agree that this is vanishingly low probability but on
>>> a pcpu-oversubscribed machine I think it only takes one
>>> vcpu-to-pcpu reschedule and then a poorly timed interrupt that
>>> causes the vcpu to be unscheduled, and then later rescheduled
>>> on the original processor.
>>>   
>>>       
>> Sure.  It just has to keep iterating until it gets consistency.  If it
>> iterates too long (10 times?  100? 1000?) it should give up and assume
>> something is inherently broken.
>>     
> No, I'm not talking about iteration.  In the scenario I'm
> trying to describe, the version number hasn't changed on
> pcpu-A so the algorithm doesn't iterate.
>   

Well, not "change" so much as "not updated".  If the program keeps doing
a rdtsc which shows that its local copy of the parameters is out of
date, but its attempts to get up-to-date parameters keeps failing
(because it keeps migrating cpus), then it will keep iterating without
converging.  Specifically, the algorithm would be:

	u64 tsc, time_ns;
	u32 aux;
	unsigned int version, cpu;
again:
	rdtscp(&tsc, &aux);
	cpu = aux >> 24;		/* physical cpu */
	version = aux & ((1 << 24) - 1);
	
	/* At this point tsc and cpu+version are all fetched
	   atomically and consistent, so context switch doesn't
	   matter here; apply_fixup is not dependent on currently
	   executing cpu. */

	/* note that this prob. needs some local synchronization if
	   the usermode program is multithreaded... */
	if (unlikely(version != pvclockinfo[cpu].version)) {
		struct pvclock info;
		int curcpu;		/* again, physical cpu */

		/* Always fetches current cpu parameters,
		   and tells us which cpu it is for.  If we
		   switched cpus since the rdtscp we won't end
		   up updating the out-of-date info we detected
		   but that doesn't matter because... */
		curcpu = get_new_pvclock_info(&info);
		pvclockinfo[curcpu] = info;

		/* ...we repeat assuming that we're almost certainly
		   still on the same cpu when we do rdtscp again */
		goto again;
	}

	time_ns = apply_fixup(tsc, &pvclockinfo[cpu]);


get_new_pvclock_info() can either be a syscall, hypercall or some other
mechanism which
can get a good atomic snapshot of the params along with cpu number from
a shared memory region.

> I realized after I sent this that I'm not really sure
> I understand the pvclock implementation, particularly
> under what circumstances the version number changes
> or doesn't.  And if this is different in any way
> than the versions you are proposing that the app
> would see.  So I'm not positive we are considering
> the same cases.
>   

The pvclock algorithm only changes the version if the either the tsc
offset or scale have changed.  In the standard pvclock algorithm, a vcpu
sees its own pvclock version change if
either the pcpu undergoes some change which affects the tsc, *or* if the
vcpu gets scheduled on a new pcpu (which could have different
offset/scale). 

In the case we're talking about above, the code isn't pinned to a
particular pcpu or vcpu (as it is usermode code with no real control
over the kernel or xen schedulers), so it has to cope with preempt at
any point.  That's simplified by having the tsc and metadata fetch
atomic, so it can revalidate its parameters every time it fetches the
tsc.  In that case, Xen need only update its internal version numbers
when there's an actual change to the tsc's offset/scale without regard
to vcpu scheduling.  (And of course if the offset/scale end up being
constant, then it will never need to update the offset, and usermode
will only ever end up fetching it once per cpu.)

    J

  reply	other threads:[~2009-09-22  0:42 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-18 16:30 rdtscP and xen (and maybe the app-tsc answer I've been looking for) Dan Magenheimer
2009-09-18 20:27 ` Dan Magenheimer
2009-09-18 22:55   ` Jeremy Fitzhardinge
2009-09-19 15:34     ` Dan Magenheimer
2009-09-21 14:47       ` Dan Magenheimer
2009-09-21 18:36       ` Jeremy Fitzhardinge
2009-09-21 22:20         ` Dan Magenheimer
2009-09-21 22:50           ` Jeremy Fitzhardinge
2009-09-21 23:29             ` Dan Magenheimer
2009-09-21 23:55               ` Jeremy Fitzhardinge
2009-09-22  0:11                 ` Dan Magenheimer
2009-09-22  0:42                   ` Jeremy Fitzhardinge [this message]
2009-09-22 19:36                 ` Dan Magenheimer
2009-09-22 19:52                   ` Jeremy Fitzhardinge
2009-09-22 20:22                     ` Dan Magenheimer
2009-09-22 22:18                       ` Jeremy Fitzhardinge
2009-09-22  7:44               ` Jan Beulich
2009-09-22 15:00                 ` Dan Magenheimer
2009-09-22 15:16                   ` Jan Beulich
2009-09-22 17:15                     ` Jeremy Fitzhardinge
2009-09-22  7:39         ` Jan Beulich
2009-09-22 17:26           ` Jeremy Fitzhardinge
2009-09-21  8:17   ` Jan Beulich
2009-09-21 14:04     ` Dan Magenheimer
2009-09-21 14:18       ` Jan Beulich
2009-09-21 15:25         ` Dan Magenheimer
2009-09-21 15:41           ` Keir Fraser
2009-09-21 15:53             ` Keir Fraser
2009-09-21 16:55               ` Dan Magenheimer
2009-09-21 17:02                 ` Keir Fraser
2009-09-21 17:56                   ` Dan Magenheimer
2009-09-21 18:17                     ` Keir Fraser
2009-09-21 21:47                       ` Dan Magenheimer
2009-09-21 16:03           ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AB81D59.3030203@goop.org \
    --to=jeremy@goop.org \
    --cc=JBeulich@novell.com \
    --cc=dan.magenheimer@oracle.com \
    --cc=keir.fraser@eu.citrix.com \
    --cc=kurt.hackel@oracle.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.