From: Keir Fraser <keir.fraser@eu.citrix.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Xen-Devel (E-mail)" <xen-devel@lists.xensource.com>
Subject: Re: Bizarre pv kernel ultra-high frequency rdtsc?!?
Date: Sat, 21 Nov 2009 19:50:15 +0000
Message-ID: <C72DF4F7.ACA%keir.fraser@eu.citrix.com>
In-Reply-To: <fae586af-0866-42b3-b309-4927042317c7@default>

What happens if you add BIG_OFFSET rather than subtract it? You'll be
creating some big 64-bit TSC stamps otherwise, which we'd never normally
expect to reach with a 64-bit-wide counter. Also you will be wrapping the
vTSC some time fairly soon after boot.
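
To make the wrap concrete, here is a minimal C sketch. It assumes vTSC
values are unsigned 64-bit nanosecond counts that start near zero at boot
(which the 1:1 scaling in the quoted patch suggests, but which is an
assumption here). Only BIG_OFFSET comes from the patch; the helper names
are hypothetical and not part of Xen.

```c
#include <stdint.h>

/* Same constant as in the quoted patch: 10^10 (10 s if the unit is ns). */
#define BIG_OFFSET 10000000000ULL

/* Hypothetical helper: the stamp produced by "now -= BIG_OFFSET". */
static uint64_t vtsc_sub_offset(uint64_t now)
{
    return now - BIG_OFFSET;    /* unsigned arithmetic wraps modulo 2^64 */
}

/* Hypothetical helper: the stamp produced by adding the offset instead. */
static uint64_t vtsc_add_offset(uint64_t now)
{
    return now + BIG_OFFSET;
}

/* Nonzero while the subtracted stamp is still wrapped "below zero". */
static int vtsc_sub_wrapped(uint64_t now)
{
    return now < BIG_OFFSET;
}
```

Under those assumptions, a raw value of 5000000000 (5 s after boot) maps
to a stamp just below 2^64 when the offset is subtracted, and the stamp
passes through zero once the raw value reaches BIG_OFFSET. Adding the
offset keeps the stamps small and free of any wrap for that period.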

Just an easy thing to try. Other than that the patch does look plausible.

You see the high rate while the guest is idle? And 'normal' RDTSC rate is
hundreds per second?

 -- Keir

On 20/11/2009 23:45, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Hi Jeremy/Keir (and any other PV time experts out there) --
> 
> Working on tsc_mode stuff I've run into a roadblock where
> there is some time-related interaction between xen and a
> PV kernel that I don't understand.  I'm hoping you
> might provide a clue.  There's also a reasonable chance
> that this might be uncovering a significant bug that's
> been around a while but was never noticed as anything other
> than a barely noticeable, vague slowdown because rdtsc was
> unemulated (and "fast").
> 
> The problem:
> 
> In order to preserve TSC across save/restore/migrate, I
> have implemented a "tsc offset" (and also a "tsc scale"
> but that isn't used yet).
> 
> The result is that the PV kernel starts doing rdtsc's at
> a VERY high frequency (1 MILLION / sec).  I suspect this
> may be a variation of what Jeremy reported at one point
> when emulated rdtsc was first in-tree, but seemed to go away.
> 
> By adding some debug code (and confirmed with xenctx)
> I can see that the millions of rdtsc's are half in
> get_nsec_offset() and half in do_gettimeofday() (presumably
> inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
> PV kernel, not upstream.  Poring through the 2.6.18 PV time
> code, I can find several places where an essentially infinite
> loop might happen if the version fields are wacko, but
> none where the timestamp contents make any difference
> in control flow, so I don't see how modifying these
> values (by adding the offset) could cause a behavioral
> change in Linux, but obviously a big change is happening!
> 
> I can reproduce the problem with a very simple patch
> on xen-unstable that adds a fake fixed offset in the
> three places I add the "tsc offset", see attached.
> By changing BIG_OFFSET to 0, in this patch, the
> frequency of rdtsc's becomes normal again.
> 
> Suspecting some interaction with wallclock time, I
> tried shutting off ntpd and with/without independent
> wallclock in the PV guest.  No difference.
> 
> I also added debug code to see if the Xen-side code
> was churning through version numbers... it is not.
> 
> Any ideas?  (And, sorry, I know you're on a trans-
> hemisphere trip right now.)
> 
> Thanks,
> Dan
> 
> diff -r bec27eb6f72c xen/arch/x86/time.c
> --- a/xen/arch/x86/time.c Sat Nov 14 10:32:59 2009 +0000
> +++ b/xen/arch/x86/time.c Fri Nov 20 16:58:18 2009 -0500
> @@ -813,6 +813,8 @@ s_time_t get_s_time(void)
>  #define version_update_begin(v) (((v)+1)|1)
>  #define version_update_end(v)   ((v)+1)
>  
> +#define BIG_OFFSET 10000000000ULL
> +
>  static void __update_vcpu_system_time(struct vcpu *v, int force)
>  {
>      struct cpu_time       *t;
> @@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
>  
>      /* Don't bother unless timestamps have changed or we are forced. */
>      if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
> -                                         ? t->stime_local_stamp
> +                                         ? t->stime_local_stamp - BIG_OFFSET
>                                           : t->local_tsc_stamp)) )
>          return;
>  
> @@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
>  
>      if ( v->domain->arch.vtsc )
>      {
> -        _u.tsc_timestamp     = t->stime_local_stamp;
> -        _u.system_time       = t->stime_local_stamp;
> +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
> +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
>          _u.tsc_to_system_mul = 0x80000000u;
>          _u.tsc_shift         = 1;
>      }
> @@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
>  
>      spin_unlock(&v->domain->arch.vtsc_lock);
>  
> +    now -= BIG_OFFSET;
> +
>      regs->eax = (uint32_t)now;
>      regs->edx = (uint32_t)(now >> 32);
>  }
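
For reference, the guest-side arithmetic that consumes these fields can be
sketched as follows. This is a simplified illustration, not the 2.6.18
guest code: the struct, scale_delta() and read_system_time() are
hypothetical names, the field names follow the quoted patch, and the
scaling rule (shift the TSC delta by tsc_shift, multiply by
tsc_to_system_mul, keep the result above the low 32 bits) follows the
description in Xen's public time-info interface as commonly documented.
The TSC value is passed in as a parameter here, whereas a real guest would
execute rdtsc inside the retry loop.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the shared time fields set in the patch. */
struct time_info_sketch {
    uint32_t version;            /* odd while an update is in progress */
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
};

/* Compute ((delta shifted by 'shift') * mul) >> 32 without 128-bit math. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* Split into 32-bit halves: (hi*2^32 + lo) * mul >> 32. */
    return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}

/* Version-checked read: retry if an update was in progress or intervened. */
static uint64_t read_system_time(const volatile struct time_info_sketch *info,
                                 uint64_t tsc)
{
    uint32_t version;
    uint64_t result;

    do {
        version = info->version;
        atomic_thread_fence(memory_order_acquire);
        result = info->system_time +
                 scale_delta(tsc - info->tsc_timestamp,
                             info->tsc_to_system_mul, info->tsc_shift);
        atomic_thread_fence(memory_order_acquire);
    } while ((version & 1) || version != info->version);

    return result;
}
```

With tsc_to_system_mul = 0x80000000u and tsc_shift = 1, as in the patch,
scale_delta() returns the delta unchanged (1:1 scaling). In this sketch a
constant offset applied to both the rdtsc result and tsc_timestamp cancels
in the subtraction, and the only way to spin is a version field that stays
odd or keeps changing. That matches the observation in the quoted message
that the timestamp contents should not affect control flow, so it does not
by itself explain the high rdtsc rate.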
