Re: System time monotonicity

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: System time monotonicity
       [not found] <47FFC37A.4060402@virtualiron.com>
@ 2008-04-11 21:20 ` Keir Fraser
  2008-04-11 21:41   ` Keir Fraser
  2008-04-11 22:22   ` System time monotonicity Dan Magenheimer
  0 siblings, 2 replies; 24+ messages in thread
From: Keir Fraser @ 2008-04-11 21:20 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, dan.magenheimer@oracle.com,
	xen-devel@lists.xensource.com, Ian Pratt

On 11/4/08 21:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> I turned to the hpet as I became frustrated trying to solve the problem
> in xen with pit.
> One of the solutions proposed to the customer was a max(curr time, last
> time) modification to Linux.
> They didn't want that.
> (Keir, what Linux version are you looking at when you say Linux already
> has this modification?)

This is part of our own Xen-specific time patches for Linux.

> I had tried hpet before to solve the time backwards problem and knew it
> was effective.
> But the accuracy of hpet was very poor. When I looked into the hpet I
> was surprised that it was
> based on tsc, as I was tring to get away from tsc. But note, even based
> on tsc the time was not
> going backwards, at least for this simple test case.

Yes, having all the virtual timers based on 'guest TSC' (which really is
basically host TSC + an offset) is not great.

> Its a fairly simple matter to base the hpet on the physical hpet. Its
> easy to share it among guests
> as no one really writes the physical hpet.  Offsets are kept in each
> vhpet such that each guest thinks
> he owns the hpet.

This is really no better than basing on Xen system time. Actually it's worse
since most systems don't even expose the HPET, so we can't probe it (without
hacks) and so we can't use it. Xen's system time abstraction, perhaps with
the max(last, curr) addition, is perfectly good enough.

> This goes along with some of the experiences
> Keir has had with drift, I think. I'm not sure why this happens - can
> the hpet hardware be that poor in quality?

It does appear to be, and I have no idea why.

> There are three factors that give hpet its great accuracy, in my opinion.
> 1) The hardware is very stable.
> 2) There is only one of them in the system, not one per cpu.
> 3) The Linux implementation for clock and hpet is very clean.  It
> calculates missed ticks and offsets without
>    including the interrupt delay.

Encouraging the guest to use HPET makes sense. It's a nice wide counter
which hence does not have the wrap issues of the 16-bit PIT counters. Also
in some cases the guest OS interface to the HPET is saner (for our purposes
at least) than the equivalent code to interface to PIT/TSC. This doesn't
mean it has to be plumbed right down to the physical HPET. HVM time sources
can be fixed for drift by moving them away from guest/host TSC and onto the
Xen system time abstraction.

 -- Keir

> Items 2 and 3 here are important factors in why the time stays
> monotonic. Another reason is that
> gettimeofday reads the hpet main counter for extrapolation, eliminating
> extrapolation error since
> the same counter is the sole determinator for the next interrupt time
> stamp. Furthermore, Linux can take the
> clock interrupt on any processor and the monotonicity is preserved
> because of item 2.
> 
> Thanks for reading this far!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: System time monotonicity
  2008-04-11 21:20 ` System time monotonicity Keir Fraser
@ 2008-04-11 21:41   ` Keir Fraser
  2008-04-11 22:58     ` Dave Winchell
  2008-04-11 22:22   ` System time monotonicity Dan Magenheimer
  1 sibling, 1 reply; 24+ messages in thread
From: Keir Fraser @ 2008-04-11 21:41 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, dan.magenheimer@oracle.com,
	xen-devel@lists.xensource.com, Ian Pratt

On 11/4/08 22:20, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Its a fairly simple matter to base the hpet on the physical hpet. Its
>> easy to share it among guests
>> as no one really writes the physical hpet.  Offsets are kept in each
>> vhpet such that each guest thinks
>> he owns the hpet.
> 
> This is really no better than basing on Xen system time. Actually it's worse
> since most systems don't even expose the HPET, so we can't probe it (without
> hacks) and so we can't use it. Xen's system time abstraction, perhaps with the
> max(last, curr) addition, is perfectly good enough.

Just to labour the point some more: If you still believe that diving to the
real platform timer on every guest time access is measurably more accurate,
you can cleanly prove that by building on Xen's system time abstraction, and
then switch between using get_s_time() (aka NOW()) and
read_platform_stime(). The latter calculates current system time by reading
from the platform timer *right now*. It's the function that all local CPUs
calibrate to once per second.

 -- Keir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: System time monotonicity
  2008-04-11 21:20 ` System time monotonicity Keir Fraser
  2008-04-11 21:41   ` Keir Fraser
@ 2008-04-11 22:22   ` Dan Magenheimer
  1 sibling, 0 replies; 24+ messages in thread
From: Dan Magenheimer @ 2008-04-11 22:22 UTC (permalink / raw)
  To: Keir Fraser, Dave Winchell
  Cc: Ian Pratt, xen-devel@lists.xensource.com, Tian, Kevin

IMHO, this all comes down to how bad the tsc drift gets
between calibrations.  If this is agreed, let me propose
the following:  I see local_time_calibration() has
some old printk's.  I propose re-enabling them (with
some rate-limiting) to record to a log how bad the
skew gets.  Then we can request feedback from anyone
running xen-unstable (and maybe xen-3.1-latest and
xen-3.2-latest also) so we can "measure" a broad set of
machines.

One nice approach to rate-limit is to printk each time
the value exceeds the next higher power of two.  Even
though this printk gets output for each processor,
I'd think this would be low overhead and sufficient
information for our needs.

A nice touch would be to include a little script to
run in domain0 that collects the cpuinfo, the relevant
log lines, and emails (or UDPs?) them to a pre-set address
(which is viewable via http).  And maybe a once-per-log
printk that says "please run this script".

Even tied to a boot parameter, this would be better
than the lack-of-information we have now.

Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, April 11, 2008 3:21 PM
> To: Dave Winchell
> Cc: dan.magenheimer@oracle.com; Ian Pratt; Tian, Kevin;
> xen-devel@lists.xensource.com
> Subject: Re: [xen-devel] System time monotonicity
> 
> 
> On 11/4/08 21:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
> 
> > I turned to the hpet as I became frustrated trying to solve 
> the problem
> > in xen with pit.
> > One of the solutions proposed to the customer was a 
> max(curr time, last
> > time) modification to Linux.
> > They didn't want that.
> > (Keir, what Linux version are you looking at when you say 
> Linux already
> > has this modification?)
> 
> This is part of our own Xen-specific time patches for Linux.
> 
> > I had tried hpet before to solve the time backwards problem 
> and knew it
> > was effective.
> > But the accuracy of hpet was very poor. When I looked into 
> the hpet I
> > was surprised that it was
> > based on tsc, as I was tring to get away from tsc. But 
> note, even based
> > on tsc the time was not
> > going backwards, at least for this simple test case.
> 
> Yes, having all the virtual timers based on 'guest TSC' 
> (which really is
> basically host TSC + an offset) is not great.
> 
> > Its a fairly simple matter to base the hpet on the physical 
> hpet. Its
> > easy to share it among guests
> > as no one really writes the physical hpet.  Offsets are kept in each
> > vhpet such that each guest thinks
> > he owns the hpet.
> 
> This is really no better than basing on Xen system time. 
> Actually it's worse
> since most systems don't even expose the HPET, so we can't 
> probe it (without
> hacks) and so we can't use it. Xen's system time abstraction, 
> perhaps with
> the max(last, curr) addition, is perfectly good enough.
> 
> > This goes along with some of the experiences
> > Keir has had with drift, I think. I'm not sure why this 
> happens - can
> > the hpet hardware be that poor in quality?
> 
> It does appear to be, and I have no idea why.
> 
> > There are three factors that give hpet its great accuracy, 
> in my opinion.
> > 1) The hardware is very stable.
> > 2) There is only one of them in the system, not one per cpu.
> > 3) The Linux implementation for clock and hpet is very clean.  It
> > calculates missed ticks and offsets without
> >    including the interrupt delay.
> 
> Encouraging the guest to use HPET makes sense. It's a nice 
> wide counter
> which hence does not have the wrap issues of the 16-bit PIT 
> counters. Also
> in some cases the guest OS interface to the HPET is saner 
> (for our purposes
> at least) than the equivalent code to interface to PIT/TSC. 
> This doesn't
> mean it has to be plumbed right down to the physical HPET. 
> HVM time sources
> can be fixed for drift by moving them away from guest/host 
> TSC and onto the
> Xen system time abstraction.
> 
>  -- Keir
> 
> > Items 2 and 3 here are important factors in why the time stays
> > monotonic. Another reason is that
> > gettimeofday reads the hpet main counter for extrapolation, 
> eliminating
> > extrapolation error since
> > the same counter is the sole determinator for the next 
> interrupt time
> > stamp. Furthermore, Linux can take the
> > clock interrupt on any processor and the monotonicity is preserved
> > because of item 2.
> >
> > Thanks for reading this far!
> 
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: System time monotonicity
  2008-04-11 21:41   ` Keir Fraser
@ 2008-04-11 22:58     ` Dave Winchell
  2008-04-12  7:09       ` Keir Fraser
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-11 22:58 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Tian, Kevin, dan.magenheimer, xen-devel, Dave Winchell, Ian Pratt

[-- Attachment #1.1: Type: text/plain, Size: 1682 bytes --]

Hi Keir,

Your suggestion below is a good one. I'll give it a try and let you know.

(I thought most systems did expose an hpet, at least modern ones.
All the systems I use expose it.
My code defaults back to the tsc way of doing things when no hpet is
detected.)

Regards,
Dave

________________________________

From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Fri 4/11/2008 5:41 PM
To: Dave Winchell
Cc: dan.magenheimer@oracle.com; Ian Pratt; Tian, Kevin; xen-devel@lists.xensource.com
Subject: Re: [xen-devel] System time monotonicity

On 11/4/08 22:20, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Its a fairly simple matter to base the hpet on the physical hpet. Its
>> easy to share it among guests
>> as no one really writes the physical hpet.  Offsets are kept in each
>> vhpet such that each guest thinks
>> he owns the hpet.
>
> This is really no better than basing on Xen system time. Actually it's worse
> since most systems don't even expose the HPET, so we can't probe it (without
> hacks) and so we can't use it. Xen's system time abstraction, perhaps with the
> max(last, curr) addition, is perfectly good enough.

Just to labour the point some more: If you still believe that diving to the
real platform timer on every guest time access is measurably more accurate,
you can cleanly prove that by building on Xen's system time abstraction, and
then switch between using get_s_time() (aka NOW()) and
read_platform_stime(). The latter calculates current system time by reading
from the platform timer *right now*. It's the function that all local CPUs
calibrate to once per second.

 -- Keir

[-- Attachment #1.2: Type: text/html, Size: 2763 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: System time monotonicity
  2008-04-11 22:58     ` Dave Winchell
@ 2008-04-12  7:09       ` Keir Fraser
  2008-04-21 19:26         ` Dan Magenheimer
  0 siblings, 1 reply; 24+ messages in thread
From: Keir Fraser @ 2008-04-12  7:09 UTC (permalink / raw)
  To: Dave Winchell; +Cc: Tian, Kevin, dan.magenheimer, xen-devel, Ian Pratt


[-- Attachment #1.1: Type: text/plain, Size: 637 bytes --]

On 11/4/08 23:58, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> Your suggestion below is a good one. I'll give it a try and let you know.
>  
> (I thought most systems did expose an hpet, at least modern ones.
> All the systems I use expose it.
> My code defaults back to the tsc way of doing things when no hpet is
> detected.)

It used to be deliberately not exposed because Windows had problems using it
in some cases, iirc. If it¹s no longer getting hidden in newer systems then
that¹s a good thing.

Anyhow, yes, please do switch over to system time. I wouldn¹t take a
HPET-vs-TSC patch anyway.

 -- Keir


[-- Attachment #1.2: Type: text/html, Size: 1136 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: System time monotonicity
  2008-04-12  7:09       ` Keir Fraser
@ 2008-04-21 19:26         ` Dan Magenheimer
  2008-04-21 19:31           ` Keir Fraser
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Magenheimer @ 2008-04-21 19:26 UTC (permalink / raw)
  To: Keir Fraser, Dave Winchell
  Cc: Ian Pratt, xen-devel@lists.xensource.com, Tian, Kevin

> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Dan asked that I measure the cost of accessing the hpet. My 
> first set of
> measurements indicated that
> for 99.8% of the samples taken the cost is less than 1/2 usec, or 500
> cycles on my machine.
> This is about the cost of a single vmexit, for reference. The cost
> includes the cost
> of taking a spinlock. This cost also includes the overhead of 
> one or two
> rdtsc instructions for the measurement.
> I'm still working with Dan on these measurements as he sees (much)
> higher costs measured from Linux.

FYI, I did a more precise measurement of reading hpet (and pit)
on my machine by modifying the kernel and recording rdtscll
differences around the platform timer reads (as well as the
entire function calls which compares better against my prior
measurements using systemtap).  For HPET reads, about 86% of
the (100000) samples were between 4K and 8K cycles and about
13% were between 16K and 32K cycles.  For PIT reads, all reads
are between 64K and 128K cycles.  (I also measured the rdtscll
calls by putting two calls back-to-back... 90% of them were
between 128-255 cycles and 10% between 64-127 cycles.)

Since my system is 3.0 Ghz, HPET read averages on the order of
3 usec, which is what my previous measure showed.

I suspect that Dave's measurements and mine are "both right"
and that read overhead of HPET varies on different systems.

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> It used to be deliberately not exposed because Windows had problems
> using it in some cases, iirc. If it's no longer getting hidden in
> newer systems then that's a good thing.

On my box (and several others at Oracle), HPET is present but
disabled by default in the BIOS.

Dan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: System time monotonicity
  2008-04-21 19:26         ` Dan Magenheimer
@ 2008-04-21 19:31           ` Keir Fraser
  2008-04-21 20:32             ` Fix for get_s_time() Dave Winchell
  0 siblings, 1 reply; 24+ messages in thread
From: Keir Fraser @ 2008-04-21 19:31 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Dave Winchell
  Cc: Ian Pratt, xen-devel@lists.xensource.com, Tian, Kevin

On 21/4/08 20:26, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Since my system is 3.0 Ghz, HPET read averages on the order of
> 3 usec, which is what my previous measure showed.
> 
> I suspect that Dave's measurements and mine are "both right"
> and that read overhead of HPET varies on different systems.

That would make sense. HPET accesses have to go all the way to the
southbridge, and that's bound to vary a lot between chipsets let alone
across the very different interconnect topologies of AMD vs Intel.

 -- Keir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Fix for get_s_time()
  2008-04-21 19:31           ` Keir Fraser
@ 2008-04-21 20:32             ` Dave Winchell
  2008-04-21 22:55               ` Keir Fraser
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-21 20:32 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Tian, Kevin, dan.magenheimer@oracle.com,
	xen-devel@lists.xensource.com, Dave Winchell, Ian Pratt

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]

Keir,

In my work on layering hpet on get_s_time, I found a problem
in get_s_time and related code. Because of the problem I was getting
large jumps in the offset between local time and ntp time.
These jumps were on the order of many seconds.

The issue is the race between local_time_calibration() executing
on one processor and platform_time_calibration() on another.

I have included a patch which addresses the race in
local_time_calibration(), cpu_frequency_change(), and
init_percpu_time().

I'm giving you this ahead of the hpet work as it affects all users
of get_s_time().

I'm confident of the fix in local_time_calibration() as I had failures there
before the fix and no failures after. The other two I'm less confident 
in, so check
my work closely there.

On the hpet over get_s_time() front, this fix allows me to get .0014% error.
This is very close to the error going to the hardware hpet each time.

Regards,
Dave



[-- Attachment #2: p.time.c.4.21 --]
[-- Type: text/plain, Size: 2830 bytes --]

diff -r a38a41de0800 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Wed Apr 16 16:42:47 2008 +0100
+++ b/xen/arch/x86/time.c	Mon Apr 21 15:19:37 2008 -0400
@@ -530,6 +530,16 @@ static s_time_t read_platform_stime(void
 
     return stime;
 }
+static s_time_t read_platform_stime_locked(void)
+{
+    u64 count;
+    s_time_t stime;
+
+    count = plt_count64 + ((plt_src.read_counter() - plt_count) & plt_mask);
+    stime = __read_platform_stime(count);
+
+    return stime;
+}
 
 static void platform_time_calibration(void)
 {
@@ -749,6 +759,7 @@ int cpu_frequency_change(u64 freq)
 {
     struct cpu_time *t = &this_cpu(cpu_time);
     u64 curr_tsc;
+    unsigned long flags;
 
     /* Sanity check: CPU frequency allegedly dropping below 1MHz? */
     if ( freq < 1000000u )
@@ -758,15 +769,15 @@ int cpu_frequency_change(u64 freq)
         return -EINVAL;
     }
 
-    local_irq_disable();
+    spin_lock_irqsave(&platform_timer_lock, flags);
     rdtscll(curr_tsc);
     t->local_tsc_stamp = curr_tsc;
-    t->stime_master_stamp = read_platform_stime();
+    t->stime_master_stamp = read_platform_stime_locked();
     /* TSC-extrapolated time may be bogus after frequency change. */
     /*t->stime_local_stamp = get_s_time();*/
     t->stime_local_stamp = t->stime_master_stamp;
     set_time_scale(&t->tsc_scale, freq);
-    local_irq_enable();
+    spin_unlock_irqrestore(&platform_timer_lock, flags);
 
     /* A full epoch should pass before we check for deviation. */
     set_timer(&t->calibration_timer, NOW() + EPOCH);
@@ -830,16 +841,18 @@ static void local_time_calibration(void 
     /* The overall calibration scale multiplier. */
     u32 calibration_mul_frac;
 
+    unsigned long flags;
+
     prev_tsc          = t->local_tsc_stamp;
     prev_local_stime  = t->stime_local_stamp;
     prev_master_stime = t->stime_master_stamp;
 
     /* Disable IRQs to get 'instantaneous' current timestamps. */
-    local_irq_disable();
+    spin_lock_irqsave(&platform_timer_lock, flags);
     rdtscll(curr_tsc);
     curr_local_stime  = get_s_time();
-    curr_master_stime = read_platform_stime();
-    local_irq_enable();
+    curr_master_stime = read_platform_stime_locked();
+    spin_unlock_irqrestore(&platform_timer_lock, flags);
 
 #if 0
     printk("PRE%d: tsc=%"PRIu64" stime=%"PRIu64" master=%"PRIu64"\n",
@@ -944,10 +957,10 @@ void init_percpu_time(void)
     unsigned long flags;
     s_time_t now;
 
-    local_irq_save(flags);
+    spin_lock_irqsave(&platform_timer_lock, flags);
     rdtscll(t->local_tsc_stamp);
-    now = !plt_src.read_counter ? 0 : read_platform_stime();
-    local_irq_restore(flags);
+    now = !plt_src.read_counter ? 0 : read_platform_stime_locked();
+    spin_unlock_irqrestore(&platform_timer_lock, flags);
 
     t->stime_master_stamp = now;
     t->stime_local_stamp  = now;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fix for get_s_time()
  2008-04-21 20:32             ` Fix for get_s_time() Dave Winchell
@ 2008-04-21 22:55               ` Keir Fraser
  2008-04-21 23:10                 ` Keir Fraser
  0 siblings, 1 reply; 24+ messages in thread
From: Keir Fraser @ 2008-04-21 22:55 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, dan.magenheimer@oracle.com,
	xen-devel@lists.xensource.com, Ian Pratt

What's the race? All accesses to platform time are already done under the
platform_timer_lock as far as I can see.

 -- Keir

On 21/4/08 21:32, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> Keir,
> 
> In my work on layering hpet on get_s_time, I found a problem
> in get_s_time and related code. Because of the problem I was getting
> large jumps in the offset between local time and ntp time.
> These jumps were on the order of many seconds.
> 
> The issue is the race between local_time_calibration() executing
> on one processor and platform_time_calibration() on another.
> 
> I have included a patch which addresses the race in
> local_time_calibration(), cpu_frequency_change(), and
> init_percpu_time().
> 
> I'm giving you this ahead of the hpet work as it affects all users
> of get_s_time().
> 
> I'm confident of the fix in local_time_calibration() as I had failures there
> before the fix and no failures after. The other two I'm less confident
> in, so check
> my work closely there.
> 
> On the hpet over get_s_time() front, this fix allows me to get .0014% error.
> This is very close to the error going to the hardware hpet each time.
> 
> Regards,
> Dave
> 
> 
> diff -r a38a41de0800 xen/arch/x86/time.c
> --- a/xen/arch/x86/time.c Wed Apr 16 16:42:47 2008 +0100
> +++ b/xen/arch/x86/time.c Mon Apr 21 15:19:37 2008 -0400
> @@ -530,6 +530,16 @@ static s_time_t read_platform_stime(void
>  
>      return stime;
>  }
> +static s_time_t read_platform_stime_locked(void)
> +{
> +    u64 count;
> +    s_time_t stime;
> +
> +    count = plt_count64 + ((plt_src.read_counter() - plt_count) & plt_mask);
> +    stime = __read_platform_stime(count);
> +
> +    return stime;
> +}
>  
>  static void platform_time_calibration(void)
>  {
> @@ -749,6 +759,7 @@ int cpu_frequency_change(u64 freq)
>  {
>      struct cpu_time *t = &this_cpu(cpu_time);
>      u64 curr_tsc;
> +    unsigned long flags;
>  
>      /* Sanity check: CPU frequency allegedly dropping below 1MHz? */
>      if ( freq < 1000000u )
> @@ -758,15 +769,15 @@ int cpu_frequency_change(u64 freq)
>          return -EINVAL;
>      }
>  
> -    local_irq_disable();
> +    spin_lock_irqsave(&platform_timer_lock, flags);
>      rdtscll(curr_tsc);
>      t->local_tsc_stamp = curr_tsc;
> -    t->stime_master_stamp = read_platform_stime();
> +    t->stime_master_stamp = read_platform_stime_locked();
>      /* TSC-extrapolated time may be bogus after frequency change. */
>      /*t->stime_local_stamp = get_s_time();*/
>      t->stime_local_stamp = t->stime_master_stamp;
>      set_time_scale(&t->tsc_scale, freq);
> -    local_irq_enable();
> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>  
>      /* A full epoch should pass before we check for deviation. */
>      set_timer(&t->calibration_timer, NOW() + EPOCH);
> @@ -830,16 +841,18 @@ static void local_time_calibration(void
>      /* The overall calibration scale multiplier. */
>      u32 calibration_mul_frac;
>  
> +    unsigned long flags;
> +
>      prev_tsc          = t->local_tsc_stamp;
>      prev_local_stime  = t->stime_local_stamp;
>      prev_master_stime = t->stime_master_stamp;
>  
>      /* Disable IRQs to get 'instantaneous' current timestamps. */
> -    local_irq_disable();
> +    spin_lock_irqsave(&platform_timer_lock, flags);
>      rdtscll(curr_tsc);
>      curr_local_stime  = get_s_time();
> -    curr_master_stime = read_platform_stime();
> -    local_irq_enable();
> +    curr_master_stime = read_platform_stime_locked();
> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>  
>  #if 0
>      printk("PRE%d: tsc=%"PRIu64" stime=%"PRIu64" master=%"PRIu64"\n",
> @@ -944,10 +957,10 @@ void init_percpu_time(void)
>      unsigned long flags;
>      s_time_t now;
>  
> -    local_irq_save(flags);
> +    spin_lock_irqsave(&platform_timer_lock, flags);
>      rdtscll(t->local_tsc_stamp);
> -    now = !plt_src.read_counter ? 0 : read_platform_stime();
> -    local_irq_restore(flags);
> +    now = !plt_src.read_counter ? 0 : read_platform_stime_locked();
> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>  
>      t->stime_master_stamp = now;
>      t->stime_local_stamp  = now;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-21 22:55               ` Keir Fraser
@ 2008-04-21 23:10                 ` Keir Fraser
  2008-04-22 14:09                   ` Dave Winchell
  2008-04-24 16:04                   ` Dave Winchell
  0 siblings, 2 replies; 24+ messages in thread
From: Keir Fraser @ 2008-04-21 23:10 UTC (permalink / raw)
  To: Keir Fraser, Dave Winchell
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Ian Pratt,
	dan.magenheimer@oracle.com

Is the issue that there is a variable delay in waiting to acquire the
platform_timer_lock? If so, you can fix that by, for example, making
read_platform_stime() the *first* thing you do inside the
local_irq_disable/enable() block in local-time_calibration(). i.e., *before*
doing the rdtsc() and get_s_time(). This avoids a variable delay between the
TSC-based time estimates and the platform-timer-based time estimate.

If that fixes your issue I'll apply that in preference.

 -- Keir

On 21/4/08 23:55, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> What's the race? All accesses to platform time are already done under the
> platform_timer_lock as far as I can see.
> 
>  -- Keir
> 
> On 21/4/08 21:32, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
> 
>> Keir,
>> 
>> In my work on layering hpet on get_s_time, I found a problem
>> in get_s_time and related code. Because of the problem I was getting
>> large jumps in the offset between local time and ntp time.
>> These jumps were on the order of many seconds.
>> 
>> The issue is the race between local_time_calibration() executing
>> on one processor and platform_time_calibration() on another.
>> 
>> I have included a patch which addresses the race in
>> local_time_calibration(), cpu_frequency_change(), and
>> init_percpu_time().
>> 
>> I'm giving you this ahead of the hpet work as it affects all users
>> of get_s_time().
>> 
>> I'm confident of the fix in local_time_calibration() as I had failures there
>> before the fix and no failures after. The other two I'm less confident
>> in, so check
>> my work closely there.
>> 
>> On the hpet over get_s_time() front, this fix allows me to get .0014% error.
>> This is very close to the error going to the hardware hpet each time.
>> 
>> Regards,
>> Dave
>> 
>> 
>> diff -r a38a41de0800 xen/arch/x86/time.c
>> --- a/xen/arch/x86/time.c Wed Apr 16 16:42:47 2008 +0100
>> +++ b/xen/arch/x86/time.c Mon Apr 21 15:19:37 2008 -0400
>> @@ -530,6 +530,16 @@ static s_time_t read_platform_stime(void
>>  
>>      return stime;
>>  }
>> +static s_time_t read_platform_stime_locked(void)
>> +{
>> +    u64 count;
>> +    s_time_t stime;
>> +
>> +    count = plt_count64 + ((plt_src.read_counter() - plt_count) & plt_mask);
>> +    stime = __read_platform_stime(count);
>> +
>> +    return stime;
>> +}
>>  
>>  static void platform_time_calibration(void)
>>  {
>> @@ -749,6 +759,7 @@ int cpu_frequency_change(u64 freq)
>>  {
>>      struct cpu_time *t = &this_cpu(cpu_time);
>>      u64 curr_tsc;
>> +    unsigned long flags;
>>  
>>      /* Sanity check: CPU frequency allegedly dropping below 1MHz? */
>>      if ( freq < 1000000u )
>> @@ -758,15 +769,15 @@ int cpu_frequency_change(u64 freq)
>>          return -EINVAL;
>>      }
>>  
>> -    local_irq_disable();
>> +    spin_lock_irqsave(&platform_timer_lock, flags);
>>      rdtscll(curr_tsc);
>>      t->local_tsc_stamp = curr_tsc;
>> -    t->stime_master_stamp = read_platform_stime();
>> +    t->stime_master_stamp = read_platform_stime_locked();
>>      /* TSC-extrapolated time may be bogus after frequency change. */
>>      /*t->stime_local_stamp = get_s_time();*/
>>      t->stime_local_stamp = t->stime_master_stamp;
>>      set_time_scale(&t->tsc_scale, freq);
>> -    local_irq_enable();
>> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>  
>>      /* A full epoch should pass before we check for deviation. */
>>      set_timer(&t->calibration_timer, NOW() + EPOCH);
>> @@ -830,16 +841,18 @@ static void local_time_calibration(void
>>      /* The overall calibration scale multiplier. */
>>      u32 calibration_mul_frac;
>>  
>> +    unsigned long flags;
>> +
>>      prev_tsc          = t->local_tsc_stamp;
>>      prev_local_stime  = t->stime_local_stamp;
>>      prev_master_stime = t->stime_master_stamp;
>>  
>>      /* Disable IRQs to get 'instantaneous' current timestamps. */
>> -    local_irq_disable();
>> +    spin_lock_irqsave(&platform_timer_lock, flags);
>>      rdtscll(curr_tsc);
>>      curr_local_stime  = get_s_time();
>> -    curr_master_stime = read_platform_stime();
>> -    local_irq_enable();
>> +    curr_master_stime = read_platform_stime_locked();
>> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>  
>>  #if 0
>>      printk("PRE%d: tsc=%"PRIu64" stime=%"PRIu64" master=%"PRIu64"\n",
>> @@ -944,10 +957,10 @@ void init_percpu_time(void)
>>      unsigned long flags;
>>      s_time_t now;
>>  
>> -    local_irq_save(flags);
>> +    spin_lock_irqsave(&platform_timer_lock, flags);
>>      rdtscll(t->local_tsc_stamp);
>> -    now = !plt_src.read_counter ? 0 : read_platform_stime();
>> -    local_irq_restore(flags);
>> +    now = !plt_src.read_counter ? 0 : read_platform_stime_locked();
>> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>  
>>      t->stime_master_stamp = now;
>>      t->stime_local_stamp  = now;
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-21 23:10                 ` Keir Fraser
@ 2008-04-22 14:09                   ` Dave Winchell
  2008-04-24 16:04                   ` Dave Winchell
  1 sibling, 0 replies; 24+ messages in thread
From: Dave Winchell @ 2008-04-22 14:09 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Dave Winchell, Tian, Kevin, xen-devel@lists.xensource.com,
	Ian Pratt, dan.magenheimer@oracle.com

Keir,

Well, I'm glad you didn't take the patch as my overnight test was
not successful. I'm afraid that I was overly confident.
I'll re-submit when I get to the bottom of this.

thanks,
Dave

Keir Fraser wrote:

>Is the issue that there is a variable delay in waiting to acquire the
>platform_timer_lock? If so, you can fix that by, for example, making
>read_platform_stime() the *first* thing you do inside the
>local_irq_disable/enable() block in local-time_calibration(). i.e., *before*
>doing the rdtsc() and get_s_time(). This avoids a variable delay between the
>TSC-based time estimates and the platform-timer-based time estimate.
>
>If that fixes your issue I'll apply that in preference.
>
> -- Keir
>
>On 21/4/08 23:55, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>  
>
>>What's the race? All accesses to platform time are already done under the
>>platform_timer_lock as far as I can see.
>>
>> -- Keir
>>
>>On 21/4/08 21:32, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>>
>>    
>>
>>>Keir,
>>>
>>>In my work on layering hpet on get_s_time, I found a problem
>>>in get_s_time and related code. Because of the problem I was getting
>>>large jumps in the offset between local time and ntp time.
>>>These jumps were on the order of many seconds.
>>>
>>>The issue is the race between local_time_calibration() executing
>>>on one processor and platform_time_calibration() on another.
>>>
>>>I have included a patch which addresses the race in
>>>local_time_calibration(), cpu_frequency_change(), and
>>>init_percpu_time().
>>>
>>>I'm giving you this ahead of the hpet work as it affects all users
>>>of get_s_time().
>>>
>>>I'm confident of the fix in local_time_calibration() as I had failures there
>>>before the fix and no failures after. The other two I'm less confident
>>>in, so check
>>>my work closely there.
>>>
>>>On the hpet over get_s_time() front, this fix allows me to get .0014% error.
>>>This is very close to the error going to the hardware hpet each time.
>>>
>>>Regards,
>>>Dave
>>>
>>>
>>>diff -r a38a41de0800 xen/arch/x86/time.c
>>>--- a/xen/arch/x86/time.c Wed Apr 16 16:42:47 2008 +0100
>>>+++ b/xen/arch/x86/time.c Mon Apr 21 15:19:37 2008 -0400
>>>@@ -530,6 +530,16 @@ static s_time_t read_platform_stime(void
>>> 
>>>     return stime;
>>> }
>>>+static s_time_t read_platform_stime_locked(void)
>>>+{
>>>+    u64 count;
>>>+    s_time_t stime;
>>>+
>>>+    count = plt_count64 + ((plt_src.read_counter() - plt_count) & plt_mask);
>>>+    stime = __read_platform_stime(count);
>>>+
>>>+    return stime;
>>>+}
>>> 
>>> static void platform_time_calibration(void)
>>> {
>>>@@ -749,6 +759,7 @@ int cpu_frequency_change(u64 freq)
>>> {
>>>     struct cpu_time *t = &this_cpu(cpu_time);
>>>     u64 curr_tsc;
>>>+    unsigned long flags;
>>> 
>>>     /* Sanity check: CPU frequency allegedly dropping below 1MHz? */
>>>     if ( freq < 1000000u )
>>>@@ -758,15 +769,15 @@ int cpu_frequency_change(u64 freq)
>>>         return -EINVAL;
>>>     }
>>> 
>>>-    local_irq_disable();
>>>+    spin_lock_irqsave(&platform_timer_lock, flags);
>>>     rdtscll(curr_tsc);
>>>     t->local_tsc_stamp = curr_tsc;
>>>-    t->stime_master_stamp = read_platform_stime();
>>>+    t->stime_master_stamp = read_platform_stime_locked();
>>>     /* TSC-extrapolated time may be bogus after frequency change. */
>>>     /*t->stime_local_stamp = get_s_time();*/
>>>     t->stime_local_stamp = t->stime_master_stamp;
>>>     set_time_scale(&t->tsc_scale, freq);
>>>-    local_irq_enable();
>>>+    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>> 
>>>     /* A full epoch should pass before we check for deviation. */
>>>     set_timer(&t->calibration_timer, NOW() + EPOCH);
>>>@@ -830,16 +841,18 @@ static void local_time_calibration(void
>>>     /* The overall calibration scale multiplier. */
>>>     u32 calibration_mul_frac;
>>> 
>>>+    unsigned long flags;
>>>+
>>>     prev_tsc          = t->local_tsc_stamp;
>>>     prev_local_stime  = t->stime_local_stamp;
>>>     prev_master_stime = t->stime_master_stamp;
>>> 
>>>     /* Disable IRQs to get 'instantaneous' current timestamps. */
>>>-    local_irq_disable();
>>>+    spin_lock_irqsave(&platform_timer_lock, flags);
>>>     rdtscll(curr_tsc);
>>>     curr_local_stime  = get_s_time();
>>>-    curr_master_stime = read_platform_stime();
>>>-    local_irq_enable();
>>>+    curr_master_stime = read_platform_stime_locked();
>>>+    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>> 
>>> #if 0
>>>     printk("PRE%d: tsc=%"PRIu64" stime=%"PRIu64" master=%"PRIu64"\n",
>>>@@ -944,10 +957,10 @@ void init_percpu_time(void)
>>>     unsigned long flags;
>>>     s_time_t now;
>>> 
>>>-    local_irq_save(flags);
>>>+    spin_lock_irqsave(&platform_timer_lock, flags);
>>>     rdtscll(t->local_tsc_stamp);
>>>-    now = !plt_src.read_counter ? 0 : read_platform_stime();
>>>-    local_irq_restore(flags);
>>>+    now = !plt_src.read_counter ? 0 : read_platform_stime_locked();
>>>+    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>> 
>>>     t->stime_master_stamp = now;
>>>     t->stime_local_stamp  = now;
>>>      
>>>
>>
>>_______________________________________________
>>Xen-devel mailing list
>>Xen-devel@lists.xensource.com
>>http://lists.xensource.com/xen-devel
>>    
>>
>
>
>  
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-21 23:10                 ` Keir Fraser
  2008-04-22 14:09                   ` Dave Winchell
@ 2008-04-24 16:04                   ` Dave Winchell
  2008-04-24 16:37                     ` Keir Fraser
  1 sibling, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-24 16:04 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Dave Winchell, Tian, Kevin, xen-devel@lists.xensource.com,
	Ian Pratt, dan.magenheimer@oracle.com

Keir Fraser wrote:

>Is the issue that there is a variable delay in waiting to acquire the
>platform_timer_lock? If so, you can fix that by, for example, making
>read_platform_stime() the *first* thing you do inside the
>local_irq_disable/enable() block in local-time_calibration(). i.e., *before*
>doing the rdtsc() and get_s_time(). This avoids a variable delay between the
>TSC-based time estimates and the platform-timer-based time estimate.
>  
>
yes, this is the issue. What you suggest should be fine and I am trying 
it now.
With the locking version (and a fix to a bug I introduced) I got .0012% 
error
on an overnight run with hpet layered on get_s_time_mono(), which is the
max(prev, cur) layer on get_s_time we discussed.

-Dave

>If that fixes your issue I'll apply that in preference.
>
> -- Keir
>
>On 21/4/08 23:55, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>  
>
>>What's the race? All accesses to platform time are already done under the
>>platform_timer_lock as far as I can see.
>>
>> -- Keir
>>
>>On 21/4/08 21:32, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>>
>>    
>>
>>>Keir,
>>>
>>>In my work on layering hpet on get_s_time, I found a problem
>>>in get_s_time and related code. Because of the problem I was getting
>>>large jumps in the offset between local time and ntp time.
>>>These jumps were on the order of many seconds.
>>>
>>>The issue is the race between local_time_calibration() executing
>>>on one processor and platform_time_calibration() on another.
>>>
>>>I have included a patch which addresses the race in
>>>local_time_calibration(), cpu_frequency_change(), and
>>>init_percpu_time().
>>>
>>>I'm giving you this ahead of the hpet work as it affects all users
>>>of get_s_time().
>>>
>>>I'm confident of the fix in local_time_calibration() as I had failures there
>>>before the fix and no failures after. The other two I'm less confident
>>>in, so check
>>>my work closely there.
>>>
>>>On the hpet over get_s_time() front, this fix allows me to get .0014% error.
>>>This is very close to the error going to the hardware hpet each time.
>>>
>>>Regards,
>>>Dave
>>>
>>>
>>>diff -r a38a41de0800 xen/arch/x86/time.c
>>>--- a/xen/arch/x86/time.c Wed Apr 16 16:42:47 2008 +0100
>>>+++ b/xen/arch/x86/time.c Mon Apr 21 15:19:37 2008 -0400
>>>@@ -530,6 +530,16 @@ static s_time_t read_platform_stime(void
>>> 
>>>     return stime;
>>> }
>>>+static s_time_t read_platform_stime_locked(void)
>>>+{
>>>+    u64 count;
>>>+    s_time_t stime;
>>>+
>>>+    count = plt_count64 + ((plt_src.read_counter() - plt_count) & plt_mask);
>>>+    stime = __read_platform_stime(count);
>>>+
>>>+    return stime;
>>>+}
>>> 
>>> static void platform_time_calibration(void)
>>> {
>>>@@ -749,6 +759,7 @@ int cpu_frequency_change(u64 freq)
>>> {
>>>     struct cpu_time *t = &this_cpu(cpu_time);
>>>     u64 curr_tsc;
>>>+    unsigned long flags;
>>> 
>>>     /* Sanity check: CPU frequency allegedly dropping below 1MHz? */
>>>     if ( freq < 1000000u )
>>>@@ -758,15 +769,15 @@ int cpu_frequency_change(u64 freq)
>>>         return -EINVAL;
>>>     }
>>> 
>>>-    local_irq_disable();
>>>+    spin_lock_irqsave(&platform_timer_lock, flags);
>>>     rdtscll(curr_tsc);
>>>     t->local_tsc_stamp = curr_tsc;
>>>-    t->stime_master_stamp = read_platform_stime();
>>>+    t->stime_master_stamp = read_platform_stime_locked();
>>>     /* TSC-extrapolated time may be bogus after frequency change. */
>>>     /*t->stime_local_stamp = get_s_time();*/
>>>     t->stime_local_stamp = t->stime_master_stamp;
>>>     set_time_scale(&t->tsc_scale, freq);
>>>-    local_irq_enable();
>>>+    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>> 
>>>     /* A full epoch should pass before we check for deviation. */
>>>     set_timer(&t->calibration_timer, NOW() + EPOCH);
>>>@@ -830,16 +841,18 @@ static void local_time_calibration(void
>>>     /* The overall calibration scale multiplier. */
>>>     u32 calibration_mul_frac;
>>> 
>>>+    unsigned long flags;
>>>+
>>>     prev_tsc          = t->local_tsc_stamp;
>>>     prev_local_stime  = t->stime_local_stamp;
>>>     prev_master_stime = t->stime_master_stamp;
>>> 
>>>     /* Disable IRQs to get 'instantaneous' current timestamps. */
>>>-    local_irq_disable();
>>>+    spin_lock_irqsave(&platform_timer_lock, flags);
>>>     rdtscll(curr_tsc);
>>>     curr_local_stime  = get_s_time();
>>>-    curr_master_stime = read_platform_stime();
>>>-    local_irq_enable();
>>>+    curr_master_stime = read_platform_stime_locked();
>>>+    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>> 
>>> #if 0
>>>     printk("PRE%d: tsc=%"PRIu64" stime=%"PRIu64" master=%"PRIu64"\n",
>>>@@ -944,10 +957,10 @@ void init_percpu_time(void)
>>>     unsigned long flags;
>>>     s_time_t now;
>>> 
>>>-    local_irq_save(flags);
>>>+    spin_lock_irqsave(&platform_timer_lock, flags);
>>>     rdtscll(t->local_tsc_stamp);
>>>-    now = !plt_src.read_counter ? 0 : read_platform_stime();
>>>-    local_irq_restore(flags);
>>>+    now = !plt_src.read_counter ? 0 : read_platform_stime_locked();
>>>+    spin_unlock_irqrestore(&platform_timer_lock, flags);
>>> 
>>>     t->stime_master_stamp = now;
>>>     t->stime_local_stamp  = now;
>>>      
>>>
>>
>>_______________________________________________
>>Xen-devel mailing list
>>Xen-devel@lists.xensource.com
>>http://lists.xensource.com/xen-devel
>>    
>>
>
>
>  
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-24 16:04                   ` Dave Winchell
@ 2008-04-24 16:37                     ` Keir Fraser
  2008-04-24 18:32                       ` Dave Winchell
  0 siblings, 1 reply; 24+ messages in thread
From: Keir Fraser @ 2008-04-24 16:37 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Ian Pratt,
	dan.magenheimer@oracle.com

On 24/4/08 17:04, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> yes, this is the issue. What you suggest should be fine and I am trying
> it now.
> With the locking version (and a fix to a bug I introduced) I got .0012%
> error
> on an overnight run with hpet layered on get_s_time_mono(), which is the
> max(prev, cur) layer on get_s_time we discussed.

12 parts per million is pretty good. Is that cumulative deviation from 'wall
time' over ~12 hours? That could easily be explained by the fact that Xen
system time is not sync'ed with ntp.

 -- Keir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-24 16:37                     ` Keir Fraser
@ 2008-04-24 18:32                       ` Dave Winchell
  2008-04-25 19:48                         ` Dave Winchell
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-24 18:32 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Dave Winchell, Tian, Kevin, xen-devel@lists.xensource.com,
	Ian Pratt, dan.magenheimer@oracle.com

Keir Fraser wrote:

>On 24/4/08 17:04, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
>  
>
>>yes, this is the issue. What you suggest should be fine and I am trying
>>it now.
>>With the locking version (and a fix to a bug I introduced) I got .0012%
>>error
>>on an overnight run with hpet layered on get_s_time_mono(), which is the
>>max(prev, cur) layer on get_s_time we discussed.
>>    
>>
>
>12 parts per million is pretty good. Is that cumulative deviation from 'wall
>time' over ~12 hours?
>
yes, deviation between the guest's time and an ntp reference.

> That could easily be explained by the fact that Xen
>system time is not sync'ed with ntp.
>  
>
That's true. And, as we have discussed, this error seems to vary quite a bit
platform to platform for some reason. I will verify that this still is 
the case.

-Dave

> -- Keir
>
>
>  
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-24 18:32                       ` Dave Winchell
@ 2008-04-25 19:48                         ` Dave Winchell
  2008-04-25 21:03                           ` Dan Magenheimer
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-25 19:48 UTC (permalink / raw)
  Cc: Tian, Kevin, Dave Winchell, dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com, Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

Keir,

Last nights run had the error in the 12 ppm range.
Here is the change we have been talking about.

-Dave

Dave Winchell wrote:

> Keir Fraser wrote:
>
>> On 24/4/08 17:04, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>>
>>  
>>
>>> yes, this is the issue. What you suggest should be fine and I am trying
>>> it now.
>>> With the locking version (and a fix to a bug I introduced) I got .0012%
>>> error
>>> on an overnight run with hpet layered on get_s_time_mono(), which is 
>>> the
>>> max(prev, cur) layer on get_s_time we discussed.
>>>   
>>
>>
>> 12 parts per million is pretty good. Is that cumulative deviation 
>> from 'wall
>> time' over ~12 hours?
>>
> yes, deviation between the guest's time and an ntp reference.
>
>> That could easily be explained by the fact that Xen
>> system time is not sync'ed with ntp.
>>  
>>
> That's true. And, as we have discussed, this error seems to vary quite 
> a bit
> platform to platform for some reason. I will verify that this still is 
> the case.
>
> -Dave
>
>> -- Keir
>>
>>
>>  
>>
>


[-- Attachment #2: p.calib.order --]
[-- Type: text/plain, Size: 543 bytes --]

diff -r 483d006cc607 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Fri Apr 25 13:46:27 2008 +0100
+++ b/xen/arch/x86/time.c	Fri Apr 25 15:40:31 2008 -0400
@@ -836,9 +836,9 @@ static void local_time_calibration(void 
 
     /* Disable IRQs to get 'instantaneous' current timestamps. */
     local_irq_disable();
+    curr_master_stime = read_platform_stime();
+    curr_local_stime  = get_s_time();
     rdtscll(curr_tsc);
-    curr_local_stime  = get_s_time();
-    curr_master_stime = read_platform_stime();
     local_irq_enable();
 
 #if 0

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Re: Fix for get_s_time()
  2008-04-25 19:48                         ` Dave Winchell
@ 2008-04-25 21:03                           ` Dan Magenheimer
  2008-04-26  1:54                             ` Dave Winchell
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Magenheimer @ 2008-04-25 21:03 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Ian Pratt,
	Keir Fraser

Hi Dave --

Are you ready to release the guest-virtual-platform-timer
on xen-system-time patch yet?  If so, we'd be happy to
give it some testing.

Thanks,
Dan

> -----Original Message-----
> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Sent: Friday, April 25, 2008 1:48 PM
> To: Dave Winchell
> Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
> xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
> Subject: Re: [Xen-devel] Re: Fix for get_s_time()
> 
> 
> Keir,
> 
> Last nights run had the error in the 12 ppm range.
> Here is the change we have been talking about.
> 
> -Dave
> 
> Dave Winchell wrote:
> 
> > Keir Fraser wrote:
> >
> >> On 24/4/08 17:04, "Dave Winchell" 
> <dwinchell@virtualiron.com> wrote:
> >>
> >>
> >>
> >>> yes, this is the issue. What you suggest should be fine 
> and I am trying
> >>> it now.
> >>> With the locking version (and a fix to a bug I 
> introduced) I got .0012%
> >>> error
> >>> on an overnight run with hpet layered on 
> get_s_time_mono(), which is
> >>> the
> >>> max(prev, cur) layer on get_s_time we discussed.
> >>>
> >>
> >>
> >> 12 parts per million is pretty good. Is that cumulative deviation
> >> from 'wall
> >> time' over ~12 hours?
> >>
> > yes, deviation between the guest's time and an ntp reference.
> >
> >> That could easily be explained by the fact that Xen
> >> system time is not sync'ed with ntp.
> >>
> >>
> > That's true. And, as we have discussed, this error seems to 
> vary quite
> > a bit
> > platform to platform for some reason. I will verify that 
> this still is
> > the case.
> >
> > -Dave
> >
> >> -- Keir
> >>
> >>
> >>
> >>
> >
> 
> 
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Re: Fix for get_s_time()
  2008-04-25 21:03                           ` Dan Magenheimer
@ 2008-04-26  1:54                             ` Dave Winchell
  2008-04-28 17:39                               ` Dan Magenheimer
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-26  1:54 UTC (permalink / raw)
  To: dan.magenheimer
  Cc: Dave Winchell, Tian, Kevin, xen-devel, Ian Pratt, Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 2210 bytes --]

Hi Dan,

I just need to remove some debug and merge with unstable.
I should be able to send you a patch Monday or Tuesday.
You know, its more like hpet on system time.
Thanks for the testing offer. 

Regards,
Dave


-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
Sent: Fri 4/25/2008 5:03 PM
To: Dave Winchell
Cc: Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian Pratt
Subject: RE: [Xen-devel] Re: Fix for get_s_time()
 
Hi Dave --

Are you ready to release the guest-virtual-platform-timer
on xen-system-time patch yet?  If so, we'd be happy to
give it some testing.

Thanks,
Dan

> -----Original Message-----
> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Sent: Friday, April 25, 2008 1:48 PM
> To: Dave Winchell
> Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
> xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
> Subject: Re: [Xen-devel] Re: Fix for get_s_time()
> 
> 
> Keir,
> 
> Last nights run had the error in the 12 ppm range.
> Here is the change we have been talking about.
> 
> -Dave
> 
> Dave Winchell wrote:
> 
> > Keir Fraser wrote:
> >
> >> On 24/4/08 17:04, "Dave Winchell" 
> <dwinchell@virtualiron.com> wrote:
> >>
> >>
> >>
> >>> yes, this is the issue. What you suggest should be fine 
> and I am trying
> >>> it now.
> >>> With the locking version (and a fix to a bug I 
> introduced) I got .0012%
> >>> error
> >>> on an overnight run with hpet layered on 
> get_s_time_mono(), which is
> >>> the
> >>> max(prev, cur) layer on get_s_time we discussed.
> >>>
> >>
> >>
> >> 12 parts per million is pretty good. Is that cumulative deviation
> >> from 'wall
> >> time' over ~12 hours?
> >>
> > yes, deviation between the guest's time and an ntp reference.
> >
> >> That could easily be explained by the fact that Xen
> >> system time is not sync'ed with ntp.
> >>
> >>
> > That's true. And, as we have discussed, this error seems to 
> vary quite
> > a bit
> > platform to platform for some reason. I will verify that 
> this still is
> > the case.
> >
> > -Dave
> >
> >> -- Keir
> >>
> >>
> >>
> >>
> >
> 
> 
> 



[-- Attachment #1.2: Type: text/html, Size: 3492 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Re: Fix for get_s_time()
  2008-04-26  1:54                             ` Dave Winchell
@ 2008-04-28 17:39                               ` Dan Magenheimer
  2008-04-28 18:09                                 ` Dave Winchell
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Magenheimer @ 2008-04-28 17:39 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Ian Pratt,
	Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 2958 bytes --]

RE: [Xen-devel] Re: Fix for get_s_time()Hi Dave --

> You know, its more like hpet on system time.

I wonder how much of the problems we observed with skew on pit was due to
the pit-on-tsc "bug"... in other words, should the virtual pit be based on
system time also?

Dan


  -----Original Message-----
  From: Dave Winchell [mailto:dwinchell@virtualiron.com]
  Sent: Friday, April 25, 2008 7:54 PM
  To: dan.magenheimer@oracle.com
  Cc: Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
  Subject: RE: [Xen-devel] Re: Fix for get_s_time()


  Hi Dan,

  I just need to remove some debug and merge with unstable.
  I should be able to send you a patch Monday or Tuesday.
  You know, its more like hpet on system time.
  Thanks for the testing offer.

  Regards,
  Dave


  -----Original Message-----
  From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
  Sent: Fri 4/25/2008 5:03 PM
  To: Dave Winchell
  Cc: Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian Pratt
  Subject: RE: [Xen-devel] Re: Fix for get_s_time()

  Hi Dave --

  Are you ready to release the guest-virtual-platform-timer
  on xen-system-time patch yet?  If so, we'd be happy to
  give it some testing.

  Thanks,
  Dan

  > -----Original Message-----
  > From: Dave Winchell [mailto:dwinchell@virtualiron.com]
  > Sent: Friday, April 25, 2008 1:48 PM
  > To: Dave Winchell
  > Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
  > xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
  > Subject: Re: [Xen-devel] Re: Fix for get_s_time()
  >
  >
  > Keir,
  >
  > Last nights run had the error in the 12 ppm range.
  > Here is the change we have been talking about.
  >
  > -Dave
  >
  > Dave Winchell wrote:
  >
  > > Keir Fraser wrote:
  > >
  > >> On 24/4/08 17:04, "Dave Winchell"
  > <dwinchell@virtualiron.com> wrote:
  > >>
  > >>
  > >>
  > >>> yes, this is the issue. What you suggest should be fine
  > and I am trying
  > >>> it now.
  > >>> With the locking version (and a fix to a bug I
  > introduced) I got .0012%
  > >>> error
  > >>> on an overnight run with hpet layered on
  > get_s_time_mono(), which is
  > >>> the
  > >>> max(prev, cur) layer on get_s_time we discussed.
  > >>>
  > >>
  > >>
  > >> 12 parts per million is pretty good. Is that cumulative deviation
  > >> from 'wall
  > >> time' over ~12 hours?
  > >>
  > > yes, deviation between the guest's time and an ntp reference.
  > >
  > >> That could easily be explained by the fact that Xen
  > >> system time is not sync'ed with ntp.
  > >>
  > >>
  > > That's true. And, as we have discussed, this error seems to
  > vary quite
  > > a bit
  > > platform to platform for some reason. I will verify that
  > this still is
  > > the case.
  > >
  > > -Dave
  > >
  > >> -- Keir
  > >>
  > >>
  > >>
  > >>
  > >
  >
  >
  >




[-- Attachment #1.2: Type: text/html, Size: 5523 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-28 17:39                               ` Dan Magenheimer
@ 2008-04-28 18:09                                 ` Dave Winchell
  2008-04-28 18:40                                   ` Dave Winchell
  2008-04-28 20:11                                   ` Dan Magenheimer
  0 siblings, 2 replies; 24+ messages in thread
From: Dave Winchell @ 2008-04-28 18:09 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com
  Cc: Dave Winchell, Tian, Kevin, xen-devel@lists.xensource.com,
	Ian Pratt, Keir Fraser

Dan Magenheimer wrote:

> Hi Dave --
>  
>> You know, its more like hpet on system time.
>  
> I wonder how much of the problems we observed with skew on pit was due to
> the pit-on-tsc "bug"... in other words, should the virtual pit be based on
> system time also?

For guests that compute missed ticks, it may not help. That's because here
the guests are using tsc in their computations of offset and last 
interrupt time stamp.
Also, there is the esoteric use of delay in the computations for pit.
With hpet, on the other hand, the guests don't read the tsc and don't 
use delay -
they only rely on the hpet main counter.

It might improve accuracy for a guest that does not compute missed 
ticks. But you
would still have the time going backwards issue, unless you patched the 
guest.

Most of the hpet accuracy we see is due to clean and correct algorithms 
in the guest,
in my opinion. Of course we have to do the right things in emulating the 
hpet in xen.

-Dave

>  
> Dan
>
>     -----Original Message-----
>     *From:* Dave Winchell [mailto:dwinchell@virtualiron.com]
>     *Sent:* Friday, April 25, 2008 7:54 PM
>     *To:* dan.magenheimer@oracle.com
>     *Cc:* Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian
>     Pratt; Dave Winchell
>     *Subject:* RE: [Xen-devel] Re: Fix for get_s_time()
>
>     Hi Dan,
>
>     I just need to remove some debug and merge with unstable.
>     I should be able to send you a patch Monday or Tuesday.
>     You know, its more like hpet on system time.
>     Thanks for the testing offer.
>
>     Regards,
>     Dave
>
>
>     -----Original Message-----
>     From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>     Sent: Fri 4/25/2008 5:03 PM
>     To: Dave Winchell
>     Cc: Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian Pratt
>     Subject: RE: [Xen-devel] Re: Fix for get_s_time()
>
>     Hi Dave --
>
>     Are you ready to release the guest-virtual-platform-timer
>     on xen-system-time patch yet?  If so, we'd be happy to
>     give it some testing.
>
>     Thanks,
>     Dan
>
>     > -----Original Message-----
>     > From: Dave Winchell [mailto:dwinchell@virtualiron.com]
>     > Sent: Friday, April 25, 2008 1:48 PM
>     > To: Dave Winchell
>     > Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
>     > xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
>     > Subject: Re: [Xen-devel] Re: Fix for get_s_time()
>     >
>     >
>     > Keir,
>     >
>     > Last nights run had the error in the 12 ppm range.
>     > Here is the change we have been talking about.
>     >
>     > -Dave
>     >
>     > Dave Winchell wrote:
>     >
>     > > Keir Fraser wrote:
>     > >
>     > >> On 24/4/08 17:04, "Dave Winchell"
>     > <dwinchell@virtualiron.com> wrote:
>     > >>
>     > >>
>     > >>
>     > >>> yes, this is the issue. What you suggest should be fine
>     > and I am trying
>     > >>> it now.
>     > >>> With the locking version (and a fix to a bug I
>     > introduced) I got .0012%
>     > >>> error
>     > >>> on an overnight run with hpet layered on
>     > get_s_time_mono(), which is
>     > >>> the
>     > >>> max(prev, cur) layer on get_s_time we discussed.
>     > >>>
>     > >>
>     > >>
>     > >> 12 parts per million is pretty good. Is that cumulative deviation
>     > >> from 'wall
>     > >> time' over ~12 hours?
>     > >>
>     > > yes, deviation between the guest's time and an ntp reference.
>     > >
>     > >> That could easily be explained by the fact that Xen
>     > >> system time is not sync'ed with ntp.
>     > >>
>     > >>
>     > > That's true. And, as we have discussed, this error seems to
>     > vary quite
>     > > a bit
>     > > platform to platform for some reason. I will verify that
>     > this still is
>     > > the case.
>     > >
>     > > -Dave
>     > >
>     > >> -- Keir
>     > >>
>     > >>
>     > >>
>     > >>
>     > >
>     >
>     >
>     >
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-28 18:09                                 ` Dave Winchell
@ 2008-04-28 18:40                                   ` Dave Winchell
  2008-04-29  7:14                                     ` Keir Fraser
  2008-04-28 20:11                                   ` Dan Magenheimer
  1 sibling, 1 reply; 24+ messages in thread
From: Dave Winchell @ 2008-04-28 18:40 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Keir Fraser
  Cc: Tian, Kevin, Dave Winchell, Ian Pratt,
	xen-devel@lists.xensource.com

Dan, Keir:

Here is where I stand on the overhead of (hpet)read_64_main_counter()
for the version layered on get_s_time with the max function
compared to a version that goes to the hardware each time.
There are two histograms, each with 100 buckets, each bucket is 64 cycles.
There are 1991 cycles per usec on this box. Bucket 99 contains all
events where overhead >= (99*64) cycles.

Layered on stime the overhead is probably lower on average.
Both histograms are bi-modal, but the going-to-the-hardware one
seems to have a stronger second mode.  As we have discussed, the cost
of going to the hardware could vary quite a bit from platform to platform.

I optimized the code around read_64_main_counter() over stime quite a 
bit, but
I'm sure there is room for improvement.

-Dave

read_64_main_counter() On stime:

(VMM)  cycles per bucket 64
(VMM)
(VMM)  0: 0 78795 148271 21173 15902 47704 89195 121962
(VMM)  8: 83632 51848 17531 12987 10976 8816 9120 8608
(VMM)  16: 5685 3972 3783 2518 1052 710 608 469
(VMM)  24: 277 159 83 46 34 23 19 16
(VMM)  32: 9 6 7 3 4 8 5 6
(VMM)  40: 9 7 14 13 17 25 22 29
(VMM)  48: 25 19 35 27 30 26 21 23
(VMM)  56: 17 24 12 27 22 18 10 22
(VMM)  64: 19 16 16 16 28 18 23 16
(VMM)  72: 22 22 12 14 21 19 17 19
(VMM)  80: 18 14 10 14 11 12 8 18
(VMM)  88: 16 10 17 14 10 8 11 11
(VMM)  96: 10 10 0 175

read_64_main_counter() Going to the hardware:

(VMM)  cycles per bucket 64
(VMM)
(VMM)  0: 92529 148423 27850 12532 28042 43336 60516 59011
(VMM)  8: 36895 14043 8162 6857 7794 7401 5099 2986
(VMM)  16: 1636 1066 796 592 459 409 314 248
(VMM)  24: 206 195 138 97 71 45 35 34
(VMM)  32: 33 36 40 40 25 26 25 26
(VMM)  40: 37 23 18 30 27 30 34 44
(VMM)  48: 38 19 25 23 23 25 21 27
(VMM)  56: 28 24 43 80 220 324 568 599
(VMM)  64: 610 565 611 699 690 846 874 788
(VMM)  72: 703 542 556 613 605 603 559 500
(VMM)  80: 485 493 512 578 561 594 575 614
(VMM)  88: 759 851 895 856 807 770 719 958
(VMM)  96: 1127 1263 0 18219

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Re: Fix for get_s_time()
  2008-04-28 18:09                                 ` Dave Winchell
  2008-04-28 18:40                                   ` Dave Winchell
@ 2008-04-28 20:11                                   ` Dan Magenheimer
  2008-04-28 21:29                                     ` Dave Winchell
  1 sibling, 1 reply; 24+ messages in thread
From: Dan Magenheimer @ 2008-04-28 20:11 UTC (permalink / raw)
  To: Dave Winchell
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Ian Pratt,
	Keir Fraser

OK, then although you are not saying it directly, assuming
your patch is accepted, it sounds like the hpet will now
always be more accurate than pit and the change a couple
of months ago that turns OFF the guest hpet by default
should now be reversed so that the guest hpet is turned ON
by default (or perhaps the option should just be removed
or ignored, with the previous behavior being restored that
hpet is always on).

True?

Thanks,
Dan

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com]On Behalf Of 
> Dave Winchell
> Sent: Monday, April 28, 2008 12:09 PM
> To: dan.magenheimer@oracle.com
> Cc: Dave Winchell; Tian, Kevin; xen-devel@lists.xensource.com; Ian
> Pratt; Keir Fraser
> Subject: Re: [Xen-devel] Re: Fix for get_s_time()
> 
> 
> Dan Magenheimer wrote:
> 
> > Hi Dave --
> >
> >> You know, its more like hpet on system time.
> >
> > I wonder how much of the problems we observed with skew on 
> pit was due to
> > the pit-on-tsc "bug"... in other words, should the virtual 
> pit be based on
> > system time also?
> 
> For guests that compute missed ticks, it may not help. That's 
> because here
> the guests are using tsc in their computations of offset and last
> interrupt time stamp.
> Also, there is the esoteric use of delay in the computations for pit.
> With hpet, on the other hand, the guests don't read the tsc and don't
> use delay -
> they only rely on the hpet main counter.
> 
> It might improve accuracy for a guest that does not compute missed
> ticks. But you
> would still have the time going backwards issue, unless you 
> patched the
> guest.
> 
> Most of the hpet accuracy we see is due to clean and correct 
> algorithms
> in the guest,
> in my opinion. Of course we have to do the right things in 
> emulating the
> hpet in xen.
> 
> -Dave
> 
> >
> > Dan
> >
> >     -----Original Message-----
> >     *From:* Dave Winchell [mailto:dwinchell@virtualiron.com]
> >     *Sent:* Friday, April 25, 2008 7:54 PM
> >     *To:* dan.magenheimer@oracle.com
> >     *Cc:* Keir Fraser; Tian, Kevin; 
> xen-devel@lists.xensource.com; Ian
> >     Pratt; Dave Winchell
> >     *Subject:* RE: [Xen-devel] Re: Fix for get_s_time()
> >
> >     Hi Dan,
> >
> >     I just need to remove some debug and merge with unstable.
> >     I should be able to send you a patch Monday or Tuesday.
> >     You know, its more like hpet on system time.
> >     Thanks for the testing offer.
> >
> >     Regards,
> >     Dave
> >
> >
> >     -----Original Message-----
> >     From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> >     Sent: Fri 4/25/2008 5:03 PM
> >     To: Dave Winchell
> >     Cc: Keir Fraser; Tian, Kevin; 
> xen-devel@lists.xensource.com; Ian Pratt
> >     Subject: RE: [Xen-devel] Re: Fix for get_s_time()
> >
> >     Hi Dave --
> >
> >     Are you ready to release the guest-virtual-platform-timer
> >     on xen-system-time patch yet?  If so, we'd be happy to
> >     give it some testing.
> >
> >     Thanks,
> >     Dan
> >
> >     > -----Original Message-----
> >     > From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> >     > Sent: Friday, April 25, 2008 1:48 PM
> >     > To: Dave Winchell
> >     > Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
> >     > xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
> >     > Subject: Re: [Xen-devel] Re: Fix for get_s_time()
> >     >
> >     >
> >     > Keir,
> >     >
> >     > Last nights run had the error in the 12 ppm range.
> >     > Here is the change we have been talking about.
> >     >
> >     > -Dave
> >     >
> >     > Dave Winchell wrote:
> >     >
> >     > > Keir Fraser wrote:
> >     > >
> >     > >> On 24/4/08 17:04, "Dave Winchell"
> >     > <dwinchell@virtualiron.com> wrote:
> >     > >>
> >     > >>
> >     > >>
> >     > >>> yes, this is the issue. What you suggest should be fine
> >     > and I am trying
> >     > >>> it now.
> >     > >>> With the locking version (and a fix to a bug I
> >     > introduced) I got .0012%
> >     > >>> error
> >     > >>> on an overnight run with hpet layered on
> >     > get_s_time_mono(), which is
> >     > >>> the
> >     > >>> max(prev, cur) layer on get_s_time we discussed.
> >     > >>>
> >     > >>
> >     > >>
> >     > >> 12 parts per million is pretty good. Is that 
> cumulative deviation
> >     > >> from 'wall
> >     > >> time' over ~12 hours?
> >     > >>
> >     > > yes, deviation between the guest's time and an ntp 
> reference.
> >     > >
> >     > >> That could easily be explained by the fact that Xen
> >     > >> system time is not sync'ed with ntp.
> >     > >>
> >     > >>
> >     > > That's true. And, as we have discussed, this error seems to
> >     > vary quite
> >     > > a bit
> >     > > platform to platform for some reason. I will verify that
> >     > this still is
> >     > > the case.
> >     > >
> >     > > -Dave
> >     > >
> >     > >> -- Keir
> >     > >>
> >     > >>
> >     > >>
> >     > >>
> >     > >
> >     >
> >     >
> >     >
> >
> >
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-28 20:11                                   ` Dan Magenheimer
@ 2008-04-28 21:29                                     ` Dave Winchell
  0 siblings, 0 replies; 24+ messages in thread
From: Dave Winchell @ 2008-04-28 21:29 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Dave Winchell,
	Ian Pratt, Keir Fraser

Dan Magenheimer wrote:

>OK, then although you are not saying it directly, assuming
>your patch is accepted, it sounds like the hpet will now
>always be more accurate than pit and the change a couple
>of months ago that turns OFF the guest hpet by default
>should now be reversed so that the guest hpet is turned ON
>by default (or perhaps the option should just be removed
>or ignored, with the previous behavior being restored that
>hpet is always on).
>  
>
Dan,

We should have the option on a per guest basis, as to whether that
guest sees the hpet in the acpi tables. This is because one guest
that I know of, Windows 2k8, doesn't keep good time with hpet with
either of the two policies that I support now.  I may be able to come
up with a policy for 2k8, but still, we should have the ability
to turn hpet off. Furthermore, the great accuracy I have been reporting
is for Linux guests, which, so far, all have computed missed ticks.
Windows 2k3 does not compute missed ticks, and I have a policy for it
which keeps it relatively accurate, but as I recall, not as accurate
as the Linux guests.

To turn the hpet off, not only should we not advertise it in acpi,
but also we should return 0 for the hpet capabilities register as some
guests just try to use the hpet regardless of acpi.

Regards,
Dave

>True?
>
>Thanks,
>Dan
>
>  
>
>>-----Original Message-----
>>From: xen-devel-bounces@lists.xensource.com
>>[mailto:xen-devel-bounces@lists.xensource.com]On Behalf Of 
>>Dave Winchell
>>Sent: Monday, April 28, 2008 12:09 PM
>>To: dan.magenheimer@oracle.com
>>Cc: Dave Winchell; Tian, Kevin; xen-devel@lists.xensource.com; Ian
>>Pratt; Keir Fraser
>>Subject: Re: [Xen-devel] Re: Fix for get_s_time()
>>
>>
>>Dan Magenheimer wrote:
>>
>>    
>>
>>>Hi Dave --
>>>
>>>      
>>>
>>>>You know, its more like hpet on system time.
>>>>        
>>>>
>>>I wonder how much of the problems we observed with skew on 
>>>      
>>>
>>pit was due to
>>    
>>
>>>the pit-on-tsc "bug"... in other words, should the virtual 
>>>      
>>>
>>pit be based on
>>    
>>
>>>system time also?
>>>      
>>>
>>For guests that compute missed ticks, it may not help. That's 
>>because here
>>the guests are using tsc in their computations of offset and last
>>interrupt time stamp.
>>Also, there is the esoteric use of delay in the computations for pit.
>>With hpet, on the other hand, the guests don't read the tsc and don't
>>use delay -
>>they only rely on the hpet main counter.
>>
>>It might improve accuracy for a guest that does not compute missed
>>ticks. But you
>>would still have the time going backwards issue, unless you 
>>patched the
>>guest.
>>
>>Most of the hpet accuracy we see is due to clean and correct 
>>algorithms
>>in the guest,
>>in my opinion. Of course we have to do the right things in 
>>emulating the
>>hpet in xen.
>>
>>-Dave
>>
>>    
>>
>>>Dan
>>>
>>>    -----Original Message-----
>>>    *From:* Dave Winchell [mailto:dwinchell@virtualiron.com]
>>>    *Sent:* Friday, April 25, 2008 7:54 PM
>>>    *To:* dan.magenheimer@oracle.com
>>>    *Cc:* Keir Fraser; Tian, Kevin; 
>>>      
>>>
>>xen-devel@lists.xensource.com; Ian
>>    
>>
>>>    Pratt; Dave Winchell
>>>    *Subject:* RE: [Xen-devel] Re: Fix for get_s_time()
>>>
>>>    Hi Dan,
>>>
>>>    I just need to remove some debug and merge with unstable.
>>>    I should be able to send you a patch Monday or Tuesday.
>>>    You know, its more like hpet on system time.
>>>    Thanks for the testing offer.
>>>
>>>    Regards,
>>>    Dave
>>>
>>>
>>>    -----Original Message-----
>>>    From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>>>    Sent: Fri 4/25/2008 5:03 PM
>>>    To: Dave Winchell
>>>    Cc: Keir Fraser; Tian, Kevin; 
>>>      
>>>
>>xen-devel@lists.xensource.com; Ian Pratt
>>    
>>
>>>    Subject: RE: [Xen-devel] Re: Fix for get_s_time()
>>>
>>>    Hi Dave --
>>>
>>>    Are you ready to release the guest-virtual-platform-timer
>>>    on xen-system-time patch yet?  If so, we'd be happy to
>>>    give it some testing.
>>>
>>>    Thanks,
>>>    Dan
>>>
>>>    > -----Original Message-----
>>>    > From: Dave Winchell [mailto:dwinchell@virtualiron.com]
>>>    > Sent: Friday, April 25, 2008 1:48 PM
>>>    > To: Dave Winchell
>>>    > Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
>>>    > xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
>>>    > Subject: Re: [Xen-devel] Re: Fix for get_s_time()
>>>    >
>>>    >
>>>    > Keir,
>>>    >
>>>    > Last nights run had the error in the 12 ppm range.
>>>    > Here is the change we have been talking about.
>>>    >
>>>    > -Dave
>>>    >
>>>    > Dave Winchell wrote:
>>>    >
>>>    > > Keir Fraser wrote:
>>>    > >
>>>    > >> On 24/4/08 17:04, "Dave Winchell"
>>>    > <dwinchell@virtualiron.com> wrote:
>>>    > >>
>>>    > >>
>>>    > >>
>>>    > >>> yes, this is the issue. What you suggest should be fine
>>>    > and I am trying
>>>    > >>> it now.
>>>    > >>> With the locking version (and a fix to a bug I
>>>    > introduced) I got .0012%
>>>    > >>> error
>>>    > >>> on an overnight run with hpet layered on
>>>    > get_s_time_mono(), which is
>>>    > >>> the
>>>    > >>> max(prev, cur) layer on get_s_time we discussed.
>>>    > >>>
>>>    > >>
>>>    > >>
>>>    > >> 12 parts per million is pretty good. Is that 
>>>      
>>>
>>cumulative deviation
>>    
>>
>>>    > >> from 'wall
>>>    > >> time' over ~12 hours?
>>>    > >>
>>>    > > yes, deviation between the guest's time and an ntp 
>>>      
>>>
>>reference.
>>    
>>
>>>    > >
>>>    > >> That could easily be explained by the fact that Xen
>>>    > >> system time is not sync'ed with ntp.
>>>    > >>
>>>    > >>
>>>    > > That's true. And, as we have discussed, this error seems to
>>>    > vary quite
>>>    > > a bit
>>>    > > platform to platform for some reason. I will verify that
>>>    > this still is
>>>    > > the case.
>>>    > >
>>>    > > -Dave
>>>    > >
>>>    > >> -- Keir
>>>    > >>
>>>    > >>
>>>    > >>
>>>    > >>
>>>    > >
>>>    >
>>>    >
>>>    >
>>>
>>>
>>>      
>>>
>>_______________________________________________
>>Xen-devel mailing list
>>Xen-devel@lists.xensource.com
>>http://lists.xensource.com/xen-devel
>>
>>    
>>
>
>  
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-28 18:40                                   ` Dave Winchell
@ 2008-04-29  7:14                                     ` Keir Fraser
  2008-04-29 14:21                                       ` Dave Winchell
  0 siblings, 1 reply; 24+ messages in thread
From: Keir Fraser @ 2008-04-29  7:14 UTC (permalink / raw)
  To: Dave Winchell, dan.magenheimer@oracle.com
  Cc: Tian, Kevin, xen-devel@lists.xensource.com, Ian Pratt

So does the below indicate that 92529 of your HPET accesses took between 0
and 63 cycles? That seems rather short for an access off-chip and to the
southbridge.

 -- Keir

On 28/4/08 19:40, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> read_64_main_counter() On stime:
> 
> (VMM)  cycles per bucket 64
> (VMM)
> (VMM)  0: 0 78795 148271 21173 15902 47704 89195 121962
> (VMM)  8: 83632 51848 17531 12987 10976 8816 9120 8608
> (VMM)  16: 5685 3972 3783 2518 1052 710 608 469
> (VMM)  24: 277 159 83 46 34 23 19 16
> (VMM)  32: 9 6 7 3 4 8 5 6
> (VMM)  40: 9 7 14 13 17 25 22 29
> (VMM)  48: 25 19 35 27 30 26 21 23
> (VMM)  56: 17 24 12 27 22 18 10 22
> (VMM)  64: 19 16 16 16 28 18 23 16
> (VMM)  72: 22 22 12 14 21 19 17 19
> (VMM)  80: 18 14 10 14 11 12 8 18
> (VMM)  88: 16 10 17 14 10 8 11 11
> (VMM)  96: 10 10 0 175
> 
> read_64_main_counter() Going to the hardware:
> 
> (VMM)  cycles per bucket 64
> (VMM)
> (VMM)  0: 92529 148423 27850 12532 28042 43336 60516 59011
> (VMM)  8: 36895 14043 8162 6857 7794 7401 5099 2986
> (VMM)  16: 1636 1066 796 592 459 409 314 248
> (VMM)  24: 206 195 138 97 71 45 35 34
> (VMM)  32: 33 36 40 40 25 26 25 26
> (VMM)  40: 37 23 18 30 27 30 34 44
> (VMM)  48: 38 19 25 23 23 25 21 27
> (VMM)  56: 28 24 43 80 220 324 568 599
> (VMM)  64: 610 565 611 699 690 846 874 788
> (VMM)  72: 703 542 556 613 605 603 559 500
> (VMM)  80: 485 493 512 578 561 594 575 614
> (VMM)  88: 759 851 895 856 807 770 719 958
> (VMM)  96: 1127 1263 0 18219
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: Fix for get_s_time()
  2008-04-29  7:14                                     ` Keir Fraser
@ 2008-04-29 14:21                                       ` Dave Winchell
  0 siblings, 0 replies; 24+ messages in thread
From: Dave Winchell @ 2008-04-29 14:21 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Dave Winchell, dan.magenheimer@oracle.com,
	xen-devel@lists.xensource.com, Ian Pratt, Tian, Kevin

Yes. Its an AMD Warthog. I'll try some other platforms when I get a chance.

-Dave

Keir Fraser wrote:

>So does the below indicate that 92529 of your HPET accesses took between 0
>and 63 cycles? That seems rather short for an access off-chip and to the
>southbridge.
>
> -- Keir
>
>On 28/4/08 19:40, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
>  
>
>>read_64_main_counter() On stime:
>>
>>(VMM)  cycles per bucket 64
>>(VMM)
>>(VMM)  0: 0 78795 148271 21173 15902 47704 89195 121962
>>(VMM)  8: 83632 51848 17531 12987 10976 8816 9120 8608
>>(VMM)  16: 5685 3972 3783 2518 1052 710 608 469
>>(VMM)  24: 277 159 83 46 34 23 19 16
>>(VMM)  32: 9 6 7 3 4 8 5 6
>>(VMM)  40: 9 7 14 13 17 25 22 29
>>(VMM)  48: 25 19 35 27 30 26 21 23
>>(VMM)  56: 17 24 12 27 22 18 10 22
>>(VMM)  64: 19 16 16 16 28 18 23 16
>>(VMM)  72: 22 22 12 14 21 19 17 19
>>(VMM)  80: 18 14 10 14 11 12 8 18
>>(VMM)  88: 16 10 17 14 10 8 11 11
>>(VMM)  96: 10 10 0 175
>>
>>read_64_main_counter() Going to the hardware:
>>
>>(VMM)  cycles per bucket 64
>>(VMM)
>>(VMM)  0: 92529 148423 27850 12532 28042 43336 60516 59011
>>(VMM)  8: 36895 14043 8162 6857 7794 7401 5099 2986
>>(VMM)  16: 1636 1066 796 592 459 409 314 248
>>(VMM)  24: 206 195 138 97 71 45 35 34
>>(VMM)  32: 33 36 40 40 25 26 25 26
>>(VMM)  40: 37 23 18 30 27 30 34 44
>>(VMM)  48: 38 19 25 23 23 25 21 27
>>(VMM)  56: 28 24 43 80 220 324 568 599
>>(VMM)  64: 610 565 611 699 690 846 874 788
>>(VMM)  72: 703 542 556 613 605 603 559 500
>>(VMM)  80: 485 493 512 578 561 594 575 614
>>(VMM)  88: 759 851 895 856 807 770 719 958
>>(VMM)  96: 1127 1263 0 18219
>>
>>    
>>
>
>
>  
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2008-04-29 14:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <47FFC37A.4060402@virtualiron.com>
2008-04-11 21:20 ` System time monotonicity Keir Fraser
2008-04-11 21:41   ` Keir Fraser
2008-04-11 22:58     ` Dave Winchell
2008-04-12  7:09       ` Keir Fraser
2008-04-21 19:26         ` Dan Magenheimer
2008-04-21 19:31           ` Keir Fraser
2008-04-21 20:32             ` Fix for get_s_time() Dave Winchell
2008-04-21 22:55               ` Keir Fraser
2008-04-21 23:10                 ` Keir Fraser
2008-04-22 14:09                   ` Dave Winchell
2008-04-24 16:04                   ` Dave Winchell
2008-04-24 16:37                     ` Keir Fraser
2008-04-24 18:32                       ` Dave Winchell
2008-04-25 19:48                         ` Dave Winchell
2008-04-25 21:03                           ` Dan Magenheimer
2008-04-26  1:54                             ` Dave Winchell
2008-04-28 17:39                               ` Dan Magenheimer
2008-04-28 18:09                                 ` Dave Winchell
2008-04-28 18:40                                   ` Dave Winchell
2008-04-29  7:14                                     ` Keir Fraser
2008-04-29 14:21                                       ` Dave Winchell
2008-04-28 20:11                                   ` Dan Magenheimer
2008-04-28 21:29                                     ` Dave Winchell
2008-04-11 22:22   ` System time monotonicity Dan Magenheimer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.