* System time monotonicity
@ 2007-03-26 18:23 John Levon
2007-03-26 18:47 ` Keir Fraser
2007-03-26 18:50 ` Ian Pratt
0 siblings, 2 replies; 36+ messages in thread
From: John Levon @ 2007-03-26 18:23 UTC (permalink / raw)
To: xen-devel
It seems that VCPU system time isn't monotonic (using 3.0.4). It seems
it might be correlated to when a VCPU is switched across real CPUs but I
haven't conclusively proved that. But e.g.:
{
old = {
time = {
version = 0x4ec
pad0 = 0xe8e0
tsc_timestamp = 0x22cc8398b7194
system_time = 0xe8e0345d8805
tsc_to_system_mul = 0xd62c0083
tsc_shift = '\377'
pad1 = [ '\002', '\027', '\365' ]
}
result = 0xe8e0484568fa
tsc = 0x22cc86921ab00
cpu = 0
}
new = {
time = {
version = 0x4ee
pad0 = 0
tsc_timestamp = 0x22cc7db96cd29
system_time = 0xe8e00d1031f3
tsc_to_system_mul = 0xd62ae844
tsc_shift = '\377'
pad1 = [ '\357', '\002', '\365' ]
}
result = 0xe8e048456012
tsc = 0x22cc869225443
cpu = 0
}
delta = 0xfffffffffffff718
}
From what I can work out, time is supposed to be monotonic but I admit I
can't really understand the time code yet at least. I couldn't find any
documentation on what to expect from system time. Any suggestions?
This seems to happen across all the hardware we've tried but this
particular case is a Sun V20Z with two CPUs:
x86 (AuthenticAMD family 15 model 5 step 10 clock 2392 MHz)
AMD Opteron(tm) Processor 250
cheers,
john
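
For anyone reading the dump above: the `result` values look like the usual extrapolation over the per-VCPU time record, that is, system_time plus the TSC delta since tsc_timestamp, shifted by tsc_shift and multiplied by the 32.32 fixed-point tsc_to_system_mul. A minimal sketch in C, assuming that formula; the struct and function names are illustrative only and are not taken from the Xen or Solaris sources:

```c
#include <stdint.h>

/* Illustrative copy of the fields shown in the dump (names follow the dump). */
struct time_record {
    uint64_t tsc_timestamp;      /* TSC value when the record was written */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;          /* '\377' in the dump is -1 */
};

/* Scale a TSC delta to nanoseconds: apply the shift, then multiply by
 * mul / 2^32. The multiply is split into high and low halves so the
 * intermediate products stay within 64 bits. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Extrapolate system time for a TSC reading taken after the record. */
static uint64_t system_time_ns(const struct time_record *t, uint64_t tsc)
{
    return t->system_time +
           scale_delta(tsc - t->tsc_timestamp, t->tsc_to_system_mul,
                       t->tsc_shift);
}
```

Fed the `old` and `new` records from the dump, this reproduces both `result` values (0xe8e0484568fa and 0xe8e048456012), so the reported delta of 0xfffffffffffff718 is a backwards step of 2280 ns.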
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 18:23 System time monotonicity John Levon
@ 2007-03-26 18:47 ` Keir Fraser
2007-03-26 20:04 ` John Levon
2007-03-26 18:50 ` Ian Pratt
1 sibling, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2007-03-26 18:47 UTC (permalink / raw)
To: John Levon, xen-devel
On 26/3/07 19:23, "John Levon" <levon@movementarian.org> wrote:
> From what I can work out, time is supposed to be monotonic but I admit I
> can't really understand the time code yet at least. I couldn't find any
> documentation on what to expect from system time. Any suggestions?
>
> This seems to happen across all the hardware we've tried but this
> particular case is a Sun V20Z with two CPUs:
>
> x86 (AuthenticAMD family 15 model 5 step 10 clock 2392 MHz)
> AMD Opteron(tm) Processor 250
Small backwards time deltas are possible from the current time code. You'll
have to filter them out yourself if you can't deal with them. We could add
extra code in Xen to stop this happening for any individual VCPU.
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 18:47 ` Keir Fraser
@ 2007-03-26 20:04 ` John Levon
2007-03-27 10:47 ` Keir Fraser
2007-04-03 14:03 ` John Levon
0 siblings, 2 replies; 36+ messages in thread
From: John Levon @ 2007-03-26 20:04 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
On Mon, Mar 26, 2007 at 07:47:49PM +0100, Keir Fraser wrote:
> > From what I can work out, time is supposed to be monotonic but I admit I
> > can't really understand the time code yet at least. I couldn't find any
> > documentation on what to expect from system time. Any suggestions?
> >
> > This seems to happen across all the hardware we've tried but this
> > particular case is a Sun V20Z with two CPUs:
> >
> > x86 (AuthenticAMD family 15 model 5 step 10 clock 2392 MHz)
> > AMD Opteron(tm) Processor 250
>
> Small backwards time deltas are possible from the current time code. You'll
> have to filter them out yourself if you can't deal with them. We could add
> extra code in Xen to stop this happening for any individual VCPU
Some instrumentation indicated that we had cross-VCPU jitter of
significant deltas, ~18us at worst. Though the instrumentation wasn't
completely reliable so that might not be accurate.
On real hardware we deal with any jitter via comparing against a regular
clock on CPU0. This obviously won't work for Xen since we have no
control over where VCPU0 actually lives. Presumably the Linux code
(monotonic_clock() especially) must have something to handle it, but I
can't see where...
regards
john
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 20:04 ` John Levon
@ 2007-03-27 10:47 ` Keir Fraser
2007-04-03 14:03 ` John Levon
1 sibling, 0 replies; 36+ messages in thread
From: Keir Fraser @ 2007-03-27 10:47 UTC (permalink / raw)
To: John Levon; +Cc: xen-devel
On 26/3/07 21:04, "John Levon" <levon@movementarian.org> wrote:
>> Small backwards time deltas are possible from the current time code. You'll
>> have to filter them out yourself if you can't deal with them. We could add
>> extra code in Xen to stop this happening for any individual VCPU
>
> Some instrumentation indicated that we had cross-VCPU jitter of
> significant deltas, ~18us at worst. Though the instrumentation wasn't
> completely reliable so that might not be accurate.
I should add that time synchronisation is currently broken in xen-unstable,
and has been for about two weeks. I just checked in a patch (based on one
from Jan Beulich) to fix this (changeset 14573:ba9d3fd4ee4b).
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 20:04 ` John Levon
2007-03-27 10:47 ` Keir Fraser
@ 2007-04-03 14:03 ` John Levon
1 sibling, 0 replies; 36+ messages in thread
From: John Levon @ 2007-04-03 14:03 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
On Mon, Mar 26, 2007 at 09:04:47PM +0100, John Levon wrote:
> > > From what I can work out, time is supposed to be monotonic but I admit I
> > > can't really understand the time code yet at least. I couldn't find any
> > > documentation on what to expect from system time. Any suggestions?
> > >
> > > This seems to happen across all the hardware we've tried but this
> > > particular case is a Sun V20Z with two CPUs:
> > >
> > > x86 (AuthenticAMD family 15 model 5 step 10 clock 2392 MHz)
> > > AMD Opteron(tm) Processor 250
> >
> > Small backwards time deltas are possible from the current time code. You'll
> > have to filter them out yourself if you can't deal with them. We could add
> > extra code in Xen to stop this happening for any individual VCPU
>
> Some instrumentation indicated that we had cross-VCPU jitter of
> significant deltas, ~18us at worst. Though the instrumentation wasn't
> completely reliable so that might not be accurate.
I've done some more testing, the problem appears to be significantly
worse than you're claiming. I store the previously read value in the
thread structure and breakpoint whenever it fails:
[2]> fffffffece9e6040::print kthread_t t_dhrtime t_dhrtime_tsc t_dhrtime_cpu t_dhrtime_vers
t_dhrtime = 0x3de7923989c
t_dhrtime_tsc = 0x9414981ce0e
t_dhrtime_cpu = 0x2
t_dhrtime_vers = 0x2616
And the new values:
t_dhrtime = 0x3de7923088b
t_dhrtime_tsc = 0x9414982169b
t_dhrtime_cpu = 0x2
t_dhrtime_vers = 0x2618
[2]> 0x3de7923989c - 0x000003de7923088b=D
36881
That's 36us. It doesn't even seem that we migrated across physical CPUs
given the version field. So something seems significantly awry to me?
(I must admit I still don't quite understand why monotonic_clock() is
called monotonic_clock() ...)
regards
john
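
The check described above can be sketched roughly as follows: keep the last reading per thread and flag any read that comes out lower. This is an illustrative reconstruction and not the actual Solaris instrumentation; `hrtime_probe` and its field are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical per-thread record of the previous reading. */
struct hrtime_probe {
    uint64_t last_ns;   /* previously returned time, in nanoseconds */
};

/* Record a new reading. Returns 0 if time moved forward (or stood still),
 * otherwise the size of the backwards step in nanoseconds. */
static uint64_t hrtime_probe_check(struct hrtime_probe *p, uint64_t now_ns)
{
    uint64_t back = 0;

    if (now_ns < p->last_ns)
        back = p->last_ns - now_ns;   /* this is where a breakpoint would fire */
    p->last_ns = now_ns;
    return back;
}
```

With the two t_dhrtime values quoted above (0x3de7923989c followed by 0x3de7923088b), the second call returns 36881.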
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2007-03-26 18:23 System time monotonicity John Levon
2007-03-26 18:47 ` Keir Fraser
@ 2007-03-26 18:50 ` Ian Pratt
2007-03-26 18:59 ` Keir Fraser
1 sibling, 1 reply; 36+ messages in thread
From: Ian Pratt @ 2007-03-26 18:50 UTC (permalink / raw)
To: John Levon, xen-devel
> It seems that VCPU system time isn't monotonic (using 3.0.4). It seems
> it might be correlated to when a VCPU is switched across real CPUs but I
> haven't conclusively proved that. But e.g.:
What output do you get when you hit 't' a few times on the xen serial
console?
There's no guarantee that the system time calculated will be perfectly
monotonic, but it should be very close. If the guest needs it to be
monotonic, the time reading code should simply clamp the value that's
read to ensure it always goes up. A little jitter at the microsecond
granularity should be just fine.
On your system it appears to be a couple of microseconds out, which is
on the high side of what we've observed. Normally you only see that kind
of mismatch on systems with TSCs running off different crystals.
Ian
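
The clamping suggested here might look like this on the guest side. It is only a sketch, assuming one `last` value per VCPU (or per thread) so that no locking is needed; the function name is made up for illustration.

```c
#include <stdint.h>

/* Return a reading that never goes backwards for this caller.
 * 'last' is state private to one VCPU or thread, so no atomics are used. */
static uint64_t clamp_monotonic(uint64_t *last, uint64_t raw_ns)
{
    if (raw_ns < *last)
        raw_ns = *last;      /* hide a small backwards step */
    else
        *last = raw_ns;      /* remember the newest reading */
    return raw_ns;
}
```

This gives per-VCPU (or per-thread) monotonicity only; two different VCPUs can still observe readings that disagree with each other.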
> {
> old = {
> time = {
> version = 0x4ec
> pad0 = 0xe8e0
> tsc_timestamp = 0x22cc8398b7194
> system_time = 0xe8e0345d8805
> tsc_to_system_mul = 0xd62c0083
> tsc_shift = '\377'
> pad1 = [ '\002', '\027', '\365' ]
> }
> result = 0xe8e0484568fa
> tsc = 0x22cc86921ab00
> cpu = 0
> }
> new = {
> time = {
> version = 0x4ee
> pad0 = 0
> tsc_timestamp = 0x22cc7db96cd29
> system_time = 0xe8e00d1031f3
> tsc_to_system_mul = 0xd62ae844
> tsc_shift = '\377'
> pad1 = [ '\357', '\002', '\365' ]
> }
> result = 0xe8e048456012
> tsc = 0x22cc869225443
> cpu = 0
> }
> delta = 0xfffffffffffff718
> }
>
> From what I can work out, time is supposed to be monotonic but I admit I
> can't really understand the time code yet at least. I couldn't find any
> documentation on what to expect from system time. Any suggestions?
>
> This seems to happen across all the hardware we've tried but this
> particular case is a Sun V20Z with two CPUs:
>
> x86 (AuthenticAMD family 15 model 5 step 10 clock 2392 MHz)
> AMD Opteron(tm) Processor 250
>
> cheers,
> john
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 18:50 ` Ian Pratt
@ 2007-03-26 18:59 ` Keir Fraser
2007-03-26 20:14 ` John Levon
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2007-03-26 18:59 UTC (permalink / raw)
To: Ian Pratt, John Levon, xen-devel
On 26/3/07 19:50, "Ian Pratt" <Ian.Pratt@cl.cam.ac.uk> wrote:
> On your system it appears to be a couple of microseconds out, which is
> on the high side of what we've observed. Normally you only see that kind
> of mismatch on systems with TSCs running off different crystals.
More likely a jittery chipset timer -- we've observed less-than-ideal
stability from some chipset timers, which can throw us off a bit when
independently sync'ing the TSCs (which each CPU does for its TSC
independently every couple of seconds).
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 18:59 ` Keir Fraser
@ 2007-03-26 20:14 ` John Levon
2007-03-26 21:55 ` Ian Pratt
2007-03-27 0:27 ` Keir Fraser
0 siblings, 2 replies; 36+ messages in thread
From: John Levon @ 2007-03-26 20:14 UTC (permalink / raw)
To: Keir Fraser; +Cc: Ian Pratt, xen-devel
On Mon, Mar 26, 2007 at 07:59:27PM +0100, Keir Fraser wrote:
> On 26/3/07 19:50, "Ian Pratt" <Ian.Pratt@cl.cam.ac.uk> wrote:
>
> > On your system it appears to be a couple of microseconds out, which is
> > on the high side of what we've observed. Normally you only see that kind
> > of mismatch on systems with TSCs running off different crystals.
>
> More likely a jittery chipset timer -- we've observed less-than-ideal
> stability from some chipset timers, which can throw us off a bit when
> independently sync'ing the TSCs (which each CPU does for its TSC
> independently every couple of seconds).
And what about cross-VCPU monotonicity? It's called very frequently and
having to fake monotonicity via a single variable across all CPUs would
not be pleasant, unless I can think of something smarter...
regards
john
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2007-03-26 20:14 ` John Levon
@ 2007-03-26 21:55 ` Ian Pratt
2007-03-27 0:27 ` Keir Fraser
1 sibling, 0 replies; 36+ messages in thread
From: Ian Pratt @ 2007-03-26 21:55 UTC (permalink / raw)
To: John Levon, Keir Fraser; +Cc: Ian Pratt, xen-devel
> > More likely a jittery chipset timer -- we've observed less-than-ideal
> > stability from some chipset timers, which can throw us off a bit when
> > independently sync'ing the TSCs (which each CPU does for its TSC
> > independently every couple of seconds).
>
> And what about cross-VCPU monotonicity? It's called very frequently and
> having to fake monotonicity via a single variable across all CPUs would
> not be pleasant, unless I can think of something smarter...
One LOCK'ed cmpxchg is still a lot quicker than reading the pm_timer or
hpet...
Ian
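
A single shared variable updated with compare-and-swap could be sketched as below. This uses C11 atomics rather than any existing kernel primitive, and the names are hypothetical; on x86 the compare-exchange would typically compile to a LOCK'ed cmpxchg.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last value handed out to any CPU; shared by all readers. */
static _Atomic uint64_t global_last_ns;

/* Publish raw_ns if it is newer than anything seen so far, otherwise
 * return the newer value that some other CPU already published. */
static uint64_t read_monotonic_global(uint64_t raw_ns)
{
    uint64_t prev = atomic_load(&global_last_ns);

    while (prev < raw_ns) {
        /* On failure, prev is reloaded with the current shared value. */
        if (atomic_compare_exchange_weak(&global_last_ns, &prev, raw_ns))
            return raw_ns;
    }
    return prev;
}
```

The cost is one contended cache line for all readers, which is the trade-off being discussed in this part of the thread.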
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-03-26 20:14 ` John Levon
2007-03-26 21:55 ` Ian Pratt
@ 2007-03-27 0:27 ` Keir Fraser
1 sibling, 0 replies; 36+ messages in thread
From: Keir Fraser @ 2007-03-27 0:27 UTC (permalink / raw)
To: John Levon; +Cc: Ian Pratt, xen-devel
On 26/3/07 21:14, "John Levon" <levon@movementarian.org> wrote:
>> More likely a jittery chipset timer -- we've observed less-than-ideal
>> stability from some chipset timers, which can throw us off a bit when
>> independently sync'ing the TSCs (which each CPU does for its TSC
>> independently every couple of seconds).
>
> And what about cross-VCPU monotonicity? It's called very frequently and
> having to fake monotonicity via a single variable across all CPUs would
> not be pleasant, unless I can think of something smarter...
Yet it is what you need to do if you want to absolutely guarantee monotonic
time across all VCPUS. I don't think there's any way around that, but as Ian
says it's still going to be cheaper than reading a chipset timer. I'd be
inclined to settle for per-vcpu and per-OS-task monotonicity.
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
@ 2007-04-03 14:36 Ian Pratt
2007-04-03 14:57 ` John Levon
0 siblings, 1 reply; 36+ messages in thread
From: Ian Pratt @ 2007-04-03 14:36 UTC (permalink / raw)
To: John Levon, Keir Fraser; +Cc: xen-devel
> > Some instrumentation indicated that we had cross-VCPU jitter of
> > significant deltas, ~18us at worst. Though the instrumentation wasn't
> > completely reliable so that might not be accurate.
>
> I've done some more testing, the problem appears to be
> significantly worse than you're claiming. I store the
> previously read value in the thread structure and breakpoint
> whenever it fails:
Try booting with one physical (maxcpus=1 on the xen command line) CPU
just to verify this isn't a multi-CPU issue.
If you're still seeing jitter, please post the boot logs.
We've seen this on white-box AMD systems which were over heating and the
CPU was bouncing in and out of thermal throttling, but never on a tier-1
vendor system.
You could try switching between the PIT and HPET as the calibration
source.
Ian
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2007-04-03 14:36 Ian Pratt
@ 2007-04-03 14:57 ` John Levon
0 siblings, 0 replies; 36+ messages in thread
From: John Levon @ 2007-04-03 14:57 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
On Tue, Apr 03, 2007 at 03:36:48PM +0100, Ian Pratt wrote:
> Try booting with one physical (maxcpus=1 on the xen command line) CPU
> just to verify this isn't a multi-CPU issue.
Can reproduce:
[1]> 0xfffffffed5cc0120::print kthread_t t_dhrtime t_dhrtime_cpu t_dhrtime_tsc t_dhrtime_vers
t_dhrtime = 0x509957b7c2
t_dhrtime_cpu = 0x1
t_dhrtime_tsc = 0xf6f1529888
t_dhrtime_vers = 0x132
[1]> 0x509957b7c2 - 0x000000509957039e=D
46116
See boot log below.
> You could try switching between the PIT and HPET as the calibration
> source.
Doesn't look like there's a flag to force PIT usage. I'll modify the source and
try it.
john
http://www.cl.cam.ac.uk/netos/xen
University of Cambridge Computer Laboratory
Xen version 3.0.4-1-sun (johnlev@mpklab.sfbay.sun.com) (gcc version 3.4.3 (csl-sol210-3_4-20050802)) Mon Apr 2 16:14:36 PDT 2007
Latest ChangeSet: Mon Apr 02 16:11:18 2007 -0700 13188:3a69441c8c65
(XEN) Command line: /boot/amd64/xen.gz console=com1 com1=9600,8n1 maxcpus=1
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009a000 (usable)
(XEN) 000000000009a000 - 00000000000a0000 (reserved)
(XEN) 00000000000d0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000fbf70000 (usable)
(XEN) 00000000fbf70000 - 00000000fbf77000 (ACPI data)
(XEN) 00000000fbf77000 - 00000000fbf80000 (ACPI NVS)
(XEN) 00000000fbf80000 - 00000000fc000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec00400 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000fff80000 - 0000000100000000 (reserved)
(XEN) System RAM: 4031MB (4127784kB)
(XEN) ACPI: RSDP (v002 PTLTD ) @ 0x00000000000f7dc0
(XEN) ACPI: XSDT (v001 PTLTD XSDT 0x06040000 LTP 0x00000000) @ 0x00000000fbf74bd4
(XEN) ACPI: FADT (v003 SUN V20z 0x06040000 PTEC 0x000f4240) @ 0x00000000fbf76c0c
(XEN) ACPI: HPET (v001 Sun V20z 0x06040000 PTEC 0x00000000) @ 0x00000000fbf76d00
(XEN) ACPI: MADT (v001 PTLTD APIC 0x06040000 LTP 0x00000000) @ 0x00000000fbf76d38
(XEN) ACPI: SPCR (v001 PTLTD $UCRTBL$ 0x06040000 PTL 0x00000001) @ 0x00000000fbf76dae
(XEN) ACPI: SSDT (v001 SUN V20z 0x06040000 LTP 0x00000001) @ 0x00000000fbf76dfe
(XEN) ACPI: SSDT (v001 SUN V20z 0x06040000 LTP 0x00000001) @ 0x00000000fbf76e9b
(XEN) ACPI: SRAT (v001 SUN V20z 0x06040000 SUN 0x00000001) @ 0x00000000fbf76f38
(XEN) ACPI: DSDT (v001 Sun V20z 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000
(XEN) NUMA turned off
(XEN) Faking a node at 0000000000000000-00000000fbf70000
(XEN) Domain heap initialised: DMA width 30 bits
(XEN) Xen heap: 13MB (14044kB)
(XEN) found SMP MP-table at 000f7df0
(XEN) DMI present.
(XEN) Using APIC driver default
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 15:5 APIC version 16
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 15:5 APIC version 16
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x03] address[0xfc800000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 3, version 17, address 0xfc800000, GSI 24-27
(XEN) ACPI: IOAPIC (id[0x04] address[0xfc801000] gsi_base[28])
(XEN) IOAPIC[2]: apic_id 4, version 17, address 0xfc801000, GSI 28-31
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) Enabling APIC mode: Flat. Using 3 I/O APICs
(XEN) ACPI: HPET id: 0x102282a0 base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 2392.196 MHz processor.
(XEN) CPU0: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: AMD Opteron(tm) Processor 250 stepping 0a
(XEN) Mapping cpu 0 to node 255
(XEN) Total of 1 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=0 pin2=0
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 1 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 000000000c000000->0000000010000000 (966649 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: 0000000040000000->00000000408263a8
(XEN) Init. ramdisk: 0000000040827000->0000000043273800
(XEN) Phys-Mach map: 0000000043274000->00000000439f3fc8
(XEN) Start info: 00000000439f4000->00000000439f449c
(XEN) Page tables: 00000000439f5000->0000000043a16000
(XEN) Boot stack: 0000000043a16000->0000000043a17000
(XEN) TOTAL: 0000000040000000->0000000043c00000
(XEN) ENTRY ADDRESS: 0000000040800000
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Initrd len 0x2a4c800, start at 0x40827000
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
Loading kmdb...
SunOS Release 5.11 Version onnv-johnlev 64-bit
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
@ 2007-04-03 17:51 Ian Pratt
0 siblings, 0 replies; 36+ messages in thread
From: Ian Pratt @ 2007-04-03 17:51 UTC (permalink / raw)
To: John Levon, Ian Pratt; +Cc: xen-devel
> > Try booting with one physical (maxcpus=1 on the xen command line) CPU
> > just to verify this isn't a multi-CPU issue.
>
> Can reproduce:
>
> [1]> 0xfffffffed5cc0120::print kthread_t t_dhrtime t_dhrtime_cpu t_dhrtime_tsc t_dhrtime_vers
> t_dhrtime = 0x509957b7c2
> t_dhrtime_cpu = 0x1
> t_dhrtime_tsc = 0xf6f1529888
> t_dhrtime_vers = 0x132
>
> [1]> 0x509957b7c2 - 0x000000509957039e=D
> 46116
Please can you try hitting 't' a few times on the Xen debug console
(again, with maxcpus=1)?
If you're seeing time jump back on a single CPU, it must mean that the
TSC ran quicker during the current period than the rate it was calibrated
at in the last period. It's worth adding some debugging printk's to the
calibration code in Xen to see what it thinks is going on from one
calibration period to the next.
Thanks,
Ian
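
To put a rough number on this: if the TSC runs faster than the rate calibrated for the previous period, the guest's extrapolated time overshoots, and the next record pulls it back. A back-of-the-envelope sketch in C; the one-second period and the frequency error used in the note below are assumed values chosen purely for illustration, and the function name is hypothetical.

```c
#include <stdint.h>

/* Nanoseconds by which extrapolated time runs ahead after 'elapsed_ns' of
 * real time, if the TSC actually ticks at actual_hz but was calibrated
 * as calibrated_hz. A negative result means the extrapolation lags. */
static double extrapolation_error_ns(double elapsed_ns,
                                     double calibrated_hz,
                                     double actual_hz)
{
    return elapsed_ns * (actual_hz - calibrated_hz) / calibrated_hz;
}
```

Under these assumptions, a 46 us step over a one-second period would correspond to a frequency error of about 46 parts per million.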
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
@ 2008-04-08 16:34 Dan Magenheimer
2008-04-08 16:42 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-08 16:34 UTC (permalink / raw)
To: xen-devel@lists.xensource.com
>On 26/3/07 19:50, "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxx> wrote:
>
>> On your system it appears to be a couple of microseconds out, which is
>> on the high side of what we've observed. Normally you only see that kind
>> of mismatch on systems with TSCs running off different crystals.
>
> More likely a jittery chipset timer -- we've observed less-than-ideal
> stability from some chipset timers, which can throw us off a bit when
> independently sync'ing the TSCs (which each CPU does for its TSC
> independently every couple of seconds).
>
> -- Keir
Sorry, a little slow on responding here, only took a year ;-)
Where is the code that does this independent TSC sync'ing? I see
code in smpboot.c that seems to do this at startup (though exactly
how I admit I haven't yet figured out... looks like some kind of
rendezvous loop triggered by the BP?). But I don't see where/how
this gets called "every couple of seconds", nor do I see any writing
to the TSC (except setting BP and each AP to zero at startup).
Thanks,
Dan
===================================
If Xen could save time in a bottle / then clocks wouldn't virtually skew /
It would save every tick / for VMs that aren't quick /
and Xen then would send them anew
(with apologies to the late great Jim Croce)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-08 16:34 Dan Magenheimer
@ 2008-04-08 16:42 ` Keir Fraser
2008-04-08 17:39 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-08 16:42 UTC (permalink / raw)
To: dan.magenheimer@oracle.com, xen-devel@lists.xensource.com
On 8/4/08 17:34, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> Sorry, a little slow on responding here, only took a year ;-)
>
> Where is the code that does this independent TSC sync'ing? I see
> code in smpboot.c that seems to do this at startup (though exactly
> how I admit I haven't yet figured out... looks like some kind of
> rendezvous loop triggered by the BP?). But I don't see where/how
> this gets called "every couple of seconds", nor do I see any writing
> to the TSC (except setting BP and each AP to zero at startup).
arch/x86/time.c:local_time_calibration()
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-08 16:42 ` Keir Fraser
@ 2008-04-08 17:39 ` Dan Magenheimer
2008-04-09 1:16 ` Tian, Kevin
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-08 17:39 UTC (permalink / raw)
To: Keir Fraser, xen-devel@lists.xensource.com
> > Where is the code that does this independent TSC sync'ing? I see
> > code in smpboot.c that seems to do this at startup (though exactly
> > how I admit I haven't yet figured out... looks like some kind of
> > rendezvous loop triggered by the BP?). But I don't see where/how
> > this gets called "every couple of seconds", nor do I see any writing
> > to the TSC (except setting BP and each AP to zero at startup).
>
> arch/x86/time.c:local_time_calibration()
OK, thanks.
If I read the code correctly, Xen goes through this effort to
ensure that the TSC's are synchronized, but maintains this
synchronization in a data structure and doesn't actually
change each processor's physical TSC. Correct? This is of
course just fine for the hypervisor's timer needs (and thus
indirectly for paravirtualized domains).
But I also observe that all of the hvm platform timer (pit,
hpet, and pmtimer) code is built on top of the physical TSC
plus the vmx/svm tsc_offset which doesn't seem to be affected
by the Xen TSC synchronization. True?
So assuming the above isn't mistaken, hvm domain reads of the
platform timer on an SMP system lacking hardware-synchronized
TSC may suffer from non-monotonicity. Correct?
Thanks,
Dan
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-08 17:39 ` Dan Magenheimer
@ 2008-04-09 1:16 ` Tian, Kevin
2008-04-09 1:55 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2008-04-09 1:16 UTC (permalink / raw)
To: dan.magenheimer, Keir Fraser, xen-devel
>From: Dan Magenheimer
>Sent: April 9, 2008 1:40
>
>But I also observe that all of the hvm platform timer (pit,
>hpet, and pmtimer) code is built on top of the physical TSC
>plus the vmx/svm tsc_offset which doesn't seem to be affected
>by the Xen TSC synchronization. True?
For cpus on the same system bus driven by one crystal, TSC drift among
cpus may be just dozens of cycles after boot-time sync, which is
negligible compared to migration overhead, so an HVM guest is unlikely
to observe non-monotonic behavior after resume.
The issue comes with cpus running at different frequencies, e.g. driven
by multiple crystals, or with on-demand frequency changes that affect the
TSC too. An HVM guest can be configured to avoid migrating among cpus with
different TSC frequencies, e.g. by limiting its cpu affinity to cpus on
the same system bus. Or you have to configure HVM guest to not trust TSC...
Thanks,
Kevin
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-09 1:16 ` Tian, Kevin
@ 2008-04-09 1:55 ` Dan Magenheimer
2008-04-09 3:20 ` Tian, Kevin
2008-04-09 12:42 ` Ian Pratt
0 siblings, 2 replies; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-09 1:55 UTC (permalink / raw)
To: Tian, Kevin, Keir Fraser, xen-devel@lists.xensource.com
> >But I also observe that all of the hvm platform timer (pit,
> >hpet, and pmtimer) code is built on top of the physical TSC
> >plus the vmx/svm tsc_offset which doesn't seem to be affected
> >by the Xen TSC synchronization. True?
>
> For cpus on the same system bus driven by one crystal, TSC drift among
> cpus may be just dozens of cycles after boot-time sync, which is
> negligible compared to migration overhead, so an HVM guest is unlikely
> to observe non-monotonic behavior after resume.
I agree this case is not much of a problem.
> The issue comes with cpus running at different frequencies, e.g. driven
> by multiple crystals, or with on-demand frequency changes that affect the
> TSC too. An HVM guest can be configured to avoid migrating among cpus with
> different TSC frequencies, e.g. by limiting its cpu affinity to cpus on
> the same system bus.
These are the cases I am worried about. The Linux kernel seems
to have a number of cases that mark TSC as unstable, but
Xen does not, nor (I think) does Xen expose this information
anywhere. So it seems SMP guests need to be pinned to physical
CPUs that are measured to have sync'ed TSC's to guarantee that
the (virtual) platform timer is monotonic.
> Or you have to configure HVM guest to not trust TSC...
Yes, that's what I'm thinking... like Linux, Xen could/should
build virtual platform timers on a physical clocksource other
than tsc if all of the potential vcpu->pcpu mappings are not
on sync'd-TSC-pcpus.
I assume this problem is worse with multi-socket Hypertransport
and future Intel QPI boxes? Or is TSC (and frequency changing)
synchronized for such systems?
Thanks,
Dan
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-09 1:55 ` Dan Magenheimer
@ 2008-04-09 3:20 ` Tian, Kevin
2008-04-09 12:42 ` Ian Pratt
1 sibling, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2008-04-09 3:20 UTC (permalink / raw)
To: dan.magenheimer, Keir Fraser, xen-devel
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: April 9, 2008 9:55
>
>> Or you have to configure HVM guest to not trust TSC...
>
>Yes, that's what I'm thinking... like Linux, Xen could/should
>build virtual platform timers on a physical clocksource other
>than tsc if all of the potential vcpu->pcpu mappings are not
>on sync'd-TSC-pcpus.
Virtual platform timers are only one area. The most important is the
TSC itself, which guests use frequently to calculate relative
offsets...
>
>I assume this problem is worse with multi-socket Hypertransport
>and future Intel QPI boxes? Or is TSC (and frequency changing)
>synchronized for such systems?
For the same-crystal case, Intel processors with VT-x support all have
the constant-TSC feature, which is not tied to frequency changes and
can be detected via CPUID. But for the multiple-crystal case, Xen may
need to tackle affinity.
Thanks,
Kevin
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-09 1:55 ` Dan Magenheimer
2008-04-09 3:20 ` Tian, Kevin
@ 2008-04-09 12:42 ` Ian Pratt
2008-04-09 14:25 ` Dan Magenheimer
1 sibling, 1 reply; 36+ messages in thread
From: Ian Pratt @ 2008-04-09 12:42 UTC (permalink / raw)
To: Dan Magenheimer, Tian, Kevin, Keir Fraser, xen-devel; +Cc: Ian Pratt
> > The issue comes with cpus running at different frequencies, e.g. driven
> > by multiple crystals, or with on-demand frequency changes that affect the
> > TSC too. An HVM guest can be configured to avoid migrating among cpus with
> > different TSC frequencies, e.g. by limiting its cpu affinity to cpus on
> > the same system bus.
>
> These are the cases I am worried about. The Linux kernel seems to have
> a number of cases that mark TSC as unstable, but Xen does not, nor (I
> think) does Xen expose this information anywhere. So it seems SMP
> guests need to be pinned to physical CPUs that are measured to have
> sync'ed TSC's to guarantee that the (virtual) platform timer is
> monotonic.
Xen itself copes fine with CPUs running from entirely independent clock
sources. It calibrates each TSC's frequency against a global clock (e.g.
the hpet).
> > Or you have to configure HVM guest to not trust TSC...
>
> Yes, that's what I'm thinking... like Linux, Xen could/should build
> virtual platform timers on a physical clocksource other than tsc if all
> of the potential vcpu->pcpu mappings are not on sync'd-TSC-pcpus.
Although Xen is fine, guests can get confused if they're relying on the
TSC. Fortunately, Windows doesn't rely on the TSC, and most folk run
Linux PV which also works fine.
If you want to make Linux work HVM on such a system you need to either
convince it not to use the TSC, or arrange for TSC reads to trap to
Xen and then compute the result based on Xen's time base. If you're
doing the latter, better hope that TSC reads aren't called frequently...
Ian
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-09 12:42 ` Ian Pratt
@ 2008-04-09 14:25 ` Dan Magenheimer
2008-04-09 14:41 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-09 14:25 UTC (permalink / raw)
To: Ian Pratt, Tian, Kevin, Keir Fraser,
xen-devel@lists.xensource.com
> Although Xen is fine, guests can get confused if they're
> relying on the
> TSC. Fortunately, Windows doesn't rely on the TSC, and most folk run
> Linux PV which also works fine.
>
> If you want to make Linux work HVM on such a system you need to either
> convince it not to use the TSC, or arrange for TSC reads to trap to
> Xen and then compute the result based on Xen's time base. If you're
> doing the latter, better hope that TSC reads aren't called
> frequently...
Hi Ian --
Let me clarify... unless my reading of the code is wrong, ALL hvm
guests that rely on ANY (virtual) platform timer are UNKNOWINGLY
relying on the physical TSCs. Thus if the underlying physical
system has unsynchronized TSCs, different vcpus in an SMP HVM
guest (or even the SAME vcpu when rescheduled on another pcpu)
may find that consecutive reads of ANY (virtual) platform timer
are unexpectedly non-monotonic, which violates the whole purpose
of using a PLATFORM timer.
I suspect this is unintended and bad?
Thanks,
Dan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-09 14:25 ` Dan Magenheimer
@ 2008-04-09 14:41 ` Keir Fraser
2008-04-09 16:33 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-09 14:41 UTC (permalink / raw)
To: dan.magenheimer@oracle.com, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
On 9/4/08 15:25, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> Let me clarify... unless my reading of the code is wrong, ALL hvm
> guests that rely on ANY (virtual) platform timer are UNKNOWINGLY
> relying on the physical TSCs. Thus if the underlying physical
> system has unsynchronized TSCs, different vcpus in an SMP HVM
> guest (or even the SAME vcpu when rescheduled on another pcpu)
> may find that consecutive reads of ANY (virtual) platform timer
> are unexpectedly non-monotonic, which violates the whole purpose
> of using a PLATFORM timer.
This is all true. The logic in vpt.c should be fixed to use Xen's concept of
system time and everything, guest TSC included, should be derived from that.
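For illustration, a minimal sketch of what "derived from system time" could mean for a virtual counter. The helper name `ns_to_ticks` and the idea of passing the counter frequency as a parameter are assumptions made for this example, not code from vpt.c. The point is only that every virtual counter (HPET, PIT, guest TSC) becomes a pure function of one nanosecond time base, so it can only step backwards if system time itself does.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert a system time in nanoseconds into ticks
 * of a virtual counter running at freq_hz.  Whole seconds and the
 * sub-second remainder are scaled separately so the remainder product
 * stays within 64 bits for frequencies up to roughly 10 GHz.  The
 * result wraps modulo 2^64, like a free-running hardware counter.
 */
uint64_t ns_to_ticks(uint64_t system_time_ns, uint64_t freq_hz)
{
    assert(freq_hz != 0);
    uint64_t secs = system_time_ns / NS_PER_SEC;
    uint64_t rem_ns = system_time_ns % NS_PER_SEC;
    return secs * freq_hz + (rem_ns * freq_hz) / NS_PER_SEC;
}
```

Because the conversion is non-decreasing in `system_time_ns`, monotonicity of the virtual counter reduces to monotonicity of the underlying system time.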
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-09 14:41 ` Keir Fraser
@ 2008-04-09 16:33 ` Dan Magenheimer
2008-04-09 16:40 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-09 16:33 UTC (permalink / raw)
To: Keir Fraser, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
> > Let me clarify... unless my reading of the code is wrong, ALL hvm
> > guests that rely on ANY (virtual) platform timer are UNKNOWINGLY
> > relying on the physical TSCs. Thus if the underlying physical
> > system has unsynchronized TSCs, different vcpus in an SMP HVM
> > guest (or even the SAME vcpu when rescheduled on another pcpu)
> > may find that consecutive reads of ANY (virtual) platform timer
> > are unexpectedly non-monotonic, which violates the whole purpose
> > of using a PLATFORM timer.
>
> This is all true. The logic in vpt.c should be fixed to use
> Xen's concept of
> system time and everything, guest TSC included, should be
> derived from that.
Does Xen's concept of system time have sufficient resolution
and continuity to ensure both monotonicity and a reasonable
guest timer granularity? I'm thinking not; some form of
interpolation will probably be necessary which will require
reading a physical platform timer** (e.g. other than tsc).
Since a guest that is presented with a (virtual) platform timer
of a given resolution may come to rely on both the monotonicity
AND resolution of that timer, I'm beginning to understand why
"that other virtualization company" doesn't virtualize HPET.
Dan
** Lest anyone say "well then just read the d**n platform timer",
be aware that it must be done judiciously as it can be very
expensive: On one recent vintage box I have, I measured reading
HPET at about 10000 cycles and reading PIT at about 50000!
So if every vcpu on every guest reads the (virtual) platform
timer at 1000Hz, things can get ugly fast.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-09 16:33 ` Dan Magenheimer
@ 2008-04-09 16:40 ` Keir Fraser
2008-04-09 18:36 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-09 16:40 UTC (permalink / raw)
To: dan.magenheimer@oracle.com, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
On 9/4/08 17:33, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> This is all true. The logic in vpt.c should be fixed to use
>> Xen's concept of
>> system time and everything, guest TSC included, should be
>> derived from that.
>
> Does Xen's concept of system time have sufficient resolution
> and continuity to ensure both monotonicity and a reasonable
> guest timer granularity? I'm thinking not; some form of
> interpolation will probably be necessary which will require
> reading a physical platform timer** (e.g. other than tsc).
Xen's system time provides nanosecond precision and is intended to be as
accurate as the underlying platform timer (over long periods) and as
granular and accurate as the TSC over sub-second periods. It's quite good
enough for any guest purposes.
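As a rough sketch of how a nanosecond value is extrapolated from the TSC between calibrations, the fields printed in the first message of this thread can be combined as below. The struct and function names are made up for this example, and the way the fields combine is an assumption, although it does reproduce the "old" result (0xe8e0484568fa) from that message.

```c
#include <assert.h>
#include <stdint.h>

/* Per-VCPU time record, using the field names from the first message. */
struct vcpu_time_sample {
    uint64_t tsc_timestamp;      /* TSC value at the last calibration */
    uint64_t system_time;        /* system time (ns) at that TSC value */
    uint32_t tsc_to_system_mul;  /* ns per pre-scaled TSC tick, times 2^32 */
    int8_t   tsc_shift;          /* pre-scale shift; '\377' is -1 */
};

/*
 * Extrapolate system time (ns) for a TSC reading taken after the
 * calibration point.  The 64x32-bit multiply is split into two halves
 * so no 128-bit type is needed.
 */
uint64_t extrapolate_system_time(const struct vcpu_time_sample *t, uint64_t tsc)
{
    assert(tsc >= t->tsc_timestamp);
    uint64_t delta = tsc - t->tsc_timestamp;

    if (t->tsc_shift < 0)
        delta >>= -t->tsc_shift;
    else
        delta <<= t->tsc_shift;

    uint64_t mul = t->tsc_to_system_mul;
    uint64_t ns = (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);

    return t->system_time + ns;
}
```

The calibration step only has to refresh `tsc_timestamp`, `system_time` and the scale factors; everything between calibrations is pure arithmetic on the local TSC, which is why the result is as granular as the TSC but can diverge between CPUs until the next calibration.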
> Since a guest that is presented with a (virtual) platform timer
> of a given resolution may come to rely on both the monotonicity
> AND resolution of that timer, I'm beginning to understand why
> "that other virtualization company" doesn't virtualize HPET.
The HPET is a good example of the difference between precision and accuracy.
It may report its period in picoseconds, but the spec allows drift of 100s
of ppm.
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-09 16:40 ` Keir Fraser
@ 2008-04-09 18:36 ` Dan Magenheimer
2008-04-10 7:08 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-09 18:36 UTC (permalink / raw)
To: Keir Fraser, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
> >> This is all true. The logic in vpt.c should be fixed to use
> >> Xen's concept of
> >> system time and everything, guest TSC included, should be
> >> derived from that.
> >
> > Does Xen's concept of system time have sufficient resolution
> > and continuity to ensure both monotonicity and a reasonable
> > guest timer granularity? I'm thinking not; some form of
> > interpolation will probably be necessary which will require
> > reading a physical platform timer** (e.g. other than tsc).
>
> Xen's system time provides nanosecond precision and is
> intended to be as
> accurate as the underlying platform timer (over long periods) and as
> granular and accurate as the TSC over sub-second periods.
> It's quite good enough for any guest purposes.
OK, as long as the maximum uncorrected drift between physical TSCs
does not exceed the guest-expected granularity of its virtual
platform timer, I agree it's good enough.
It appears that TSC drift for each pcpu is corrected by Xen
once per second. Any idea for real systems out there what the
maximum drift (per second) is? Will this be affected by
existing or future power-savings designs (e.g. is it possible
for the TSCs in one socket to be slowed down while the TSCs
in another socket are not)? If so, as Kevin points out,
some kind of affinity enforcement might be necessary for
time-sensitive VMs.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-09 18:36 ` Dan Magenheimer
@ 2008-04-10 7:08 ` Keir Fraser
2008-04-10 21:27 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-10 7:08 UTC (permalink / raw)
To: dan.magenheimer@oracle.com, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
On 9/4/08 19:36, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> OK, as long as the maximum uncorrected drift between physical TSCs
> does not exceed the guest-expected granularity of its virtual
> platform timer, I agree it's good enough.
Ignoring power-saving events, TSCs are crystal-driven and hence we can
expect specified tolerance of a few ppm across temperature extremes, and in
practice over few-second periods I would expect tolerance of better than
1ppm. *However* I have seen platform timers (which also should be
crystal-driven) which inexplicably exhibit much worse behaviour.
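To put numbers on these tolerances, here is a small worked calculation. The function name `max_drift_ns` is hypothetical; it just multiplies a rate error in ppm by the length of the free-running interval, which is one second if each CPU is corrected once per second as discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case divergence, in nanoseconds, between two clocks whose rates
 * differ by rate_error_ppm parts per million, after interval_ns of
 * free-running between corrections.  The interval is split so the
 * intermediate products cannot overflow 64 bits.
 */
uint64_t max_drift_ns(uint64_t interval_ns, uint64_t rate_error_ppm)
{
    assert(rate_error_ppm <= 1000000);
    return interval_ns / 1000000 * rate_error_ppm
         + interval_ns % 1000000 * rate_error_ppm / 1000000;
}
```

With a one-second interval, 1 ppm corresponds to 1 us of divergence and 100 ppm to 100 us.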
> It appears that TSC drift for each pcpu is corrected by Xen
> once per second. Any idea for real systems out there what the
> maximum drift (per second) is? Will this be affected by
> existing or future power-savings designs (e.g. is it possible
> for the TSCs in one socket to be slowed down while the TSCs
> in another socket are not)? If so, as Kevin points out,
> some kind of affinity enforcement might be necessary for
> time-sensitive VMs.
Xen is informed of P-state changes, so we can re-sync the local TSC
immediately. The tricky ones are unannounced thermal events because software
does not get informed about those. On some systems we can turn them off, on
others (new Intel platforms) TSC is constant-rate regardless. In a normal
running system thermal events are rare.
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-10 7:08 ` Keir Fraser
@ 2008-04-10 21:27 ` Dan Magenheimer
2008-04-11 6:48 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-10 21:27 UTC (permalink / raw)
To: Keir Fraser, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
Cc: Dave Winchell
> > OK, as long as the maximum uncorrected drift between physical TSCs
> > does not exceed the guest-expected granularity of its virtual
> > platform timer, I agree it's good enough.
>
> Ignoring power-saving events, TSCs are crystal-driven and hence we can
> expect specified tolerance of a few ppm across temperature
> extremes, and in
> practice over few-second periods I would expect tolerance of
> better than
> 1ppm. *However* I have seen platform timers (which also should be
> crystal-driven) which inexplicably exhibit much worse behaviour.
OK... back to monotonicity for a moment:
So regardless of ppms and thermal and P-state and drifts,
are you confident that the current corrected-tsc mechanism
will never see time going backwards for the following test?
(Apologies for pseudo-code, but hope you get the drift...
pun intended).
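global val1, proceed = 0;
Guest thread 1:
    spin_lock(lock);
    val1 = read_hpet();        /* first read, taken under the lock */
    proceed = 1;
    spin_unlock(lock);
Guest thread 2:
    while (!proceed);          /* wait for thread 1 to read the timer */
    spin_unlock_wait(lock);    /* wait for thread 1 to drop the lock */
    val2 = read_hpet();        /* later read, possibly on another pcpu */
    if (val2 < val1) PANIC();  /* time went backwards */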
If you are not confident that this will be OK on existing and
(within-reason) future Xen platforms, perhaps the hvm virtual
platform timers should (at least optionally) be built on physical
platform timers (Dave Winchell cc'ed), which would ensure time
never goes backwards.
> > It appears that TSC drift for each pcpu is corrected by Xen
> > once per second. Any idea for real systems out there what the
> > maximum drift (per second) is? Will this be affected by
> > existing or future power-savings designs (e.g. is it possible
> > for the TSCs in one socket to be slowed down while the TSCs
> > in another socket are not)? If so, as Kevin points out,
> > some kind of affinity enforcement might be necessary for
> > time-sensitive VMs.
>
> P-state changes are informed to Xen so we can re-sync the local TSC
> immediately. The tricky ones are unannounced thermal events
> because software
> does not get informed about those. On some systems we can
> turn them off, on
> others (new Intel platforms) TSC is constant-rate regardless.
> In a normal
> running system thermal events are rare.
If it is possible to write code that can determine at
boot-time (or at hotplug cpu_online) what CPUs are
guaranteed-sync'ed with what other CPUs, it would be
nice if this information was exported by Xen
so that tools can manage very-time-sensitive guests
appropriately.
Personally, I think this code should be provided by the
CPU vendors ;-)
Dan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-10 21:27 ` Dan Magenheimer
@ 2008-04-11 6:48 ` Keir Fraser
2008-04-11 22:05 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-11 6:48 UTC (permalink / raw)
To: dan.magenheimer@oracle.com, Ian Pratt, Tian, Kevin,
xen-devel@lists.xensource.com
Cc: Dave Winchell
On 10/4/08 22:27, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> If you are not confident that this will be OK on existing and
> (within-reason) future Xen platforms, perhaps the hvm virtual
> platform timers should (at least optionally) be built on physical
> platform timers (Dave Winchell cc'ed), which would ensure time
> never goes backwards.
If we wanted to be more certain we could maintain a last_system_time field
per VCPU and, whenever using system time to compute current value for a
virtual timer for an HVM VCPU, we could actually use max(system time,
last_system_time). This would mean we were 100% sure that time didn't go
backwards, by turning small backwards deltas into very short periods of
stalled time.
As it is: no, since system time 'free runs' on each CPU over one-second
periods, there can be drift between CPUs if they are driven by different
oscillators. Also there are tolerances in our software calibration code to
consider. Which is why Linux guests implement the max(curr time, last time)
in their gettimeofday() code. It would be quite reasonable to do the same,
inside Xen, for HVM guests. We can at least be pretty certain that any
drifts across CPUs/VCPUs will be on the order of less than 100us.
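The max(system time, last_system_time) idea can be sketched as follows. The struct and function names are hypothetical and only the field name comes from the message; this is an illustration of the clamp, not a patch against Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VCPU state; the field name follows the message above. */
struct vcpu_time_state {
    uint64_t last_system_time;  /* largest value handed out so far (ns) */
};

/*
 * Return a system time that never decreases for this VCPU.  A small
 * backwards step in the raw value is reported as stalled time.
 */
uint64_t monotonic_system_time(struct vcpu_time_state *v, uint64_t raw_now_ns)
{
    assert(v != NULL);
    if (raw_now_ns > v->last_system_time)
        v->last_system_time = raw_now_ns;
    return v->last_system_time;
}
```

State kept per VCPU only protects each VCPU against its own earlier reads. If the same guarantee were wanted across VCPUs of one guest, the last value would have to be shared and updated atomically, which this sketch does not attempt.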
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-11 6:48 ` Keir Fraser
@ 2008-04-11 22:05 ` Dan Magenheimer
0 siblings, 0 replies; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-11 22:05 UTC (permalink / raw)
To: Keir Fraser
Cc: Tian, Kevin, dan.magenheimer@oracle.com,
xen-devel@lists.xensource.com, Dave Winchell, Ian Pratt
> If we wanted to be more certain we could maintain a
> last_system_time field per VCPU and
If you mean per VCPU *and* per guest this seems like
a good idea.
> backwards, by turning small backwards deltas into very short
> periods of stalled time.
The stalled time may be a problem, but only if the tsc skew
between processors is "bad". Your estimate of 100us seems
like it could be unacceptable for some applications.
Any idea how expensive arch/x86/time.c:local_time_calibration()
is? If it's not too bad, one option might be to add a xen
boot parameter "calibratehz" to calibrate more frequently.
Then systems running time-sensitive guests can be instructed
to increase the parameter accordingly to ensure tsc skew
is small enough.
> > If you are not confident that this will be OK on existing and
> > (within-reason) future Xen platforms, perhaps the hvm virtual
> > platform timers should (at least optionally) be built on physical
> > platform timers (Dave Winchell cc'ed), which would ensure time
> > never goes backwards.
>
> If we wanted to be more certain we could maintain a
> last_system_time field
> per VCPU and, whenever using system time to compute current
> value for a
> virtual timer for an HVM VCPU, we could actually use max(system time,
> last_system_time). This would mean we were 100% sure that
> time didn't go
> backwards, by turning small backwards deltas into very short
> periods of
> stalled time.
>
> As it is: no, since system time 'free runs' on each CPU over
> one-second
> periods, there can be drift between CPUs if they are driven
> by different
> oscillators. Also there are tolerances in our software
> calibration code to
> consider. Which is why Linux guests implement the max(curr
> time, last time)
> in their gettimeofday() code. It would be quite reasonable to
> do the same,
> inside Xen, for HVM guests. We can at least be pretty certain that any
> drifts across CPUs/VCPUs will be on the order of less than 100us.
>
> -- Keir
>
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <47FFC37A.4060402@virtualiron.com>]
* Re: System time monotonicity
[not found] <47FFC37A.4060402@virtualiron.com>
@ 2008-04-11 21:20 ` Keir Fraser
2008-04-11 21:41 ` Keir Fraser
2008-04-11 22:22 ` Dan Magenheimer
0 siblings, 2 replies; 36+ messages in thread
From: Keir Fraser @ 2008-04-11 21:20 UTC (permalink / raw)
To: Dave Winchell
Cc: Tian, Kevin, dan.magenheimer@oracle.com,
xen-devel@lists.xensource.com, Ian Pratt
On 11/4/08 21:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
> I turned to the hpet as I became frustrated trying to solve the problem
> in xen with pit.
> One of the solutions proposed to the customer was a max(curr time, last
> time) modification to Linux.
> They didn't want that.
> (Keir, what Linux version are you looking at when you say Linux already
> has this modification?)
This is part of our own Xen-specific time patches for Linux.
> I had tried hpet before to solve the time backwards problem and knew it
> was effective. But the accuracy of hpet was very poor. When I looked
> into the hpet I was surprised that it was based on tsc, as I was trying
> to get away from tsc. But note, even based on tsc the time was not
> going backwards, at least for this simple test case.
Yes, having all the virtual timers based on 'guest TSC' (which really is
basically host TSC + an offset) is not great.
> It's a fairly simple matter to base the hpet on the physical hpet. It's
> easy to share it among guests as no one really writes the physical hpet.
> Offsets are kept in each vhpet such that each guest thinks he owns the
> hpet.
This is really no better than basing on Xen system time. Actually it's worse
since most systems don't even expose the HPET, so we can't probe it (without
hacks) and so we can't use it. Xen's system time abstraction, perhaps with
the max(last, curr) addition, is perfectly good enough.
> This goes along with some of the experiences
> Keir has had with drift, I think. I'm not sure why this happens - can
> the hpet hardware be that poor in quality?
It does appear to be, and I have no idea why.
> There are three factors that give hpet its great accuracy, in my opinion.
> 1) The hardware is very stable.
> 2) There is only one of them in the system, not one per cpu.
> 3) The Linux implementation for clock and hpet is very clean. It
> calculates missed ticks and offsets without
> including the interrupt delay.
Encouraging the guest to use HPET makes sense. It's a nice wide counter
which hence does not have the wrap issues of the 16-bit PIT counters. Also
in some cases the guest OS interface to the HPET is saner (for our purposes
at least) than the equivalent code to interface to PIT/TSC. This doesn't
mean it has to be plumbed right down to the physical HPET. HVM time sources
can be fixed for drift by moving them away from guest/host TSC and onto the
Xen system time abstraction.
-- Keir
> Items 2 and 3 here are important factors in why the time stays
> monotonic. Another reason is that
> gettimeofday reads the hpet main counter for extrapolation, eliminating
> extrapolation error since
> the same counter is the sole determinator for the next interrupt time
> stamp. Furthermore, Linux can take the
> clock interrupt on any processor and the monotonicity is preserved
> because of item 2.
>
> Thanks for reading this far!
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-11 21:20 ` Keir Fraser
@ 2008-04-11 21:41 ` Keir Fraser
2008-04-11 22:58 ` Dave Winchell
2008-04-11 22:22 ` Dan Magenheimer
1 sibling, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-11 21:41 UTC (permalink / raw)
To: Dave Winchell
Cc: Tian, Kevin, dan.magenheimer@oracle.com,
xen-devel@lists.xensource.com, Ian Pratt
On 11/4/08 22:20, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> Its a fairly simple matter to base the hpet on the physical hpet. Its
>> easy to share it among guests
>> as no one really writes the physical hpet. Offsets are kept in each
>> vhpet such that each guest thinks
>> he owns the hpet.
>
> This is really no better than basing on Xen system time. Actually it's worse
> since most systems don't even expose the HPET, so we can't probe it (without
> hacks) and so we can't use it. Xen's system time abstraction, perhaps with the
> max(last, curr) addition, is perfectly good enough.
Just to labour the point some more: If you still believe that diving to the
real platform timer on every guest time access is measurably more accurate,
you can cleanly prove that by building on Xen's system time abstraction, and
then switch between using get_s_time() (aka NOW()) and
read_platform_stime(). The latter calculates current system time by reading
from the platform timer *right now*. It's the function that all local CPUs
calibrate to once per second.
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-11 21:41 ` Keir Fraser
@ 2008-04-11 22:58 ` Dave Winchell
2008-04-12 7:09 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dave Winchell @ 2008-04-11 22:58 UTC (permalink / raw)
To: Keir Fraser
Cc: Tian, Kevin, dan.magenheimer, xen-devel, Dave Winchell, Ian Pratt
[-- Attachment #1.1: Type: text/plain, Size: 1682 bytes --]
Hi Keir,
Your suggestion below is a good one. I'll give it a try and let you know.
(I thought most systems did expose an hpet, at least modern ones.
All the systems I use expose it.
My code defaults back to the tsc way of doing things when no hpet is
detected.)
Regards,
Dave
________________________________
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Fri 4/11/2008 5:41 PM
To: Dave Winchell
Cc: dan.magenheimer@oracle.com; Ian Pratt; Tian, Kevin; xen-devel@lists.xensource.com
Subject: Re: [xen-devel] System time monotonicity
On 11/4/08 22:20, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> Its a fairly simple matter to base the hpet on the physical hpet. Its
>> easy to share it among guests
>> as no one really writes the physical hpet. Offsets are kept in each
>> vhpet such that each guest thinks
>> he owns the hpet.
>
> This is really no better than basing on Xen system time. Actually it's worse
> since most systems don't even expose the HPET, so we can't probe it (without
> hacks) and so we can't use it. Xen's system time abstraction, perhaps with the
> max(last, curr) addition, is perfectly good enough.
Just to labour the point some more: If you still believe that diving to the
real platform timer on every guest time access is measurably more accurate,
you can cleanly prove that by building on Xen's system time abstraction, and
then switch between using get_s_time() (aka NOW()) and
read_platform_stime(). The latter calculates current system time by reading
from the platform timer *right now*. It's the function that all local CPUs
calibrate to once per second.
-- Keir
[-- Attachment #1.2: Type: text/html, Size: 2763 bytes --]
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-11 22:58 ` Dave Winchell
@ 2008-04-12 7:09 ` Keir Fraser
2008-04-21 19:26 ` Dan Magenheimer
0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2008-04-12 7:09 UTC (permalink / raw)
To: Dave Winchell; +Cc: Tian, Kevin, dan.magenheimer, xen-devel, Ian Pratt
[-- Attachment #1.1: Type: text/plain, Size: 637 bytes --]
On 11/4/08 23:58, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
> Your suggestion below is a good one. I'll give it a try and let you know.
>
> (I thought most systems did expose an hpet, at least modern ones.
> All the systems I use expose it.
> My code defaults back to the tsc way of doing things when no hpet is
> detected.)
It used to be deliberately not exposed because Windows had problems using it
in some cases, iirc. If it's no longer getting hidden in newer systems then
that's a good thing.
Anyhow, yes, please do switch over to system time. I wouldn't take a
HPET-vs-TSC patch anyway.
-- Keir
[-- Attachment #1.2: Type: text/html, Size: 1136 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-12 7:09 ` Keir Fraser
@ 2008-04-21 19:26 ` Dan Magenheimer
2008-04-21 19:31 ` Keir Fraser
0 siblings, 1 reply; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-21 19:26 UTC (permalink / raw)
To: Keir Fraser, Dave Winchell
Cc: Ian Pratt, xen-devel@lists.xensource.com, Tian, Kevin
> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Dan asked that I measure the cost of accessing the hpet. My
> first set of
> measurements indicated that
> for 99.8% of the samples taken the cost is less than 1/2 usec, or 500
> cycles on my machine.
> This is about the cost of a single vmexit, for reference. The cost
> includes the cost
> of taking a spinlock. This cost also includes the overhead of
> one or two
> rdtsc instructions for the measurement.
> I'm still working with Dan on these measurements as he sees (much)
> higher costs measured from Linux.
FYI, I did a more precise measurement of reading hpet (and pit)
on my machine by modifying the kernel and recording rdtscll
differences around the platform timer reads (as well as the
entire function calls which compares better against my prior
measurements using systemtap). For HPET reads, about 86% of
the (100000) samples were between 4K and 8K cycles and about
13% were between 16K and 32K cycles. For PIT reads, all reads
are between 64K and 128K cycles. (I also measured the rdtscll
calls by putting two calls back-to-back... 90% of them were
between 128-255 cycles and 10% between 64-127 cycles.)
Since my system is 3.0 GHz, HPET reads average on the order of
3 usec, which is what my previous measurement showed.
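The conversion behind that estimate can be checked with a few lines of arithmetic. The function name `cycles_to_ns` is made up for this example, and reading "4K" and "8K" as the power-of-two bucket edges 4096 and 8192 is an assumption based on the 64-127 and 128-255 buckets quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a CPU clocked at cpu_hz. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t cpu_hz)
{
    assert(cpu_hz != 0);
    /* cycles * 1e9 fits in 64 bits for counts below about 1.8e10. */
    return cycles * 1000000000ULL / cpu_hz;
}
```

At 3.0 GHz the 4K to 8K bucket spans roughly 1.4 to 2.7 usec and the 16K to 32K bucket roughly 5.5 to 10.9 usec, so an 86%/13% mix of the two is consistent with an average of a few microseconds.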
I suspect that Dave's measurements and mine are "both right"
and that read overhead of HPET varies on different systems.
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> It used to be deliberately not exposed because Windows had problems
> using it in some cases, iirc. If it's no longer getting hidden in
> newer systems then that's a good thing.
On my box (and several others at Oracle), HPET is present but
disabled by default in the BIOS.
Dan
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: System time monotonicity
2008-04-21 19:26 ` Dan Magenheimer
@ 2008-04-21 19:31 ` Keir Fraser
0 siblings, 0 replies; 36+ messages in thread
From: Keir Fraser @ 2008-04-21 19:31 UTC (permalink / raw)
To: dan.magenheimer@oracle.com, Dave Winchell
Cc: Ian Pratt, xen-devel@lists.xensource.com, Tian, Kevin
On 21/4/08 20:26, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> Since my system is 3.0 Ghz, HPET read averages on the order of
> 3 usec, which is what my previous measure showed.
>
> I suspect that Dave's measurements and mine are "both right"
> and that read overhead of HPET varies on different systems.
That would make sense. HPET accesses have to go all the way to the
southbridge, and that's bound to vary a lot between chipsets let alone
across the very different interconnect topologies of AMD vs Intel.
-- Keir
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: System time monotonicity
2008-04-11 21:20 ` Keir Fraser
2008-04-11 21:41 ` Keir Fraser
@ 2008-04-11 22:22 ` Dan Magenheimer
1 sibling, 0 replies; 36+ messages in thread
From: Dan Magenheimer @ 2008-04-11 22:22 UTC (permalink / raw)
To: Keir Fraser, Dave Winchell
Cc: Ian Pratt, xen-devel@lists.xensource.com, Tian, Kevin
IMHO, this all comes down to how bad the tsc drift gets
between calibrations. If this is agreed, let me propose
the following: I see local_time_calibration() has
some old printk's. I propose re-enabling them (with
some rate-limiting) to record to a log how bad the
skew gets. Then we can request feedback from anyone
running xen-unstable (and maybe xen-3.1-latest and
xen-3.2-latest also) so we can "measure" a broad set of
machines.
One nice approach to rate-limit is to printk each time
the value exceeds the next higher power of two. Even
though this printk gets output for each processor,
I'd think this would be low overhead and sufficient
information for our needs.
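One way the power-of-two rate limit could look, as a sketch only. The struct, the function name `should_log_skew` and the choice of nanoseconds as the unit are assumptions; the existing printk's in local_time_calibration() are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU logging state; start next_threshold at 1. */
struct skew_log_state {
    uint64_t next_threshold;  /* report only skews at or above this value */
};

/*
 * Returns true when skew_ns should be logged, then raises the threshold
 * to the next power of two above skew_ns, so a steadily growing skew
 * produces at most one line per doubling.
 */
bool should_log_skew(struct skew_log_state *s, uint64_t skew_ns)
{
    assert(s != NULL);
    if (skew_ns < s->next_threshold)
        return false;

    uint64_t t = s->next_threshold ? s->next_threshold : 1;
    while (t <= skew_ns && t < (UINT64_C(1) << 63))
        t <<= 1;
    s->next_threshold = t;
    return true;
}
```

A caller would keep one `skew_log_state` per physical CPU and emit the log line only when the function returns true, which bounds the output to a few dozen lines per CPU no matter how long the system runs.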
A nice touch would be to include a little script to
run in domain0 that collects the cpuinfo, the relevant
log lines, and emails (or UDPs?) them to a pre-set address
(which is viewable via http). And maybe a once-per-log
printk that says "please run this script".
Even tied to a boot parameter, this would be better
than the lack-of-information we have now.
Dan
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, April 11, 2008 3:21 PM
> To: Dave Winchell
> Cc: dan.magenheimer@oracle.com; Ian Pratt; Tian, Kevin;
> xen-devel@lists.xensource.com
> Subject: Re: [xen-devel] System time monotonicity
>
>
> On 11/4/08 21:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
> > I turned to the hpet as I became frustrated trying to solve
> the problem
> > in xen with pit.
> > One of the solutions proposed to the customer was a
> max(curr time, last
> > time) modification to Linux.
> > They didn't want that.
> > (Keir, what Linux version are you looking at when you say
> Linux already
> > has this modification?)
>
> This is part of our own Xen-specific time patches for Linux.
>
> > I had tried hpet before to solve the time backwards problem
> and knew it
> > was effective.
> > But the accuracy of hpet was very poor. When I looked into
> the hpet I
> > was surprised that it was
> > based on tsc, as I was tring to get away from tsc. But
> note, even based
> > on tsc the time was not
> > going backwards, at least for this simple test case.
>
> Yes, having all the virtual timers based on 'guest TSC'
> (which really is
> basically host TSC + an offset) is not great.
>
> > Its a fairly simple matter to base the hpet on the physical
> hpet. Its
> > easy to share it among guests
> > as no one really writes the physical hpet. Offsets are kept in each
> > vhpet such that each guest thinks
> > he owns the hpet.
>
> This is really no better than basing on Xen system time.
> Actually it's worse
> since most systems don't even expose the HPET, so we can't
> probe it (without
> hacks) and so we can't use it. Xen's system time abstraction,
> perhaps with
> the max(last, curr) addition, is perfectly good enough.
>
> > This goes along with some of the experiences
> > Keir has had with drift, I think. I'm not sure why this
> happens - can
> > the hpet hardware be that poor in quality?
>
> It does appear to be, and I have no idea why.
>
> > There are three factors that give hpet its great accuracy,
> in my opinion.
> > 1) The hardware is very stable.
> > 2) There is only one of them in the system, not one per cpu.
> > 3) The Linux implementation for clock and hpet is very clean. It
> > calculates missed ticks and offsets without
> > including the interrupt delay.
>
> Encouraging the guest to use HPET makes sense. It's a nice wide counter
> which hence does not have the wrap issues of the 16-bit PIT counters.
> Also in some cases the guest OS interface to the HPET is saner (for our
> purposes at least) than the equivalent code to interface to PIT/TSC.
> This doesn't mean it has to be plumbed right down to the physical HPET.
> HVM time sources can be fixed for drift by moving them away from
> guest/host TSC and onto the Xen system time abstraction.
>
> -- Keir
>
> > Items 2 and 3 here are important factors in why the time stays
> > monotonic. Another reason is that gettimeofday reads the hpet main
> > counter for extrapolation, eliminating extrapolation error since the
> > same counter is the sole determinant of the next interrupt time
> > stamp. Furthermore, Linux can take the clock interrupt on any
> > processor and the monotonicity is preserved because of item 2.
> >
> > Thanks for reading this far!
>
>
>
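For reference, the max(last, curr) idea discussed in this message can be sketched as follows. This is a minimal single-threaded illustration under assumptions, not the Xen-specific Linux patch referred to above: `monotonic_clamp` and `last_returned_ns` are hypothetical names, and a real SMP version would need an atomic compare-and-swap on the shared value rather than a plain store.

```c
#include <stdint.h>

/* Largest timestamp handed out so far (hypothetical global, no locking). */
static uint64_t last_returned_ns;

/*
 * Return max(last, curr): if the freshly computed time is behind a value
 * already returned, repeat the earlier value so callers never see time
 * step backwards. Time may briefly appear to stand still instead.
 */
uint64_t monotonic_clamp(uint64_t curr_ns)
{
    if (curr_ns > last_returned_ns)
        last_returned_ns = curr_ns;
    return last_returned_ns;
}
```

Fed the two results from the first message of this thread (0xe8e0484568fa followed by 0xe8e048456012), the second call would return 0xe8e0484568fa again rather than a value 0x8e8 earlier. The clamp hides the backwards step from callers; it does not correct whatever made the per-CPU extrapolations disagree.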
end of thread, other threads:[~2008-04-21 19:31 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-26 18:23 System time monotonicity John Levon
2007-03-26 18:47 ` Keir Fraser
2007-03-26 20:04 ` John Levon
2007-03-27 10:47 ` Keir Fraser
2007-04-03 14:03 ` John Levon
2007-03-26 18:50 ` Ian Pratt
2007-03-26 18:59 ` Keir Fraser
2007-03-26 20:14 ` John Levon
2007-03-26 21:55 ` Ian Pratt
2007-03-27 0:27 ` Keir Fraser
-- strict thread matches above, loose matches on Subject: below --
2007-04-03 14:36 Ian Pratt
2007-04-03 14:57 ` John Levon
2007-04-03 17:51 Ian Pratt
2008-04-08 16:34 Dan Magenheimer
2008-04-08 16:42 ` Keir Fraser
2008-04-08 17:39 ` Dan Magenheimer
2008-04-09 1:16 ` Tian, Kevin
2008-04-09 1:55 ` Dan Magenheimer
2008-04-09 3:20 ` Tian, Kevin
2008-04-09 12:42 ` Ian Pratt
2008-04-09 14:25 ` Dan Magenheimer
2008-04-09 14:41 ` Keir Fraser
2008-04-09 16:33 ` Dan Magenheimer
2008-04-09 16:40 ` Keir Fraser
2008-04-09 18:36 ` Dan Magenheimer
2008-04-10 7:08 ` Keir Fraser
2008-04-10 21:27 ` Dan Magenheimer
2008-04-11 6:48 ` Keir Fraser
2008-04-11 22:05 ` Dan Magenheimer
[not found] <47FFC37A.4060402@virtualiron.com>
2008-04-11 21:20 ` Keir Fraser
2008-04-11 21:41 ` Keir Fraser
2008-04-11 22:58 ` Dave Winchell
2008-04-12 7:09 ` Keir Fraser
2008-04-21 19:26 ` Dan Magenheimer
2008-04-21 19:31 ` Keir Fraser
2008-04-11 22:22 ` Dan Magenheimer