Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time

xen-devel.lists.xenproject.org archive mirror

* Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Mark Adams @ 2010-10-06 11:16 UTC
  To: xen-devel

Hi xen-devel,

Please see my note below regarding a serious issue where my clock jumped
in dom0. I'm sending this through to the devel list as I haven't managed
to glean any clear help from xen-users, and the Debian bug team are
unsure what could have caused this.

Can you confirm whether the kernel or Xen controls the clock in dom0? I
also understand that this could be an underlying hardware issue, but I
have another system on exactly the same hardware which hasn't had this
occur.

Any advice on how to investigate further or ensure better clock
stability across dom0 and domU would be appreciated. 

Also, is it correct behaviour for Xen to reboot a 2008 R2 HVM domU if
the time moves this much? My guess is that the domU crashed when the
time changed, and was thus rebooted automatically. Strangely, the Windows
2003 server didn't get rebooted.

If you need any more info to help please let me know.

Thanks,
Mark

On Mon, Oct 04, 2010 at 01:00:51PM +0100, Mark Adams wrote:
> On Mon, Oct 04, 2010 at 11:01:10AM +0100, Mark Adams wrote:
> > Hi All,
> > 
> > I'm running Xen 4.0.1-rc6 on Debian squeeze with the pvops 2.6.32-21
> > kernel. Today I noticed (when Kerberos authentication against the domain
> > controllers stopped working) that the clock in dom0 was 50 minutes out.
> > This caused the HVM Windows domain controllers to have the wrong time.
> > 
> > I'm not sure if this is a kernel issue or a Xen issue, but the only
> > related thing I can see is the following in the kernel log:
> > 
> > Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)
> > 
> > But I also see in the dmesg log that Xen is using its own clock:
> > 
> > [    7.676563] Switching to clocksource xen
> > 
> > I can't identify anything else in the logs to indicate when the time
> > might have changed. I have a few other dom0 at the same level that
> > haven't decided to change the time.
> > 
> > Can anyone confirm whether Xen or the kernel controls the time? Also,
> > when I corrected the time in dom0 it was still wrong in the HVM domU.
> > How long does it take for this to propagate? (I rebooted the VMs to
> > correct it immediately.)
> > 
> > Any other pointers on how to ensure stability of clocks from dom0 to
> > HVM domUs (and PV ones, for that matter) would be appreciated.
> 
> Some further info on this: it appears the HVM domU (Windows Server 2008)
> unexpectedly shut down at 18:51, after the unstable clocksource error.
> The qemu-dm logs show a reset ("reset requested in cpu_handle_ioreq.")
> and xend.log shows a reboot:
> 
> [2010-10-02 18:51:03 1759] INFO (XendDomainInfo:2088) Domain has shutdown: name=ha-dc1 id=2 reason=reboot.
> 
> This is like someone issuing "xm reboot domain", is it not? Is it
> possible that Xen could have issued this reboot itself due to a crash? I
> can't see any crash logs.
> 
> Cheers,
> Mark
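The delta in the "Clocksource tsc unstable" message is printed in nanoseconds. As a quick sanity check, the Python sketch below pulls the value out of the syslog line quoted above and converts it to minutes. The regex and the helper name `unstable_delta_minutes` are hypothetical, written for this illustration rather than taken from any kernel or Xen tool.

```python
import re

# Hypothetical pattern matching the kernel message quoted above:
#   "Clocksource <name> unstable (delta = <n> ns)"
UNSTABLE_RE = re.compile(r"Clocksource (\S+) unstable \(delta = (-?\d+) ns\)")


def unstable_delta_minutes(line):
    """Return (clocksource name, delta in minutes), or None if the line doesn't match."""
    m = UNSTABLE_RE.search(line)
    if m is None:
        return None
    name, delta_ns = m.group(1), int(m.group(2))
    return name, delta_ns / 1e9 / 60  # ns -> s -> min


line = ("Oct  2 18:50:33 havhost1 kernel: [623480.977748] "
        "Clocksource tsc unstable (delta = -2999660303788 ns)")
print(unstable_delta_minutes(line))
```

The magnitude comes out at just under 50 minutes (about 2999.7 s), which matches the offset found in dom0 rather than the whole hour a daylight saving change would produce.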

* RE: Clock jumped 50 minutes in dom0 caused incorrect 2008R2 domU time
From: James Harper @ 2010-10-06 12:20 UTC
  To: Mark Adams, xen-devel

> Hi xen-devel,
> 
> Please see my note below regarding a serious issue where my clock jumped
> in dom0. I'm sending this through to the devel list as I haven't managed
> to glean any clear help from xen-users, and the Debian bug team are
> unsure what could have caused this.
> 
> Can you confirm whether the kernel or Xen controls the clock in dom0? I
> also understand that this could be an underlying hardware issue, but I
> have another system on exactly the same hardware which hasn't had this
> occur.
> 
> Any advice on how to investigate further or ensure better clock
> stability across dom0 and domU would be appreciated.
> 
> Also, is it correct behaviour for Xen to reboot a 2008 R2 HVM domU if
> the time moves this much? My guess is that the domU crashed when the
> time changed, and was thus rebooted automatically. Strangely, the Windows
> 2003 server didn't get rebooted.
> 
> If you need any more info to help please let me know.
> 

Does the domU have more than one CPU? If so, is viridian=1 specified in
its config? That relates to the crash, not the time jump, though.

James
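For readers unfamiliar with the option: viridian is set per guest in the xm domain config file, which xm evaluates as Python. A hypothetical excerpt for the ha-dc1 guest named in the xend.log line above might look like this; the vcpus value is an assumption chosen to match the multi-CPU case being asked about.

```python
# Hypothetical excerpt from an xm HVM domain config (xm evaluates these files as Python).
name = "ha-dc1"     # guest name as it appears in the xend.log line above
builder = "hvm"     # fully virtualised (HVM) guest
vcpus = 2           # assumption: more than one vCPU, the case in question
viridian = 1        # expose Viridian (Hyper-V compatible) enlightenments to Windows
```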

* RE: Clock jumped 50 minutes in dom0 caused incorrect 2008R2 domU time
From: James Harper @ 2010-10-06 12:24 UTC
  To: Mark Adams, xen-devel

> > > Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)

This is a long shot, but it's the daylight saving time of year. We just
moved from +10 to +11 hours at around 2am on Oct 3. I'm sure it's just a
coincidence, as I can't see how changing a timezone offset could cause
such a problem, but I've seen a few strange things this week with clocks
getting out of whack.

James

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008R2 domU time
From: Mark Adams @ 2010-10-06 13:04 UTC
  To: James Harper; +Cc: xen-devel

Hi James,

It hasn't been clock-change time here in the UK yet. I know what you
mean, though; I have also had issues around this time in the past. In
this instance, though, it only changed by 50 minutes, not a full hour.

I don't have viridian=1 set in the guest configs; I do have localtime=1
to keep the guests' time the same as dom0's. The 2008 R2 domUs seemed
to be operating fine without viridian until this clock issue. What does
the viridian setting achieve?

Regards,
Mark

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Jeremy Fitzhardinge @ 2010-10-06 15:41 UTC
  To: Mark Adams; +Cc: xen-devel

 On 10/06/2010 04:16 AM, Mark Adams wrote:
> Hi xen-devel,
>
> Please see my note below regarding a serious issue where my clock jumped
> in dom0. I'm sending this through to the devel list as I haven't managed
> to glean any clear help from xen-users, and the Debian bug team are
> unsure what could have caused this.
>
> Can you confirm whether the kernel or Xen controls the clock in dom0? I
> also understand that this could be an underlying hardware issue, but I
> have another system on exactly the same hardware which hasn't had this
> occur.

The kernel manages its own time, but it uses the Xen system clock as its
timebase.  If the Xen system clock is unstable for some reason, then it
will affect the kernel's timekeeping.

Nothing should be using the tsc clocksource, so I'm not sure why it's
reporting any kind of messages.  No PV Xen domain can expect the raw
tsc to be stable.

But the tsc is the basis for the Xen clocksource, and if the tsc is
unstable in unexpected ways then it can affect Xen timekeeping.  This
can be caused by certain power management modes.

> Any advice on how to investigate further or ensure better clock
> stability across dom0 and domU would be appreciated. 

What type of system is it?  How many CPUs?  What CPU vendor?

> Also, is it correct behaviour for Xen to reboot a 2008 R2 HVM domU if
> the time moves this much? My guess is that the domU crashed when the
> time changed, and was thus rebooted automatically. Strangely, the Windows
> 2003 server didn't get rebooted.

I don't think there would be any direct connection between the dom0 time
jump and Windows dying, but if the CPU's tsc and/or Xen's timekeeping is
unstable, then Windows might also see a similar time jump and react badly.

    J
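To make "the tsc is the basis for the Xen clocksource" concrete: a PV kernel reads a per-vCPU time record that Xen publishes and extrapolates from the current tsc. Below is a simplified Python model of that scaling step, using the field names of Xen's public `vcpu_time_info` structure (`tsc_timestamp`, `system_time`, `tsc_to_system_mul`, `tsc_shift`). It omits the version-check retry loop the real reader uses, and the 2.4 GHz sample values are illustrative, not numbers Xen necessarily computes.

```python
def pvclock_ns(tsc, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift):
    """Simplified model of pvclock time: system_time plus the scaled tsc delta (ns)."""
    delta = tsc - tsc_timestamp          # tsc cycles since Xen last updated the record
    if tsc_shift < 0:
        delta >>= -tsc_shift
    else:
        delta <<= tsc_shift
    # tsc_to_system_mul is a 32.32 fixed-point multiplier. Python ints don't
    # overflow; the C code needs a wider-than-64-bit intermediate product here.
    return system_time + ((delta * tsc_to_system_mul) >> 32)


# Illustrative values for a 2.4 GHz tsc: one second's worth of cycles after the
# record was written should advance system time by one second.
MUL_2_4_GHZ = 1789569707  # ceil(2**32 / 2.4)
print(pvclock_ns(tsc=2_400_000_000, tsc_timestamp=0, system_time=0,
                 tsc_to_system_mul=MUL_2_4_GHZ, tsc_shift=0))  # prints 1000000000
```

Because the result is a direct function of the tsc between Xen's updates of the record, a tsc that misbehaves "in unexpected ways" would show up as a jump in the kernel's view of Xen time.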


* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Mark Adams @ 2010-10-06 16:15 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On Wed, Oct 06, 2010 at 08:41:51AM -0700, Jeremy Fitzhardinge wrote:
>  On 10/06/2010 04:16 AM, Mark Adams wrote:
> > Hi xen-devel,
> >
> > Please see my note below regarding a serious issue where my clock jumped
> > in dom0. I'm sending this through to the devel list as I haven't managed
> > to glean any clear help from xen-users, and the Debian bug team are
> > unsure what could have caused this.
> >
> > Can you confirm whether the kernel or Xen controls the clock in dom0? I
> > also understand that this could be an underlying hardware issue, but I
> > have another system on exactly the same hardware which hasn't had this
> > occur.
> 
> The kernel manages its own time, but it uses the Xen system clock as its
> timebase.  If the Xen system clock is unstable for some reason, then it
> will affect the kernel's timekeeping.
> 
> Nothing should be using the tsc clocksource, so I'm not sure why it's
> reporting any kind of messages.  No PV Xen domain can expect the raw
> tsc to be stable.

The message was reported in dom0, not domU.

> 
> But the tsc is the basis for the Xen clocksource, and if the tsc is
> unstable in unexpected ways then it can affect Xen timekeeping.  This
> can be caused by certain power management modes.
> 
> > Any advice on how to investigate further or ensure better clock
> > stability across dom0 and domU would be appreciated. 
> 
> What type of system is it?  How many CPUs?  What CPU vendor?

It is a Tyan S7010AGM2NRF with two Intel quad-core Xeon E5620 CPUs.

Thanks,
Mark


* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Jeremy Fitzhardinge @ 2010-10-06 16:23 UTC
  To: Mark Adams; +Cc: Dan Magenheimer, xen-devel

 On 10/06/2010 09:15 AM, Mark Adams wrote:
> On Wed, Oct 06, 2010 at 08:41:51AM -0700, Jeremy Fitzhardinge wrote:
>>  On 10/06/2010 04:16 AM, Mark Adams wrote:
>>> Hi xen-devel,
>>>
>>> Please see my note below regarding a serious issue where my clock jumped
>>> in dom0. I'm sending this through to the devel list as I haven't managed
>>> to glean any clear help from xen-users, and the Debian bug team are
>>> unsure what could have caused this.
>>>
>>> Can you confirm whether the kernel or Xen controls the clock in dom0? I
>>> also understand that this could be an underlying hardware issue, but I
>>> have another system on exactly the same hardware which hasn't had this
>>> occur.
>> The kernel manages its own time, but it uses the Xen system clock as its
>> timebase.  If the Xen system clock is unstable for some reason, then it
>> will affect the kernel's timekeeping.
>>
>> Nothing should be using the tsc clocksource, so I'm not sure why it's
>> reporting any kind of messages.  No PV Xen domain can expect the raw
>> tsc to be stable.
> The message was reported in dom0, not domU.

Dom0 is a normal PV domain.  It just has a few more privileges than a
regular domU.

>> But the tsc is the basis for the Xen clocksource, and if the tsc is
>> unstable in unexpected ways then it can affect Xen timekeeping.  This
>> can be caused by certain power management modes.
>>
>>> Any advice on how to investigate further or ensure better clock
>>> stability across dom0 and domU would be appreciated. 
>> What type of system is it?  How many CPUs?  What CPU vendor?
> It is a Tyan S7010AGM2NRF with two Intel quad-core Xeon E5620 CPUs.

I forget all the magic options that can affect timekeeping (cc'd Dan,
since this stuff is close to his heart).

    J


* RE: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Dan Magenheimer @ 2010-10-07 14:04 UTC
  To: Jeremy Fitzhardinge, Mark Adams; +Cc: xen-devel

Hi Jeremy and Mark --

Oddly, I saw that "clocksource tsc unstable" message myself
on a busy 2.6.36-rc5 PV domain yesterday.  While it is possible
that this reflects a hardware problem, the fact that you
saw it on a Nehalem+ Intel processor makes it very unlikely.
The "s" and "t" debug keys (the output of which can be seen via
"xm debug-key s; xm dmesg | tail" in dom0) can help diagnose
the problem if it is indeed a hardware problem or BIOS
problem or the result of a CPU hot-add... all unlikely.

It IS possible that the code that emulates tsc is broken
somewhere, but I don't think tsc should be emulated by
default for dom0 on a Nehalem+ box... and even if it is,
it is directly based on Xen system time which, if it went
awry, would probably cause major problems.

Looking through the Linux code that prints that message (in
kernel/time/clocksource.c), it appears that the message is
printed if the tsc deviates from the "watchdog clocksource",
which in PV domains is "xen" (or, more precisely, pvclock,
I think).  So most likely this is a symptom of a problem
with pvclock or the watchdog code in the pvops kernel, not
an indicator that the tsc is actually unstable.

Dan
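A simplified Python model of the check described above: over each watchdog period, the kernel compares how far the clocksource under test (tsc) advanced with how far the watchdog clocksource (xen) advanced, and flags the former as unstable when they disagree by more than a threshold. Two details are assumptions to verify against the actual kernel/time/clocksource.c: the threshold of `NSEC_PER_SEC >> 4` (62.5 ms), and the sign convention of delta = tsc interval minus watchdog interval. Counter wraparound and cycle-to-nanosecond conversion are left out.

```python
NSEC_PER_SEC = 1_000_000_000
# Assumption: threshold as recalled from kernels of this era (62.5 ms).
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


def watchdog_check(cs_interval_ns, wd_interval_ns, threshold_ns=WATCHDOG_THRESHOLD_NS):
    """Return the delta that would be reported as unstable, or None if within threshold.

    cs_interval_ns: how far the clocksource under test (e.g. tsc) advanced.
    wd_interval_ns: how far the watchdog clocksource (e.g. xen) advanced
                    over the same period.
    """
    delta = cs_interval_ns - wd_interval_ns  # assumed sign convention
    if abs(delta) > threshold_ns:
        return delta
    return None


# Hypothetical intervals: the tsc advanced 0.5 s while the watchdog clock
# advanced 0.5 s plus ~50 minutes, reproducing the delta in the log line.
half_second = NSEC_PER_SEC // 2
print(watchdog_check(half_second, half_second + 2_999_660_303_788))  # prints -2999660303788
```

The check only sees that the two clocks disagree; on its own it cannot say which one is wrong, which is why the message does not by itself prove the tsc is at fault.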


* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Mark Adams @ 2010-10-26  9:22 UTC
  To: Dan Magenheimer; +Cc: Jeremy Fitzhardinge, xen-devel

On Thu, Oct 07, 2010 at 07:04:18AM -0700, Dan Magenheimer wrote:
> Hi Jeremy and Mark --
> 
> Oddly, I saw that "clocksource tsc unstable" message myself
> on a busy 2.6.36-rc5 PV domain yesterday.  While it is possible
> that this reflects a hardware problem, the fact that you
> saw it on a Nehalem+ Intel processor makes it very unlikely.
> The "s" and "t" debug keys (the output of which can be seen via
> "xm debug-key s; xm dmesg | tail" in dom0) can help diagnose
> the problem if it is indeed a hardware problem or BIOS
> problem or the result of a CPU hot-add... all unlikely.
> 
> It IS possible that the code that emulates tsc is broken
> somewhere, but I don't think tsc should be emulated by
> default for dom0 on a Nehalem+ box... and even if it is,
> it is directly based on Xen system time which, if it went
> awry, would probably cause major problems.
> 
> Looking through the Linux code that prints that message (in
> kernel/time/clocksource.c), it appears that the message is
> printed if the tsc deviates from the "watchdog clocksource",
> which in PV domains is "xen" (or, more precisely, pvclock,
> I think).  So most likely this is a symptom of a problem
> with pvclock or the watchdog code in the pvops kernel, not
> an indicator that the tsc is actually unstable.
> 
> Dan

Is there any more information I can provide to help with debugging this?
We haven't had the problem since. It could just be a coincidence, but it
happened around the time of the daylight saving change in the US (we
are in the UK).

Regards,
Mark

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
From: Jeremy Fitzhardinge @ 2010-10-26 17:03 UTC
  To: Mark Adams; +Cc: Dan Magenheimer, xen-devel

 On 10/26/2010 02:22 AM, Mark Adams wrote:
> On Thu, Oct 07, 2010 at 07:04:18AM -0700, Dan Magenheimer wrote:
>> Hi Jeremy and Mark --
>>
>> Oddly, I saw that "clocksource tsc unstable" message myself
>> on a busy 2.6.36-rc5 PV domain yesterday.  While it is possible
>> that this reflects a hardware problem, the fact that you
>> saw it on a Nehalem+ Intel processor makes it very unlikely.
>> The "s" and "t" debug keys (the output of which can be seen via
>> "xm debug-key s; xm dmesg | tail" in dom0) can help diagnose
>> the problem if it is indeed a hardware problem or BIOS
>> problem or the result of a CPU hot-add... all unlikely.
>>
>> It IS possible that the code that emulates tsc is broken
>> somewhere, but I don't think tsc should be emulated by
>> default for dom0 on a Nehalem+ box... and even if it is,
>> it is directly based on Xen system time which, if it went
>> awry, would probably cause major problems.
>>
>> Looking through the Linux code that prints that message (in
>> kernel/time/clocksource.c), it appears that the message is
>> printed if the tsc deviates from the "watchdog clocksource",
>> which in PV domains is "xen" (or, more precisely, pvclock,
>> I think).  So most likely this is a symptom of a problem
>> with pvclock or the watchdog code in the pvops kernel, not
>> an indicator that the tsc is actually unstable.
>>
>> Dan
> Is there any more information I can provide to help with debugging this?
> We haven't had the problem since. It could just be a coincidence, but it
> happened around the time of the daylight saving change in the US (we
> are in the UK).

In Linux/Xen it shouldn't have any effect since the clocks are always
maintained in UTC, then timezone details are applied much later in
usermode.  But Windows has a bad habit of setting the hardware RTC to
local time, and mucking about with it for DST changes - but that would
only be relevant if you booted Windows on your host machine (I don't
think there's any way for a Windows guest's time to leak into the
host/dom0's timebase).
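A minimal sketch of Jeremy's point, assuming the kernel's clock holds an instant in UTC: a DST change only alters the offset that userspace applies when rendering that instant as local time, so the stored timestamp does not move. Fixed GMT/BST offsets stand in for a real timezone database so the example stays self-contained. The instant chosen is simply the Date of this email.

```python
from datetime import datetime, timedelta, timezone

# Linux and Xen keep time as an instant in UTC (here: the Date of this email).
utc_instant = datetime(2010, 10, 26, 17, 3, tzinfo=timezone.utc)

# Timezone and DST rules are applied only when userspace renders that instant.
# Fixed offsets stand in for a real timezone database in this sketch.
GMT = timezone(timedelta(hours=0), "GMT")  # UK winter time, UTC+0
BST = timezone(timedelta(hours=1), "BST")  # UK summer time, UTC+1

as_gmt = utc_instant.astimezone(GMT)
as_bst = utc_instant.astimezone(BST)

print(as_gmt.strftime("%H:%M %Z"))  # 17:03 GMT
print(as_bst.strftime("%H:%M %Z"))  # 18:03 BST

# The local rendering moves by the DST offset, but the UTC timestamp
# (what the kernel's clock actually holds) is the same in both cases.
assert as_gmt.timestamp() == as_bst.timestamp() == utc_instant.timestamp()
```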

Unfortunately these kinds of time problems can be notoriously hard to
pin down and diagnose.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2010-10-26 17:03           ` Jeremy Fitzhardinge
@ 2010-10-26 21:54             ` Dan Magenheimer
  2010-10-27 20:29               ` Dan Magenheimer
  0 siblings, 1 reply; 20+ messages in thread
From: Dan Magenheimer @ 2010-10-26 21:54 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Mark Adams; +Cc: xen-devel

> On 10/26/2010 02:22 AM, Mark Adams wrote:
> > On Thu, Oct 07, 2010 at 07:04:18AM -0700, Dan Magenheimer wrote:
> >> Hi Jeremy and Mark --
> >>
> >> Oddly, I saw that "clocksource tsc unstable" message myself
> >> on a busy 2.6.36-rc5 PV domain yesterday.  While it is possible
> >> that this reflects a hardware problem, the fact that you
> >> saw it on a Nehalem+ Intel processor makes it very unlikely.
> >> The "s" and "t" debug keys (the output of which can be seen via
> >> "xm debug-key s; xm dmesg | tail" in dom0) can help diagnose
> >> the problem if it is indeed a hardware problem or BIOS
> >> problem or the result of a CPU hot-add... all unlikely.
> >>
> >> It IS possible that the code that emulates tsc is broken
> >> somewhere, but I don't think tsc should be emulated by
> >> default for dom0 on a Nehalem+ box... and even if it is,
> >> it is directly based on Xen system time which, if it went
> >> awry, would probably cause major problems.
> >>
> >> Looking through the Linux code that prints that message (in
> >> kernel/time/clocksource.c) it appears that the message
> >> appears if the tsc deviates from the "watchdog clocksource",
> >> which in PV domains is "xen" (or more precisely pvclock
> >> I think).  So most likely, this is a symptom of a problem
> >> with pvclock or the watchdog code in the pvops kernel, not
> >> an indicator that the tsc is actually unstable.
> >>
> > Is there any more information I can provide to help with debugging this?
> > We haven't had the problem since. It could just be a coincidence but it
> > happened around the time that daylight savings occurred in the US (we
> > are in the UK).
> 
> In Linux/Xen it shouldn't have any effect since the clocks are always
> maintained in UTC, then timezone details are applied much later in
> usermode.  But Windows has a bad habit of setting the hardware RTC to
> local time, and mucking about with it for DST changes - but that would
> only be relevant if you booted Windows on your host machine (I don't
> think there's any way for a Windows guest's time to leak into the
> host/dom0's timebase).
> 
> Unfortunately these kinds of time problems can be notoriously hard to
> pin down and diagnose.

This seems to occur when one -- or possibly all -- vcpus
are "spinning" for an unexpectedly long period of time.  If so
it may be possible to synthesize some kind of long-but-non-infinite
deadlock in a domU kernel which might reproduce the problem.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2010-10-26 21:54             ` Dan Magenheimer
@ 2010-10-27 20:29               ` Dan Magenheimer
  2011-01-04 17:00                 ` Mark Adams
  0 siblings, 1 reply; 20+ messages in thread
From: Dan Magenheimer @ 2010-10-27 20:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Mark Adams; +Cc: xen-devel

> > Unfortunately these kinds of time problems can be notoriously hard to
> > pin down and diagnose.
> 
> This seems to occur when one -- or possibly all -- vcpus
> are "spinning" for an unexpectedly long period of time.  If so
> it may be possible to synthesize some kind of long-but-non-infinite
> deadlock in a domU kernel which might reproduce the problem.

Saw this on LKML:
http://lkml.org/lkml/2010/10/27/366 

The solution refers to a recent git commit so is unlikely to
be the specific cause for the problem we've seen.  But the
analysis sounds very familiar and also corresponds to my
observation above:  In short, SOMEthing (in the guest kernel
or in Xen) is causing the guest to "go out to lunch" for some
extended period of time, which confuses the watchdog timer,
which disables TSC as a clocksource.
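Dan's description of the watchdog can be illustrated with a simplified model: over each watchdog interval the kernel measures elapsed time with both the TSC and the watchdog clocksource, and if the two disagree by more than a threshold it marks the TSC unstable. This is a sketch, not the kernel code. The threshold (NSEC_PER_SEC >> 4, i.e. 62.5 ms) is assumed to match WATCHDOG_THRESHOLD in kernel/time/clocksource.c of that era, the sign convention of the delta is assumed, and the interval values are hypothetical.

```python
from typing import Tuple

NSEC_PER_SEC = 1_000_000_000
# Assumed to mirror WATCHDOG_THRESHOLD in kernel/time/clocksource.c of the
# 2.6.32 era: NSEC_PER_SEC >> 4, i.e. 62.5 ms.
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


def watchdog_check(cs_nsec: int, wd_nsec: int) -> Tuple[bool, int]:
    """Simplified clocksource watchdog comparison.

    cs_nsec: time elapsed over one watchdog interval according to the
             clocksource under test (the TSC).
    wd_nsec: time elapsed over the same interval according to the watchdog
             clocksource ("xen"/pvclock in a PV domain).
    Returns (unstable, delta_ns), with delta_ns = cs_nsec - wd_nsec, so a
    negative delta means the watchdog saw more time pass than the TSC did.
    """
    delta = cs_nsec - wd_nsec
    return abs(delta) > WATCHDOG_THRESHOLD_NS, delta


# Healthy interval: both clocks saw about half a second.
print(watchdog_check(500_000_000, 500_004_000))  # (False, -4000)

# Delta from the "Clocksource tsc unstable" line in the original report,
# converted to minutes.
logged_delta = -2_999_660_303_788
print(abs(logged_delta) > WATCHDOG_THRESHOLD_NS)     # True
print(round(logged_delta / (60 * NSEC_PER_SEC), 1))  # -50.0
```

The logged delta works out to almost exactly the 50 minutes Mark observed, far beyond any plausible threshold.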

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2010-10-27 20:29               ` Dan Magenheimer
@ 2011-01-04 17:00                 ` Mark Adams
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Adams @ 2011-01-04 17:00 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Jeremy Fitzhardinge, xen-devel

This has just occurred again on another machine. Is anyone else seeing
this?

On Wed, Oct 27, 2010 at 01:29:55PM -0700, Dan Magenheimer wrote:
> > > Unfortunately these kinds of time problems can be notoriously hard to
> > > pin down and diagnose.
> > 
> > This seems to occur when one -- or possibly all -- vcpus
> > are "spinning" for an unexpectedly long period of time.  If so
> > it may be possible to synthesize some kind of long-but-non-infinite
> > deadlock in a domU kernel which might reproduce the problem.
> 
> Saw this on LKML:
> http://lkml.org/lkml/2010/10/27/366 
> 
> The solution refers to a recent git commit so is unlikely to
> be the specific cause for the problem we've seen.  But the
> analysis sounds very familiar and also corresponds to my
> observation above:  In short, SOMEthing (in the guest kernel
> or in Xen) is causing the guest to "go out to lunch" for some
> extended period of time, which confuses the watchdog timer,
> which disables TSC as a clocksource.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

[parent not found: <AANLkTinDMfrR5u2k3kPJfJ9Z+op53v6ziEYnLEO03FkG@mail.gmail.com>]

[parent not found: <20101008100907.GH30044@campbell-lange.net>]

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
       [not found]   ` <20101008100907.GH30044@campbell-lange.net>
@ 2010-10-09  2:15     ` wei song
  2010-10-11 10:10       ` Mark Adams
  0 siblings, 1 reply; 20+ messages in thread
From: wei song @ 2010-10-09  2:15 UTC (permalink / raw)
  To: Mark Adams; +Cc: xen devel


Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
file.
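To answer where these settings go: they are per-guest options for the HVM domU's xm configuration file, which uses Python syntax. A hypothetical fragment follows (the filename and the `builder` line are illustrative assumptions); the comments summarise what later replies in this thread say about each option.

```python
# Hypothetical excerpt from an HVM guest's xm config file, e.g.
# /etc/xen/win2008r2.cfg (path assumed). xm config files use Python
# syntax, so these are plain assignments.
builder = 'hvm'

# Timer mode 2 = no_missed_ticks_pending (xen/include/public/hvm/params.h).
timer_mode = 2

# TSC mode 1 = guest rdtsc/rdtscp always emulated at 1 GHz
# (per the comment in xen/include/asm-x86/time.h).
tsc_mode = 1

# Turn ON the Hyper-V ("viridian") enlightenments for Windows guests.
viridian = 1
```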

thanks,
James Song
2010/10/8 Mark Adams <mark@campbell-lange.net>

> Hi,
>
> My hardware is stable otherwise. I haven't had any issues for months
> before, and ever since, and another server with IDENTICAL setup and
> hardware did not have the issue at the same time.
>
> Where should those settings be added?
>
> Regards,
> Mark
>
> On Fri, Oct 08, 2010 at 10:15:59AM +0800, wei song wrote:
> > What's your setting of timer_mode and tsc_mode? If your hardware is not
> > stable, pls try timer_mode =2  and tsc_mode = 1 and viridian=1.
> >
> > -James Song
> > 2010/10/6 Mark Adams <mark@campbell-lange.net>
> >
> > > Hi Xen-Devel's
> > >
> > > Please see my note below regarding a serious issue where my clock jumped
> > > in dom0. I'm sending this through to the devel list as I haven't managed
> > > to glean any clear help from xen-users and the debian bug team are
> > > unsure what could have caused this.
> > >
> > > Can you confirm if the kernel or xen controls the clock in dom0? I also
> > > understand that this could be an underlying hardware issue but I have
> > > another system on exactly the same hardware which hasn't had this occur.
> > >
> > > Any advice on how to investigate further or ensure better clock
> > > stability across dom0 and domU would be appreciated.
> > >
> > > Also is it correct behaviour for Xen to reboot an 2008 R2 HVM domU if
> > > the time moves this much? My guess is that the domU crashed when the
> > > time changed, and was thus rebooted automatically. Strangely the Windows
> > > 2003 server didn't get rebooted.
> > >
> > > If you need any more info to help please let me know.
> > >
> > > Thanks,
> > > Mark
> > >
> > >
>
> --
> Mark Adams
> Technical Manager
> mark@campbell-lange.net
> .
> Campbell-Lange Workshop
> www.campbell-lange.net
> 0207 6311 555
> 3 Tottenham Street London W1T 2AF
> Registered in England No. 04551928
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2010-10-09  2:15     ` wei song
@ 2010-10-11 10:10       ` Mark Adams
  2011-01-04 17:09         ` Ian Campbell
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Adams @ 2010-10-11 10:10 UTC (permalink / raw)
  To: wei song; +Cc: xen devel

Hi,

I can try this, but can you please confirm what these options do?

Regards,
Mark

On Sat, Oct 09, 2010 at 10:15:20AM +0800, wei song wrote:
> Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
> file.
> 
> thanks,
> James Song
> 2010/10/8 Mark Adams <mark@campbell-lange.net>
> 
> > Hi,
> >
> > My hardware is stable otherwise. I haven't had any issues for months
> > before, and ever since, and another server with IDENTICAL setup and
> > hardware did not have the issue at the same time.
> >
> > Where should those settings be added?
> >
> > Regards,
> > Mark
> >
> > On Fri, Oct 08, 2010 at 10:15:59AM +0800, wei song wrote:
> > > What's your setting of timer_mode and tsc_mode? If your hardware is not
> > > stable, pls try timer_mode =2  and tsc_mode = 1 and viridian=1.
> > >
> > > -James Song
> > > 2010/10/6 Mark Adams <mark@campbell-lange.net>
> > >
> > > > Hi Xen-Devel's
> > > >
> > > > Please see my note below regarding a serious issue where my clock jumped
> > > > in dom0. I'm sending this through to the devel list as I haven't managed
> > > > to glean any clear help from xen-users and the debian bug team are
> > > > unsure what could have caused this.
> > > >
> > > > Can you confirm if the kernel or xen controls the clock in dom0? I also
> > > > understand that this could be an underlying hardware issue but I have
> > > > another system on exactly the same hardware which hasn't had this occur.
> > > >
> > > > Any advice on how to investigate further or ensure better clock
> > > > stability across dom0 and domU would be appreciated.
> > > >
> > > > Also is it correct behaviour for Xen to reboot an 2008 R2 HVM domU if
> > > > the time moves this much? My guess is that the domU crashed when the
> > > > time changed, and was thus rebooted automatically. Strangely the Windows
> > > > 2003 server didn't get rebooted.
> > > >
> > > > If you need any more info to help please let me know.
> > > >
> > > > Thanks,
> > > > Mark
> > > >
> > > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2010-10-11 10:10       ` Mark Adams
@ 2011-01-04 17:09         ` Ian Campbell
  2011-01-04 17:22           ` Tim Deegan
  2011-01-04 17:28           ` Gianni Tedesco
  0 siblings, 2 replies; 20+ messages in thread
From: Ian Campbell @ 2011-01-04 17:09 UTC (permalink / raw)
  To: Mark Adams; +Cc: wei song, xen devel

Whichever one of you started it: Please don't top post.

> On Sat, Oct 09, 2010 at 10:15:20AM +0800, wei song wrote:
> > Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
> > file.

On Mon, 2010-10-11 at 11:10 +0100, Mark Adams wrote:
> I can try this, but can you please confirm what these options do?

viridian=1 turns off support for the Hyper-V virtualisation
compatibility layer. Not many of these are actually supported but AFAIK
one or two are and can affect the behaviour of Win2008 (although you
would hope it was for the better!)

According to xmexample.hvm: tsc_mode : TSC mode (0=default, 1=native
TSC, 2=never emulate, 3=pvrdtscp).

Timer mode is apparently "0=delay virtual time when ticks are missed;
1=virtual time is always wallclock time".

To be honest I'm not entirely sure what those last two actually mean in
practice. Hopefully someone who understands this stuff will weigh in.

Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2011-01-04 17:09         ` Ian Campbell
@ 2011-01-04 17:22           ` Tim Deegan
  2011-01-04 17:27             ` Ian Campbell
  2011-01-04 17:28           ` Gianni Tedesco
  1 sibling, 1 reply; 20+ messages in thread
From: Tim Deegan @ 2011-01-04 17:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei song, xen devel, Mark Adams

At 17:09 +0000 on 04 Jan (1294160969), Ian Campbell wrote:
> Whichever one of you started it: Please don't top post.
> 
> > On Sat, Oct 09, 2010 at 10:15:20AM +0800, wei song wrote:
> > > Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
> > > file.
> 
> On Mon, 2010-10-11 at 11:10 +0100, Mark Adams wrote:
> > I can try this, but can you please confirm what these options do?
> 
> viridian=1 turns off support for the Hyper-V virtualisation
> compatibility layer.

Surely it turns it _on_, no?  It does indeed help with Windows guests,
in particular avoiding the vexing "STOP 101" bluescreen when Windows
thinks one CPU hasn't seen timer interrupts for too long.

> Not many of these are actually supported but AFAIK
> one or two are and can affect the behaviour of Win2008 (although you
> would hope it was for the better!)
> 
> According to xmexample.hvm: tsc_mode : TSC mode (0=default, 1=native
> TSC, 2=never emulate, 3=pvrdtscp).
> 
> Timer mode is apparently "0=delay virtual time when ticks are missed;
> 1=virtual time is always wallclock time".
> 
> To be honest I'm not entirely sure what those last two actually mean in
> practice. Hopefully someone who understands this stuff will weigh in.

Timer modes describe how guest-visible time and timer interrupts are
updated when a VCPU is rescheduled:  (2 is 'no-missed-ticks-pending')

 *  delay_for_missed_ticks (default):
 *   Do not advance a vcpu's time beyond the correct delivery time for
 *   interrupts that have been missed due to preemption. Deliver missed
 *   interrupts when the vcpu is rescheduled and advance the vcpu's
 *   virtual time stepwise for each one.
 *  no_delay_for_missed_ticks:
 *   As above, missed interrupts are delivered, but guest time always
 *   tracks wallclock (i.e., real) time while doing so.
 *  no_missed_ticks_pending:
 *   No missed interrupts are held pending. Instead, to ensure ticks are
 *   delivered at some non-zero rate, if we detect missed ticks then the
 *   internal tick alarm is not disabled if the VCPU is preempted during
 *   the next tick period.
 *  one_missed_tick_pending:
 *   Missed interrupts are collapsed together and delivered as one 'late
 *   tick'. Guest time always tracks wallclock (i.e., real) time.
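The four modes listed above can be summarised in a small model of what happens when a preempted vCPU runs again. This is a sketch under stated assumptions, not Xen's vpt code: the numeric values are assumed to match the HVMPTM_* constants in xen/include/public/hvm/params.h, the tick-alarm detail of mode 2 is ignored, and mode 2 is assumed to track wallclock like modes 1 and 3 because only mode 0 is described as holding guest time back.

```python
from enum import IntEnum


class TimerMode(IntEnum):
    # Values assumed to match the HVMPTM_* constants in
    # xen/include/public/hvm/params.h.
    DELAY_FOR_MISSED_TICKS = 0
    NO_DELAY_FOR_MISSED_TICKS = 1
    NO_MISSED_TICKS_PENDING = 2
    ONE_MISSED_TICK_PENDING = 3


def ticks_injected_on_reschedule(mode: int, missed: int) -> int:
    """Missed timer interrupts injected when a preempted vCPU runs again
    (simplified; ignores the tick-alarm detail of mode 2)."""
    mode = TimerMode(mode)
    if missed <= 0:
        return 0
    if mode in (TimerMode.DELAY_FOR_MISSED_TICKS,
                TimerMode.NO_DELAY_FOR_MISSED_TICKS):
        return missed  # every missed tick is delivered
    if mode == TimerMode.ONE_MISSED_TICK_PENDING:
        return 1       # collapsed into one 'late tick'
    return 0           # NO_MISSED_TICKS_PENDING: nothing held pending


def guest_time_tracks_wallclock(mode: int) -> bool:
    # Only mode 0 is described as holding guest time back and advancing it
    # stepwise; the sketch assumes the other modes all track wallclock.
    return TimerMode(mode) != TimerMode.DELAY_FOR_MISSED_TICKS


# A vCPU that was descheduled for 5 tick periods:
for m in TimerMode:
    print(m.value, m.name, ticks_injected_on_reschedule(m, 5),
          guest_time_tracks_wallclock(m))
```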

TBH, though, trying to figure out exactly how that interacts with a
multi-processor OS that's trying to work backwards from timer values and
interrupts to a consistent wallclock time is, let's say, tricky.
Mostly it's superstition of the "known to work best with OS foo"
variety rather than deep understanding. 

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2011-01-04 17:22           ` Tim Deegan
@ 2011-01-04 17:27             ` Ian Campbell
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Campbell @ 2011-01-04 17:27 UTC (permalink / raw)
  To: Tim Deegan; +Cc: wei song, xen devel, Mark Adams

On Tue, 2011-01-04 at 17:22 +0000, Tim Deegan wrote:
> At 17:09 +0000 on 04 Jan (1294160969), Ian Campbell wrote:
> > Whichever one of you started it: Please don't top post.
> > 
> > > On Sat, Oct 09, 2010 at 10:15:20AM +0800, wei song wrote:
> > > > Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
> > > > file.
> > 
> > On Mon, 2010-10-11 at 11:10 +0100, Mark Adams wrote:
> > > I can try this, but can you please confirm what these options do?
> > 
> > viridian=1 turns off support for the Hyper-V virtualisation
> > compatibility layer.
> 
> Surely it turns it _on_, no?

Duh, yeah.

> Timer modes describe [..]

Thanks!

Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2011-01-04 17:09         ` Ian Campbell
  2011-01-04 17:22           ` Tim Deegan
@ 2011-01-04 17:28           ` Gianni Tedesco
  2011-01-05 12:03             ` Mark Adams
  1 sibling, 1 reply; 20+ messages in thread
From: Gianni Tedesco @ 2011-01-04 17:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei song, xen devel, Mark Adams

On Tue, 2011-01-04 at 17:09 +0000, Ian Campbell wrote:
> Whichever one of you started it: Please don't top post.
> 
> > On Sat, Oct 09, 2010 at 10:15:20AM +0800, wei song wrote:
> > > Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
> > > file.
> 
> On Mon, 2010-10-11 at 11:10 +0100, Mark Adams wrote:
> > I can try this, but can you please confirm what these options do?
> 
> viridian=1 turns off support for the Hyper-V virtualisation
                   ^^^ on ;)

> compatibility layer. Not many of these are actually supported but AFAIK
> one or two are and can affect the behaviour of Win2008 (although you
> would hope it was for the better!)
> 
> According to xmexample.hvm: tsc_mode : TSC mode (0=default, 1=native
> TSC, 2=never emulate, 3=pvrdtscp).

From xen/include/asm-x86/time.h:
/*
 *  PV TSC emulation modes:
 *    0 = guest rdtsc/p executed natively when monotonicity can be guaranteed
 *         and emulated otherwise (with frequency scaled if necessary)
 *    1 = guest rdtsc/p always emulated at 1GHz (kernel and user)
 *    2 = guest rdtsc always executed natively (no monotonicity/frequency
 *         guarantees); guest rdtscp emulated at native frequency if
 *         unsupported by h/w, else executed natively
 *    3 = same as 2, except xen manages TSC_AUX register so guest can
 *         determine when a restore/migration has occurred and assumes
 *         guest obtains/uses pvclock-like mechanism to adjust for
 *         monotonicity and frequency changes
 */
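As a rough illustration of the modes above, the sketch below encodes when a guest rdtsc is emulated in each mode, plus the arithmetic behind mode 1's 1 GHz emulation (one tick per nanosecond). `monotonic_tsc` is a hypothetical stand-in for whatever checks Xen actually performs, rdtscp/TSC_AUX handling is ignored, and the 2.66 GHz host frequency is made up; this is not Xen's implementation.

```python
from enum import IntEnum

NSEC_PER_SEC = 1_000_000_000


class TscMode(IntEnum):
    # Numbering as in the xen/include/asm-x86/time.h comment quoted above.
    DEFAULT = 0
    ALWAYS_EMULATE = 1
    NEVER_EMULATE = 2
    PVRDTSCP = 3


def rdtsc_is_emulated(mode: int, monotonic_tsc: bool) -> bool:
    """Whether a guest rdtsc is emulated by Xen rather than run natively.

    Simplified: ignores rdtscp/TSC_AUX and save/restore or migration.
    monotonic_tsc stands for "Xen can guarantee monotonicity" (hypothetical).
    """
    mode = TscMode(mode)
    if mode == TscMode.DEFAULT:
        return not monotonic_tsc
    return mode == TscMode.ALWAYS_EMULATE  # modes 2 and 3 run rdtsc natively


def tsc_ticks(elapsed_ns: int, tsc_hz: int) -> int:
    """TSC ticks accumulated over elapsed_ns at a given TSC frequency."""
    return elapsed_ns * tsc_hz // NSEC_PER_SEC


# Mode 1 emulates the TSC at 1 GHz, so one tick corresponds to one nanosecond.
print(tsc_ticks(1_500, 1_000_000_000))         # 1500
# A hypothetical native TSC at 2.66 GHz counts differently over one second.
print(tsc_ticks(NSEC_PER_SEC, 2_660_000_000))  # 2660000000
```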

> 
> Timer mode is apparently "0=delay virtual time when ticks are missed;
> 1=virtual time is always wallclock time".

It's all explained in xen/include/public/hvm/params.h

AIUI, timer_mode=0 means if timer ticks were missed then they all get
re-injected into the guest when it next runs. IOW if 5 ticks were missed
you get 5 IRQs in succession and the clock time is incremented stepwise
with them. I think it should never be used unless the OS simply counts
ticks to figure out the time (which they don't).

1 is default and means the same except virtual time is always up to date
when re-injecting missed ticks.
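The five-tick example can be made concrete with a small model of modes 0 and 1 that records the time the guest sees at each re-injected tick. The 100 Hz tick rate and the timings are hypothetical, and the model ignores everything except when ticks are injected and what guest time accompanies each one.

```python
from typing import List

MS = 1_000_000  # nanoseconds per millisecond


def guest_time_at_reinjected_ticks(mode: int, last_tick_ns: int,
                                   period_ns: int, now_ns: int) -> List[int]:
    """Guest-visible time at each re-injected tick, for timer_mode 0 and 1.

    last_tick_ns: when the last tick was delivered before preemption.
    now_ns:       wallclock time when the vCPU is rescheduled.
    """
    missed = (now_ns - last_tick_ns) // period_ns
    due = [last_tick_ns + (i + 1) * period_ns for i in range(missed)]
    if mode == 0:
        # Guest time steps through each tick's original due time.
        return due
    if mode == 1:
        # Same number of ticks, but guest time is already wallclock time.
        return [now_ns] * missed
    raise ValueError("this sketch only models timer_mode 0 and 1")


# A 100 Hz tick (10 ms period); the vCPU runs again 55 ms after its last
# tick, so 5 ticks were missed.
print([t // MS for t in guest_time_at_reinjected_ticks(0, 0, 10 * MS, 55 * MS)])
# [10, 20, 30, 40, 50]
print([t // MS for t in guest_time_at_reinjected_ticks(1, 0, 10 * MS, 55 * MS)])
# [55, 55, 55, 55, 55]
```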

2 I'm not sure about, but it sounds like some variant of 3?

3 is where missed ticks are delivered in one 'late tick'.

Gianni

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time
  2011-01-04 17:28           ` Gianni Tedesco
@ 2011-01-05 12:03             ` Mark Adams
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Adams @ 2011-01-05 12:03 UTC (permalink / raw)
  To: Gianni Tedesco; +Cc: wei song, xen devel, Ian Campbell

On Tue, Jan 04, 2011 at 05:28:02PM +0000, Gianni Tedesco wrote:
> On Tue, 2011-01-04 at 17:09 +0000, Ian Campbell wrote:
> > Whichever one of you started it: Please don't top post.
> > 
> > > On Sat, Oct 09, 2010 at 10:15:20AM +0800, wei song wrote:
> > > > Add timer_mode = 2, tsc_mode = 1 and viridian = 1 to your configuration
> > > > file.
> > 
> > On Mon, 2010-10-11 at 11:10 +0100, Mark Adams wrote:
> > > I can try this, but can you please confirm what these options do?
> > 
> > viridian=1 turns off support for the Hyper-V virtualisation
>                    ^^^ on ;)
> 
> > compatibility layer. Not many of these are actually supported but AFAIK
> > one or two are and can affect the behaviour of Win2008 (although you
> > would hope it was for the better!)
> > 
> > According to xmexample.hvm: tsc_mode : TSC mode (0=default, 1=native
> > TSC, 2=never emulate, 3=pvrdtscp).
> 
> From xen/include/asm-x86/time.h:
> /*
>  *  PV TSC emulation modes:
>  *    0 = guest rdtsc/p executed natively when monotonicity can be guaranteed
>  *         and emulated otherwise (with frequency scaled if necessary)
>  *    1 = guest rdtsc/p always emulated at 1GHz (kernel and user)
>  *    2 = guest rdtsc always executed natively (no monotonicity/frequency
>  *         guarantees); guest rdtscp emulated at native frequency if
>  *         unsupported by h/w, else executed natively
>  *    3 = same as 2, except xen manages TSC_AUX register so guest can
>  *         determine when a restore/migration has occurred and assumes
>  *         guest obtains/uses pvclock-like mechanism to adjust for
>  *         monotonicity and frequency changes
>  */
> 
> > 
> > Timer mode is apparently "0=delay virtual time when ticks are missed;
> > 1=virtual time is always wallclock time".
> 
> It's all explained in xen/include/public/hvm/params.h
> 
> AIUI, timer_mode=0 means if timer ticks were missed then they all get
> re-injected into the guest when it next runs. IOW if 5 ticks were missed
> you get 5 IRQs in succession and the clock time is incremented stepwise
> with them. I think it should never be used unless the OS simply counts
> ticks to figure out the time (which they don't).
> 
> 1 is default and means the same except virtual time is always up to date
> when re-injecting missed ticks.
> 
> 2 I'm not sure about, but it sounds like some variant of 3?
> 
> 3 is where missed ticks are delivered in one 'late tick'.
> 
> Gianni
> 
> 

All, thanks for the input. Do you agree that the -best- setting to maintain
stable time for 2008 R2 HVM guests is to use timer_mode = 2, tsc_mode = 1
and viridian = 1?

Best Regards,
Mark


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread (latest message: 2011-01-05 12:03 UTC)

Thread overview: 20+ messages
2010-10-06 11:16 Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 domU time Mark Adams
2010-10-06 12:20 ` Clock jumped 50 minutes in dom0 caused incorrect 2008R2 " James Harper
2010-10-06 12:24 ` James Harper
2010-10-06 13:04   ` Mark Adams
2010-10-06 15:41 ` Clock jumped 50 minutes in dom0 caused incorrect 2008 R2 " Jeremy Fitzhardinge
2010-10-06 16:15   ` Mark Adams
2010-10-06 16:23     ` Jeremy Fitzhardinge
2010-10-07 14:04       ` Dan Magenheimer
2010-10-26  9:22         ` Mark Adams
2010-10-26 17:03           ` Jeremy Fitzhardinge
2010-10-26 21:54             ` Dan Magenheimer
2010-10-27 20:29               ` Dan Magenheimer
2011-01-04 17:00                 ` Mark Adams
     [not found] ` <AANLkTinDMfrR5u2k3kPJfJ9Z+op53v6ziEYnLEO03FkG@mail.gmail.com>
     [not found]   ` <20101008100907.GH30044@campbell-lange.net>
2010-10-09  2:15     ` wei song
2010-10-11 10:10       ` Mark Adams
2011-01-04 17:09         ` Ian Campbell
2011-01-04 17:22           ` Tim Deegan
2011-01-04 17:27             ` Ian Campbell
2011-01-04 17:28           ` Gianni Tedesco
2011-01-05 12:03             ` Mark Adams
