public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Clocksource tsc unstable (delta = -4398046474878 ns)
@ 2010-03-28 11:46 Sebastian Hetze
  2010-03-29 10:31 ` Athanasius
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Hetze @ 2010-03-28 11:46 UTC (permalink / raw)
  To: kvm

Hi *,

this message appeared in the KVM guest kern.log last night:

Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)

The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.

If I understand things correct, in kernel/time/clocksource.c
clocksource_watchdog() checks all the
/sys/devices/system/clocksource/clocksource0/available_clocksource
every 0.5sec for an delta of more than 0.0625s. So the tsc must have
changed more than one hour within two subsequent calls of
clocksource_watchdog. No event in the host nor anything in the
guest gives reasonable cause for this step.

However, the number 4398046474878 is only 36226 ns away from
4*1024*1024*1024*1024

The guest is an 32 bit system running in a 64 bit host. Is this an
possible cause of this strange message? Any other idea what is going
wrong here? Maybe this is a hardware bug?

Best regards,

  Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
  2010-03-28 11:46 Clocksource tsc unstable (delta = -4398046474878 ns) Sebastian Hetze
@ 2010-03-29 10:31 ` Athanasius
  2010-03-30  8:08   ` Sebastian Hetze
  0 siblings, 1 reply; 7+ messages in thread
From: Athanasius @ 2010-03-29 10:31 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

[-- Attachment #1: Type: text/plain, Size: 2806 bytes --]

On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> this message appeared in the KVM guest kern.log last night:
> 
> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
> 
> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> 
> If I understand things correct, in kernel/time/clocksource.c
> clocksource_watchdog() checks all the
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> changed more than one hour within two subsequent calls of
> clocksource_watchdog. No event in the host nor anything in the
> guest gives reasonable cause for this step.
> 
> However, the number 4398046474878 is only 36226 ns away from
> 4*1024*1024*1024*1024

  I didn't see any such messages but I've had a recent experience with
the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
two separate incidents.  Eerily the exact jumps, as best I can tell from
logs are of 17592 and 8796 seconds, give or take a second or two.  If
you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
nanoseconds.
  What I've done that seems to have avoided this happening again is drop
KVM_CLOCK kernel option from the kvm guests' kernel.

  This is with a Debian squeeze (testing) KVM host running 2.6.33 from
vanilla sources and my own config.  The guests are Debian lenny
(stable) and were also running a 2.6.33 kernel from vanilla sources and
my own (different, to match the virtual hardware in a KVM guest) config.
Both systems/kernels are 64 bit.  The base machine is a Dell R210 with
an Intel Xeon X3450 quad-core CPU, with the hyper-threading enabled to
give 8 visible CPUs in Linux.  This only happened on one of the two
guests, the much busier one (it does shell accounts, email, IMAP/POP3, a
small news server and NFS serves web pages to the other guest which only
runs apache2 and nagios3).
  It took around 2-3 days to see the problem both times.  Without
KVM_CLOCK it's been up and stable for well over a week now.  Without
KVM_CLOCK the only clocksource is acpi_pm and thus that is being used.
I didn't test forcing that with a boot-time parameter and KVM_CLOCK
still enabled.

  Given turning KVM_CLOCK off fixed my problem and the problem repeating
itself causes all manner of trouble given how busy the machine is I'm
not really willing to test alternative fixes.

-- 
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
  2010-03-29 10:31 ` Athanasius
@ 2010-03-30  8:08   ` Sebastian Hetze
  2010-03-30 16:12     ` Athanasius
  2010-03-30 17:04     ` Beinicke, Thomas
  0 siblings, 2 replies; 7+ messages in thread
From: Sebastian Hetze @ 2010-03-30  8:08 UTC (permalink / raw)
  To: Sebastian Hetze, kvm

On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> > this message appeared in the KVM guest kern.log last night:
> > 
> > Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
> > 
> > The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> > hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> > 
> > If I understand things correct, in kernel/time/clocksource.c
> > clocksource_watchdog() checks all the
> > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> > changed more than one hour within two subsequent calls of
> > clocksource_watchdog. No event in the host nor anything in the
> > guest gives reasonable cause for this step.
> > 
> > However, the number 4398046474878 is only 36226 ns away from
> > 4*1024*1024*1024*1024
> 
>   I didn't see any such messages but I've had a recent experience with
> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> two separate incidents.  Eerily the exact jumps, as best I can tell from
> logs are of 17592 and 8796 seconds, give or take a second or two.  If
> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> nanoseconds.
>   What I've done that seems to have avoided this happening again is drop
> KVM_CLOCK kernel option from the kvm guests' kernel.

To my understanding, kvm-clock is the best and most reliable clocksource
available, so I do not think it is a good idea to disable it.

There is a lot of bit shift operation happening with the clocksources,
so there may be a real bug hidden somewhere in the code.
Somehow ntp adjustment is involved, can this cause such huge steps?
Im my case, I actually have NTP running in the guest. However, the
statistics show a pretty stable timing here.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
  2010-03-30  8:08   ` Sebastian Hetze
@ 2010-03-30 16:12     ` Athanasius
  2010-03-30 17:04     ` Beinicke, Thomas
  1 sibling, 0 replies; 7+ messages in thread
From: Athanasius @ 2010-03-30 16:12 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm

On Tue, Mar 30, 2010 at 10:08:28AM +0200, Sebastian Hetze wrote:
> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> >   I didn't see any such messages but I've had a recent experience with
> > the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> > two separate incidents.  Eerily the exact jumps, as best I can tell from
> > logs are of 17592 and 8796 seconds, give or take a second or two.  If
> > you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> > nanoseconds.
> >   What I've done that seems to have avoided this happening again is drop
> > KVM_CLOCK kernel option from the kvm guests' kernel.
> 
> To my understanding, kvm-clock is the best and most reliable clocksource
> available, so I do not think it is a good idea to disable it.
> 
> There is a lot of bit shift operation happening with the clocksources,
> so there may be a real bug hidden somewhere in the code.
> Somehow ntp adjustment is involved, can this cause such huge steps?
> Im my case, I actually have NTP running in the guest. However, the
> statistics show a pretty stable timing here.

  This is one thing thing to note, I *was* running ntpd in the affected
guest (and rather obviously, I still am).  If there's some bad
interaction between KVM_CLOCK and ntpd it needs documenting in the
first instance and preferably also fixing.

-- 
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
  2010-03-30  8:08   ` Sebastian Hetze
  2010-03-30 16:12     ` Athanasius
@ 2010-03-30 17:04     ` Beinicke, Thomas
  2010-03-31 19:32       ` Zachary Amsden
  1 sibling, 1 reply; 7+ messages in thread
From: Beinicke, Thomas @ 2010-03-30 17:04 UTC (permalink / raw)
  To: Sebastian Hetze; +Cc: kvm@vger.kernel.org

On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> > On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> > > this message appeared in the KVM guest kern.log last night:
> > > 
> > > Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
> > > (delta = -4398046474878 ns)
> > > 
> > > The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> > > hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> > > 
> > > If I understand things correct, in kernel/time/clocksource.c
> > > clocksource_watchdog() checks all the
> > > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > > every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> > > changed more than one hour within two subsequent calls of
> > > clocksource_watchdog. No event in the host nor anything in the
> > > guest gives reasonable cause for this step.
> > > 
> > > However, the number 4398046474878 is only 36226 ns away from
> > > 4*1024*1024*1024*1024
> > > 
> >   I didn't see any such messages but I've had a recent experience with
> > 
> > the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> > two separate incidents.  Eerily the exact jumps, as best I can tell from
> > logs are of 17592 and 8796 seconds, give or take a second or two.  If
> > you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> > nanoseconds.
> > 
> >   What I've done that seems to have avoided this happening again is drop
> > 
> > KVM_CLOCK kernel option from the kvm guests' kernel.
> 
> To my understanding, kvm-clock is the best and most reliable clocksource
> available, so I do not think it is a good idea to disable it.
> 
> There is a lot of bit shift operation happening with the clocksources,
> so there may be a real bug hidden somewhere in the code.
> Somehow ntp adjustment is involved, can this cause such huge steps?
> Im my case, I actually have NTP running in the guest. However, the
> statistics show a pretty stable timing here.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I am having the same problem occasional.
It only occurs if the VM is under heavy IO or CPU Load but I can't reproduce 
it 100%. It just never occurs on VMs that only serve a few web pages though.
I also noticed that on a machine which has this problem even an ssh shell is 
*very* laggy so it's not just a cosmetic problem.

Would removing the hrtimer from the kernel config solve it or is it necessary 
for KVM?

I remember this problem has been posted her before though there wasn't any 
real conclusion or solution for it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
  2010-03-31 19:32       ` Zachary Amsden
@ 2010-03-31 13:09         ` Beinicke, Thomas
  0 siblings, 0 replies; 7+ messages in thread
From: Beinicke, Thomas @ 2010-03-31 13:09 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm@vger.kernel.org

On Wednesday 31 March 2010 21:32:18 you wrote:
> On 03/30/10 07:04, Beinicke, Thomas wrote:
> > On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
> >> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> >>> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> >>>> this message appeared in the KVM guest kern.log last night:
> >>>> 
> >>>> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
> >>>> (delta = -4398046474878 ns)
> >>>> 
> >>>> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> >>>> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> >>>> 
> >>>> If I understand things correct, in kernel/time/clocksource.c
> >>>> clocksource_watchdog() checks all the
> >>>> /sys/devices/system/clocksource/clocksource0/available_clocksource
> >>>> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> >>>> changed more than one hour within two subsequent calls of
> >>>> clocksource_watchdog. No event in the host nor anything in the
> >>>> guest gives reasonable cause for this step.
> >>>> 
> >>>> However, the number 4398046474878 is only 36226 ns away from
> >>>> 4*1024*1024*1024*1024
> >>>> 
> >>>   I didn't see any such messages but I've had a recent experience with
> >>> 
> >>> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> >>> two separate incidents.  Eerily the exact jumps, as best I can tell
> >>> from logs are of 17592 and 8796 seconds, give or take a second or two.
> >>>  If you look at these as nanoseconds then that's 'exactly' 2^44 and
> >>> 2^43 nanoseconds.
> >>> 
> >>>   What I've done that seems to have avoided this happening again is
> >>>   drop
> >>> 
> >>> KVM_CLOCK kernel option from the kvm guests' kernel.
> >> 
> >> To my understanding, kvm-clock is the best and most reliable clocksource
> >> available, so I do not think it is a good idea to disable it.
> >> 
> >> There is a lot of bit shift operation happening with the clocksources,
> >> so there may be a real bug hidden somewhere in the code.
> >> Somehow ntp adjustment is involved, can this cause such huge steps?
> >> Im my case, I actually have NTP running in the guest. However, the
> >> statistics show a pretty stable timing here.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > I am having the same problem occasional.
> > It only occurs if the VM is under heavy IO or CPU Load but I can't
> > reproduce it 100%. It just never occurs on VMs that only serve a few web
> > pages though. I also noticed that on a machine which has this problem
> > even an ssh shell is *very* laggy so it's not just a cosmetic problem.
> > 
> > Would removing the hrtimer from the kernel config solve it or is it
> > necessary for KVM?
> > 
> > I remember this problem has been posted her before though there wasn't
> > any real conclusion or solution for it.
> 
> Are you also running a 32-bit kernel?

I have the problem on 32-bit and 64-bit clients.
The host machines are all 64-bit.

> Thanks,
> 
> Zach


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
  2010-03-30 17:04     ` Beinicke, Thomas
@ 2010-03-31 19:32       ` Zachary Amsden
  2010-03-31 13:09         ` Beinicke, Thomas
  0 siblings, 1 reply; 7+ messages in thread
From: Zachary Amsden @ 2010-03-31 19:32 UTC (permalink / raw)
  To: Beinicke, Thomas; +Cc: Sebastian Hetze, kvm@vger.kernel.org

On 03/30/10 07:04, Beinicke, Thomas wrote:
> On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
>   
>> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
>>     
>>> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
>>>       
>>>> this message appeared in the KVM guest kern.log last night:
>>>>
>>>> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
>>>> (delta = -4398046474878 ns)
>>>>
>>>> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
>>>> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
>>>>
>>>> If I understand things correct, in kernel/time/clocksource.c
>>>> clocksource_watchdog() checks all the
>>>> /sys/devices/system/clocksource/clocksource0/available_clocksource
>>>> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
>>>> changed more than one hour within two subsequent calls of
>>>> clocksource_watchdog. No event in the host nor anything in the
>>>> guest gives reasonable cause for this step.
>>>>
>>>> However, the number 4398046474878 is only 36226 ns away from
>>>> 4*1024*1024*1024*1024
>>>>
>>>>         
>>>   I didn't see any such messages but I've had a recent experience with
>>>
>>> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
>>> two separate incidents.  Eerily the exact jumps, as best I can tell from
>>> logs are of 17592 and 8796 seconds, give or take a second or two.  If
>>> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
>>> nanoseconds.
>>>
>>>   What I've done that seems to have avoided this happening again is drop
>>>
>>> KVM_CLOCK kernel option from the kvm guests' kernel.
>>>       
>> To my understanding, kvm-clock is the best and most reliable clocksource
>> available, so I do not think it is a good idea to disable it.
>>
>> There is a lot of bit shift operation happening with the clocksources,
>> so there may be a real bug hidden somewhere in the code.
>> Somehow ntp adjustment is involved, can this cause such huge steps?
>> Im my case, I actually have NTP running in the guest. However, the
>> statistics show a pretty stable timing here.
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>     
> I am having the same problem occasional.
> It only occurs if the VM is under heavy IO or CPU Load but I can't reproduce 
> it 100%. It just never occurs on VMs that only serve a few web pages though.
> I also noticed that on a machine which has this problem even an ssh shell is 
> *very* laggy so it's not just a cosmetic problem.
>
> Would removing the hrtimer from the kernel config solve it or is it necessary 
> for KVM?
>
> I remember this problem has been posted her before though there wasn't any 
> real conclusion or solution for it.
>   

Are you also running a 32-bit kernel?

Thanks,

Zach

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-03-31 13:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-28 11:46 Clocksource tsc unstable (delta = -4398046474878 ns) Sebastian Hetze
2010-03-29 10:31 ` Athanasius
2010-03-30  8:08   ` Sebastian Hetze
2010-03-30 16:12     ` Athanasius
2010-03-30 17:04     ` Beinicke, Thomas
2010-03-31 19:32       ` Zachary Amsden
2010-03-31 13:09         ` Beinicke, Thomas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox