* Clocksource tsc unstable (delta = -4398046474878 ns)
@ 2010-03-28 11:46 Sebastian Hetze
2010-03-29 10:31 ` Athanasius
0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Hetze @ 2010-03-28 11:46 UTC (permalink / raw)
To: kvm
Hi *,
this message appeared in the KVM guest kern.log last night:
Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
If I understand things correct, in kernel/time/clocksource.c
clocksource_watchdog() checks all the
/sys/devices/system/clocksource/clocksource0/available_clocksource
every 0.5sec for an delta of more than 0.0625s. So the tsc must have
changed more than one hour within two subsequent calls of
clocksource_watchdog. No event in the host nor anything in the
guest gives reasonable cause for this step.
However, the number 4398046474878 is only 36226 ns away from
4*1024*1024*1024*1024
The guest is an 32 bit system running in a 64 bit host. Is this an
possible cause of this strange message? Any other idea what is going
wrong here? Maybe this is a hardware bug?
Best regards,
Sebastian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
2010-03-28 11:46 Clocksource tsc unstable (delta = -4398046474878 ns) Sebastian Hetze
@ 2010-03-29 10:31 ` Athanasius
2010-03-30 8:08 ` Sebastian Hetze
0 siblings, 1 reply; 7+ messages in thread
From: Athanasius @ 2010-03-29 10:31 UTC (permalink / raw)
To: Sebastian Hetze; +Cc: kvm
[-- Attachment #1: Type: text/plain, Size: 2806 bytes --]
On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> this message appeared in the KVM guest kern.log last night:
>
> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
>
> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
>
> If I understand things correct, in kernel/time/clocksource.c
> clocksource_watchdog() checks all the
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> changed more than one hour within two subsequent calls of
> clocksource_watchdog. No event in the host nor anything in the
> guest gives reasonable cause for this step.
>
> However, the number 4398046474878 is only 36226 ns away from
> 4*1024*1024*1024*1024
I didn't see any such messages but I've had a recent experience with
the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
two separate incidents. Eerily the exact jumps, as best I can tell from
logs are of 17592 and 8796 seconds, give or take a second or two. If
you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
nanoseconds.
What I've done that seems to have avoided this happening again is drop
KVM_CLOCK kernel option from the kvm guests' kernel.
This is with a Debian squeeze (testing) KVM host running 2.6.33 from
vanilla sources and my own config. The guests are Debian lenny
(stable) and were also running a 2.6.33 kernel from vanilla sources and
my own (different, to match the virtual hardware in a KVM guest) config.
Both systems/kernels are 64 bit. The base machine is a Dell R210 with
an Intel Xeon X3450 quad-core CPU, with the hyper-threading enabled to
give 8 visible CPUs in Linux. This only happened on one of the two
guests, the much busier one (it does shell accounts, email, IMAP/POP3, a
small news server and NFS serves web pages to the other guest which only
runs apache2 and nagios3).
It took around 2-3 days to see the problem both times. Without
KVM_CLOCK it's been up and stable for well over a week now. Without
KVM_CLOCK the only clocksource is acpi_pm and thus that is being used.
I didn't test forcing that with a boot-time parameter and KVM_CLOCK
still enabled.
Given turning KVM_CLOCK off fixed my problem and the problem repeating
itself causes all manner of trouble given how busy the machine is I'm
not really willing to test alternative fixes.
--
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
Finger athan(at)fysh.org for PGP key
"And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
2010-03-29 10:31 ` Athanasius
@ 2010-03-30 8:08 ` Sebastian Hetze
2010-03-30 16:12 ` Athanasius
2010-03-30 17:04 ` Beinicke, Thomas
0 siblings, 2 replies; 7+ messages in thread
From: Sebastian Hetze @ 2010-03-30 8:08 UTC (permalink / raw)
To: Sebastian Hetze, kvm
On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> > this message appeared in the KVM guest kern.log last night:
> >
> > Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
> >
> > The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> > hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> >
> > If I understand things correct, in kernel/time/clocksource.c
> > clocksource_watchdog() checks all the
> > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> > changed more than one hour within two subsequent calls of
> > clocksource_watchdog. No event in the host nor anything in the
> > guest gives reasonable cause for this step.
> >
> > However, the number 4398046474878 is only 36226 ns away from
> > 4*1024*1024*1024*1024
>
> I didn't see any such messages but I've had a recent experience with
> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> two separate incidents. Eerily the exact jumps, as best I can tell from
> logs are of 17592 and 8796 seconds, give or take a second or two. If
> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> nanoseconds.
> What I've done that seems to have avoided this happening again is drop
> KVM_CLOCK kernel option from the kvm guests' kernel.
To my understanding, kvm-clock is the best and most reliable clocksource
available, so I do not think it is a good idea to disable it.
There is a lot of bit shift operation happening with the clocksources,
so there may be a real bug hidden somewhere in the code.
Somehow ntp adjustment is involved, can this cause such huge steps?
Im my case, I actually have NTP running in the guest. However, the
statistics show a pretty stable timing here.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
2010-03-30 8:08 ` Sebastian Hetze
@ 2010-03-30 16:12 ` Athanasius
2010-03-30 17:04 ` Beinicke, Thomas
1 sibling, 0 replies; 7+ messages in thread
From: Athanasius @ 2010-03-30 16:12 UTC (permalink / raw)
To: Sebastian Hetze; +Cc: kvm
On Tue, Mar 30, 2010 at 10:08:28AM +0200, Sebastian Hetze wrote:
> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> > I didn't see any such messages but I've had a recent experience with
> > the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> > two separate incidents. Eerily the exact jumps, as best I can tell from
> > logs are of 17592 and 8796 seconds, give or take a second or two. If
> > you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> > nanoseconds.
> > What I've done that seems to have avoided this happening again is drop
> > KVM_CLOCK kernel option from the kvm guests' kernel.
>
> To my understanding, kvm-clock is the best and most reliable clocksource
> available, so I do not think it is a good idea to disable it.
>
> There is a lot of bit shift operation happening with the clocksources,
> so there may be a real bug hidden somewhere in the code.
> Somehow ntp adjustment is involved, can this cause such huge steps?
> Im my case, I actually have NTP running in the guest. However, the
> statistics show a pretty stable timing here.
This is one thing thing to note, I *was* running ntpd in the affected
guest (and rather obviously, I still am). If there's some bad
interaction between KVM_CLOCK and ntpd it needs documenting in the
first instance and preferably also fixing.
--
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
Finger athan(at)fysh.org for PGP key
"And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
2010-03-30 8:08 ` Sebastian Hetze
2010-03-30 16:12 ` Athanasius
@ 2010-03-30 17:04 ` Beinicke, Thomas
2010-03-31 19:32 ` Zachary Amsden
1 sibling, 1 reply; 7+ messages in thread
From: Beinicke, Thomas @ 2010-03-30 17:04 UTC (permalink / raw)
To: Sebastian Hetze; +Cc: kvm@vger.kernel.org
On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> > On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> > > this message appeared in the KVM guest kern.log last night:
> > >
> > > Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
> > > (delta = -4398046474878 ns)
> > >
> > > The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> > > hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> > >
> > > If I understand things correct, in kernel/time/clocksource.c
> > > clocksource_watchdog() checks all the
> > > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > > every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> > > changed more than one hour within two subsequent calls of
> > > clocksource_watchdog. No event in the host nor anything in the
> > > guest gives reasonable cause for this step.
> > >
> > > However, the number 4398046474878 is only 36226 ns away from
> > > 4*1024*1024*1024*1024
> > >
> > I didn't see any such messages but I've had a recent experience with
> >
> > the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> > two separate incidents. Eerily the exact jumps, as best I can tell from
> > logs are of 17592 and 8796 seconds, give or take a second or two. If
> > you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> > nanoseconds.
> >
> > What I've done that seems to have avoided this happening again is drop
> >
> > KVM_CLOCK kernel option from the kvm guests' kernel.
>
> To my understanding, kvm-clock is the best and most reliable clocksource
> available, so I do not think it is a good idea to disable it.
>
> There is a lot of bit shift operation happening with the clocksources,
> so there may be a real bug hidden somewhere in the code.
> Somehow ntp adjustment is involved, can this cause such huge steps?
> Im my case, I actually have NTP running in the guest. However, the
> statistics show a pretty stable timing here.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
I am having the same problem occasional.
It only occurs if the VM is under heavy IO or CPU Load but I can't reproduce
it 100%. It just never occurs on VMs that only serve a few web pages though.
I also noticed that on a machine which has this problem even an ssh shell is
*very* laggy so it's not just a cosmetic problem.
Would removing the hrtimer from the kernel config solve it or is it necessary
for KVM?
I remember this problem has been posted her before though there wasn't any
real conclusion or solution for it.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
2010-03-31 19:32 ` Zachary Amsden
@ 2010-03-31 13:09 ` Beinicke, Thomas
0 siblings, 0 replies; 7+ messages in thread
From: Beinicke, Thomas @ 2010-03-31 13:09 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm@vger.kernel.org
On Wednesday 31 March 2010 21:32:18 you wrote:
> On 03/30/10 07:04, Beinicke, Thomas wrote:
> > On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
> >> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> >>> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> >>>> this message appeared in the KVM guest kern.log last night:
> >>>>
> >>>> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
> >>>> (delta = -4398046474878 ns)
> >>>>
> >>>> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> >>>> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
> >>>>
> >>>> If I understand things correct, in kernel/time/clocksource.c
> >>>> clocksource_watchdog() checks all the
> >>>> /sys/devices/system/clocksource/clocksource0/available_clocksource
> >>>> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> >>>> changed more than one hour within two subsequent calls of
> >>>> clocksource_watchdog. No event in the host nor anything in the
> >>>> guest gives reasonable cause for this step.
> >>>>
> >>>> However, the number 4398046474878 is only 36226 ns away from
> >>>> 4*1024*1024*1024*1024
> >>>>
> >>> I didn't see any such messages but I've had a recent experience with
> >>>
> >>> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> >>> two separate incidents. Eerily the exact jumps, as best I can tell
> >>> from logs are of 17592 and 8796 seconds, give or take a second or two.
> >>> If you look at these as nanoseconds then that's 'exactly' 2^44 and
> >>> 2^43 nanoseconds.
> >>>
> >>> What I've done that seems to have avoided this happening again is
> >>> drop
> >>>
> >>> KVM_CLOCK kernel option from the kvm guests' kernel.
> >>
> >> To my understanding, kvm-clock is the best and most reliable clocksource
> >> available, so I do not think it is a good idea to disable it.
> >>
> >> There is a lot of bit shift operation happening with the clocksources,
> >> so there may be a real bug hidden somewhere in the code.
> >> Somehow ntp adjustment is involved, can this cause such huge steps?
> >> Im my case, I actually have NTP running in the guest. However, the
> >> statistics show a pretty stable timing here.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > I am having the same problem occasional.
> > It only occurs if the VM is under heavy IO or CPU Load but I can't
> > reproduce it 100%. It just never occurs on VMs that only serve a few web
> > pages though. I also noticed that on a machine which has this problem
> > even an ssh shell is *very* laggy so it's not just a cosmetic problem.
> >
> > Would removing the hrtimer from the kernel config solve it or is it
> > necessary for KVM?
> >
> > I remember this problem has been posted her before though there wasn't
> > any real conclusion or solution for it.
>
> Are you also running a 32-bit kernel?
I have the problem on 32-bit and 64-bit clients.
The host machines are all 64-bit.
> Thanks,
>
> Zach
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
2010-03-30 17:04 ` Beinicke, Thomas
@ 2010-03-31 19:32 ` Zachary Amsden
2010-03-31 13:09 ` Beinicke, Thomas
0 siblings, 1 reply; 7+ messages in thread
From: Zachary Amsden @ 2010-03-31 19:32 UTC (permalink / raw)
To: Beinicke, Thomas; +Cc: Sebastian Hetze, kvm@vger.kernel.org
On 03/30/10 07:04, Beinicke, Thomas wrote:
> On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
>
>> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
>>
>>> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
>>>
>>>> this message appeared in the KVM guest kern.log last night:
>>>>
>>>> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
>>>> (delta = -4398046474878 ns)
>>>>
>>>> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
>>>> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
>>>>
>>>> If I understand things correct, in kernel/time/clocksource.c
>>>> clocksource_watchdog() checks all the
>>>> /sys/devices/system/clocksource/clocksource0/available_clocksource
>>>> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
>>>> changed more than one hour within two subsequent calls of
>>>> clocksource_watchdog. No event in the host nor anything in the
>>>> guest gives reasonable cause for this step.
>>>>
>>>> However, the number 4398046474878 is only 36226 ns away from
>>>> 4*1024*1024*1024*1024
>>>>
>>>>
>>> I didn't see any such messages but I've had a recent experience with
>>>
>>> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
>>> two separate incidents. Eerily the exact jumps, as best I can tell from
>>> logs are of 17592 and 8796 seconds, give or take a second or two. If
>>> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
>>> nanoseconds.
>>>
>>> What I've done that seems to have avoided this happening again is drop
>>>
>>> KVM_CLOCK kernel option from the kvm guests' kernel.
>>>
>> To my understanding, kvm-clock is the best and most reliable clocksource
>> available, so I do not think it is a good idea to disable it.
>>
>> There is a lot of bit shift operation happening with the clocksources,
>> so there may be a real bug hidden somewhere in the code.
>> Somehow ntp adjustment is involved, can this cause such huge steps?
>> Im my case, I actually have NTP running in the guest. However, the
>> statistics show a pretty stable timing here.
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> I am having the same problem occasional.
> It only occurs if the VM is under heavy IO or CPU Load but I can't reproduce
> it 100%. It just never occurs on VMs that only serve a few web pages though.
> I also noticed that on a machine which has this problem even an ssh shell is
> *very* laggy so it's not just a cosmetic problem.
>
> Would removing the hrtimer from the kernel config solve it or is it necessary
> for KVM?
>
> I remember this problem has been posted her before though there wasn't any
> real conclusion or solution for it.
>
Are you also running a 32-bit kernel?
Thanks,
Zach
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-03-31 13:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-28 11:46 Clocksource tsc unstable (delta = -4398046474878 ns) Sebastian Hetze
2010-03-29 10:31 ` Athanasius
2010-03-30 8:08 ` Sebastian Hetze
2010-03-30 16:12 ` Athanasius
2010-03-30 17:04 ` Beinicke, Thomas
2010-03-31 19:32 ` Zachary Amsden
2010-03-31 13:09 ` Beinicke, Thomas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox