* Clocksource tsc unstable (delta = -4398046474878 ns)
From: Sebastian Hetze @ 2010-03-28 11:46 UTC
To: kvm

Hi *,

this message appeared in the KVM guest's kern.log last night:

Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)

The guest is running a 2.6.31-20-generic-pae Ubuntu kernel with
hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.

If I understand things correctly, clocksource_watchdog() in
kernel/time/clocksource.c checks all the clocksources in
/sys/devices/system/clocksource/clocksource0/available_clocksource
every 0.5 s for a delta of more than 0.0625 s. So the TSC must have
jumped by more than one hour (4398046474878 ns is about 73 minutes)
between two successive calls of clocksource_watchdog(). Nothing on the
host or in the guest gives a plausible cause for such a step.

However, the number 4398046474878 is only 36226 ns away from
4*1024*1024*1024*1024 (2^42).

The guest is a 32-bit system running on a 64-bit host. Could this be
the cause of this strange message? Any other idea what is going wrong
here? Maybe this is a hardware bug?

Best regards,

  Sebastian
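Sebastian's reasoning can be checked with a few lines of arithmetic. The sketch below is a simplified model of the watchdog comparison as he describes it, not the code in kernel/time/clocksource.c. The 0.5 s interval and 62.5 ms threshold come from his message; treating the logged delta as "clocksource elapsed time minus watchdog elapsed time over one interval" is an assumption. `watchdog_flags_unstable` is a hypothetical helper, and the two readings fed to it are invented; only their difference comes from the log.

```python
# Simplified model of the clocksource watchdog check described above.
# Assumptions (not taken from the kernel source): the check runs every
# 0.5 s and flags a clocksource whose elapsed time over one interval
# differs from the watchdog clock's elapsed time by more than 62.5 ms.

NSEC_PER_SEC = 1_000_000_000
WATCHDOG_INTERVAL_NS = NSEC_PER_SEC // 2    # 0.5 s between checks
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4   # 62_500_000 ns = 0.0625 s


def watchdog_flags_unstable(cs_elapsed_ns: int, wd_elapsed_ns: int) -> tuple[bool, int]:
    """Hypothetical helper: return (unstable, delta) for one interval,
    where delta = clocksource elapsed ns - watchdog elapsed ns."""
    delta = cs_elapsed_ns - wd_elapsed_ns
    return abs(delta) > WATCHDOG_THRESHOLD_NS, delta


LOGGED_DELTA_NS = -4398046474878

# Invented readings whose difference matches the logged delta.
cs_elapsed = WATCHDOG_INTERVAL_NS                    # TSC-based clock saw 0.5 s
wd_elapsed = WATCHDOG_INTERVAL_NS - LOGGED_DELTA_NS  # watchdog saw ~73 min more

unstable, delta = watchdog_flags_unstable(cs_elapsed, wd_elapsed)
print(unstable, delta)                          # True -4398046474878
print(f"{-delta / NSEC_PER_SEC / 60:.1f} min")  # 73.3 min, i.e. over an hour
print(2**42 + delta)                            # 36226 ns short of 4*1024**4
```

Because the model only sees the difference between the two clocks, it cannot tell which clock actually jumped; a negative delta just means the TSC-based reading fell behind the watchdog's by about 73 minutes.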
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
From: Athanasius @ 2010-03-29 10:31 UTC
To: Sebastian Hetze; +Cc: kvm

On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
[...]
> However, the number 4398046474878 is only 36226 ns away from
> 4*1024*1024*1024*1024

I didn't see any such messages, but I recently had the time on one KVM
guest leap *forwards* by approx. 5 and 2.5 hours in two separate
incidents. Eerily, the exact jumps, as best I can tell from the logs,
were 17592 and 8796 seconds, give or take a second or two. Read as
nanoseconds, those are 'exactly' 2^44 and 2^43 ns.

What seems to have stopped this from happening again is dropping the
KVM_CLOCK kernel option from the KVM guests' kernels.

This is with a Debian squeeze (testing) KVM host running 2.6.33 from
vanilla sources and my own config.
The guests are Debian lenny (stable) and were also running a 2.6.33
kernel from vanilla sources with my own config (a different one, to
match the virtual hardware of a KVM guest). Both systems/kernels are
64-bit. The base machine is a Dell R210 with an Intel Xeon X3450
quad-core CPU, with hyper-threading enabled to give 8 visible CPUs in
Linux.

This only happened on one of the two guests, the much busier one (it
does shell accounts, email, IMAP/POP3 and a small news server, and
serves web pages over NFS to the other guest, which only runs apache2
and nagios3). Both times it took around 2-3 days for the problem to
appear. Without KVM_CLOCK it's been up and stable for well over a week
now.

Without KVM_CLOCK the only clocksource is acpi_pm, so that is what's
being used. I didn't test forcing it with a boot-time parameter while
leaving KVM_CLOCK enabled. Given that turning KVM_CLOCK off fixed my
problem, and that a recurrence causes all manner of trouble on a
machine this busy, I'm not really willing to test alternative fixes.

-- 
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
  Finger athan(at)fysh.org for PGP key
   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME
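The power-of-two pattern across both reports can be checked the same way. `nearest_pow2` below is a hypothetical helper, not anything from the kernel: it finds the power of two nearest to a jump measured in nanoseconds and reports how far off the jump is. The inputs are Sebastian's logged delta and Athanasius's two jumps, which he gives only to the nearest second.

```python
# Find the power of two (in nanoseconds) nearest to an observed time jump.
# nearest_pow2 is a hypothetical helper written for this thread's numbers.

NSEC_PER_SEC = 1_000_000_000


def nearest_pow2(ns: int) -> tuple[int, int]:
    """Return (k, residual) where 2**k is the power of two nearest |ns|
    and residual = |ns| - 2**k (negative if the jump is short of 2**k)."""
    n = abs(ns)
    if n == 0:
        raise ValueError("a zero jump has no nearest power of two")
    k = n.bit_length() - 1            # now 2**k <= n < 2**(k + 1)
    if n - 2**k > 2**(k + 1) - n:     # the next power of two up is closer
        k += 1
    return k, n - 2**k


# Sebastian's logged delta (ns) and Athanasius's two forward jumps,
# which he reports only to the nearest second.
jumps = [
    ("tsc watchdog delta", -4398046474878),
    ("first forward jump", 17592 * NSEC_PER_SEC),
    ("second forward jump", 8796 * NSEC_PER_SEC),
]
for label, ns in jumps:
    k, residual = nearest_pow2(ns)
    print(f"{label}: nearest 2^{k} ns, off by {residual / NSEC_PER_SEC:+.6f} s")
```

All three land within a fraction of a second of 2^42, 2^44 and 2^43 ns respectively; the two forward jumps come out about 0.19 s and 0.09 s short, well inside the stated one-to-two-second uncertainty.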
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
From: Sebastian Hetze @ 2010-03-30 8:08 UTC
To: Sebastian Hetze, kvm

On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
> I didn't see any such messages but I've had a recent experience with
> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
> two separate incidents. Eerily the exact jumps, as best I can tell from
> logs are of 17592 and 8796 seconds, give or take a second or two. If
> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
> nanoseconds.
> What I've done that seems to have avoided this happening again is drop
> KVM_CLOCK kernel option from the kvm guests' kernel.

To my understanding, kvm-clock is the best and most reliable clocksource
available, so I do not think it is a good idea to disable it.
There are a lot of bit-shift operations going on in the clocksource
code, so there may be a real bug hidden somewhere in there. NTP
adjustment is involved somehow; can that cause such huge steps? In my
case, I do have NTP running in the guest. However, its statistics show
pretty stable timing here.
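To make the "bit shift" suspicion concrete: the kernel's generic helper clocksource_cyc2ns() converts a cycle count to nanoseconds as (cycles * mult) >> shift. The sketch below is purely arithmetic and hypothetical; it does not claim this is where the bug was. It shows that if the 64-bit product cycles * mult wraps around before the shift, the result comes out short by an exact multiple of 2^(64 - shift) ns. The 3 GHz frequency, the shift of 22 and the derived mult are made-up example values, and `cyc2ns_exact` / `cyc2ns_u64` are hypothetical names.

```python
# Hypothetical illustration of a 64-bit overflow in the cycles -> ns
# conversion. The formula (cycles * mult) >> shift is the one used by the
# kernel's clocksource_cyc2ns(); FREQ_HZ, SHIFT and MULT are made-up values.

U64_MASK = (1 << 64) - 1

FREQ_HZ = 3_000_000_000   # assumed 3 GHz TSC
SHIFT = 22                # assumed shift
# mult / 2**shift approximates nanoseconds per cycle (rounded to nearest).
MULT = ((10**9 << SHIFT) + FREQ_HZ // 2) // FREQ_HZ


def cyc2ns_exact(cycles: int, mult: int, shift: int) -> int:
    """Conversion with Python's unbounded integers (never overflows)."""
    return (cycles * mult) >> shift


def cyc2ns_u64(cycles: int, mult: int, shift: int) -> int:
    """Same conversion, but the product is first truncated to 64 bits,
    as unsigned 64-bit arithmetic in C would do."""
    return ((cycles * mult) & U64_MASK) >> shift


# A cycle count just large enough for cycles * MULT to exceed 2**64 once.
cycles = (1 << 64) // MULT + 1000

lost = cyc2ns_exact(cycles, MULT, SHIFT) - cyc2ns_u64(cycles, MULT, SHIFT)
print(lost)                          # 4398046511104
print(lost == 1 << (64 - SHIFT))     # True: one wrap loses exactly 2**42 ns
```

Because 2^64 is an exact multiple of 2^shift, each wrap of the product removes exactly 2^(64 - shift) ns; shifts of 20 and 21 would likewise give 2^44 and 2^43 ns. This is arithmetic only: nothing in the thread establishes that such an overflow happened, or which mult/shift the affected clocksources used (kvm-clock scales TSC deltas through its own pvclock parameters).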
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
From: Athanasius @ 2010-03-30 16:12 UTC
To: Sebastian Hetze; +Cc: kvm

On Tue, Mar 30, 2010 at 10:08:28AM +0200, Sebastian Hetze wrote:
> Somehow ntp adjustment is involved, can this cause such huge steps?
> Im my case, I actually have NTP running in the guest. However, the
> statistics show a pretty stable timing here.

This is one thing to note: I *was* running ntpd in the affected guest
(and, rather obviously, I still am). If there's some bad interaction
between KVM_CLOCK and ntpd, it needs documenting in the first instance,
and preferably fixing too.
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
From: Beinicke, Thomas @ 2010-03-30 17:04 UTC
To: Sebastian Hetze; +Cc: kvm@vger.kernel.org

On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
> > What I've done that seems to have avoided this happening again is drop
> > KVM_CLOCK kernel option from the kvm guests' kernel.
>
> To my understanding, kvm-clock is the best and most reliable clocksource
> available, so I do not think it is a good idea to disable it.
[...]

I am having the same problem occasionally. It only occurs when the VM
is under heavy I/O or CPU load, but I can't reproduce it reliably. It
never occurs on VMs that only serve a few web pages, though. I also
noticed that on a machine with this problem even an ssh shell is *very*
laggy, so it's not just a cosmetic problem.

Would removing hrtimer support from the kernel config solve it, or is it
necessary for KVM?

I remember this problem being posted here before, though there wasn't
any real conclusion or solution for it.
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
From: Zachary Amsden @ 2010-03-31 19:32 UTC
To: Beinicke, Thomas; +Cc: Sebastian Hetze, kvm@vger.kernel.org

On 03/30/10 07:04, Beinicke, Thomas wrote:
> I am having the same problem occasional.
> It only occurs if the VM is under heavy IO or CPU Load but I can't reproduce
> it 100%. It just never occurs on VMs that only serve a few web pages though.
> I also noticed that on a machine which has this problem even an ssh shell is
> *very* laggy so it's not just a cosmetic problem.
[...]

Are you also running a 32-bit kernel?

Thanks,

Zach
* Re: Clocksource tsc unstable (delta = -4398046474878 ns)
From: Beinicke, Thomas @ 2010-03-31 13:09 UTC
To: Zachary Amsden; +Cc: kvm@vger.kernel.org

On Wednesday 31 March 2010 21:32:18 you wrote:
> Are you also running a 32-bit kernel?

I have the problem on both 32-bit and 64-bit guests. The host machines
are all 64-bit.