From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zachary Amsden
Subject: Re: Clocksource tsc unstable (delta = -4398046474878 ns)
Date: Wed, 31 Mar 2010 09:32:18 -1000
Message-ID: <4BB3A342.5070201@redhat.com>
References: <20100328114635.401C730301D3@mail.linux-ag.de>
 <20100329103113.GP3910@miggy.org>
 <20100330080828.A92003030135@mail.linux-ag.de>
 <201003301904.21536.thomas.beinicke@fsd-web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Sebastian Hetze , "kvm@vger.kernel.org"
To: "Beinicke, Thomas"
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:37060 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751409Ab0C3Tdg (ORCPT ); Tue, 30 Mar 2010 15:33:36 -0400
In-Reply-To: <201003301904.21536.thomas.beinicke@fsd-web.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 03/30/10 07:04, Beinicke, Thomas wrote:
> On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
>
>> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
>>
>>> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
>>>
>>>> this message appeared in the KVM guest kern.log last night:
>>>>
>>>> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
>>>> (delta = -4398046474878 ns)
>>>>
>>>> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
>>>> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
>>>>
>>>> If I understand things correctly, in kernel/time/clocksource.c
>>>> clocksource_watchdog() checks all the
>>>> /sys/devices/system/clocksource/clocksource0/available_clocksource
>>>> every 0.5 sec for a delta of more than 0.0625 s. So the tsc must have
>>>> changed more than one hour within two subsequent calls of
>>>> clocksource_watchdog. No event in the host nor anything in the
>>>> guest gives reasonable cause for this step.
>>>>
>>>> However, the number 4398046474878 is only 36226 ns away from
>>>> 4*1024*1024*1024*1024
>>>>
>>>
>>> I didn't see any such messages but I've had a recent experience with
>>> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
>>> two separate incidents. Eerily the exact jumps, as best I can tell from
>>> logs, are of 17592 and 8796 seconds, give or take a second or two. If
>>> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
>>> nanoseconds.
>>>
>>> What I've done that seems to have avoided this happening again is drop
>>> the KVM_CLOCK kernel option from the kvm guests' kernel.
>>>
>> To my understanding, kvm-clock is the best and most reliable clocksource
>> available, so I do not think it is a good idea to disable it.
>>
>> There is a lot of bit shift operation happening with the clocksources,
>> so there may be a real bug hidden somewhere in the code.
>> Somehow ntp adjustment is involved, can this cause such huge steps?
>> In my case, I actually have NTP running in the guest. However, the
>> statistics show a pretty stable timing here.
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> I am having the same problem occasionally.
> It only occurs if the VM is under heavy IO or CPU load but I can't reproduce
> it 100%. It just never occurs on VMs that only serve a few web pages though.
> I also noticed that on a machine which has this problem even an ssh shell is
> *very* laggy so it's not just a cosmetic problem.
>
> Would removing the hrtimer from the kernel config solve it or is it necessary
> for KVM?
>
> I remember this problem has been posted here before though there wasn't any
> real conclusion or solution for it.
>

Are you also running a 32-bit kernel?

Thanks,

Zach
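
For anyone who wants to double-check the power-of-two arithmetic quoted in this thread, here is a minimal Python sketch. It only redoes the sums from the messages above (the reported delta against 4*1024^4, and the 2^44 / 2^43 ns jumps expressed in seconds); the helper name nearest_power_of_two is made up for illustration and is not anything from the kernel or KVM sources.

```python
NS_PER_SEC = 1_000_000_000


def nearest_power_of_two(n):
    """Return (exponent, 2**exponent) for the power of two closest to n > 0.

    Hypothetical helper for this check only; not a kernel function.
    """
    hi = n.bit_length()   # smallest exponent with 2**hi > n
    lo = hi - 1           # largest exponent with 2**lo <= n
    if n - (1 << lo) <= (1 << hi) - n:
        return lo, 1 << lo
    return hi, 1 << hi


# Magnitude of the delta from the "Clocksource tsc unstable" log line.
delta_ns = 4398046474878
exp, power = nearest_power_of_two(delta_ns)
gap = power - delta_ns
print(f"|delta| = 2^{exp} ns - {gap} ns")

# The forward jumps reported on the other host, read as nanoseconds.
for e in (44, 43):
    seconds = (1 << e) // NS_PER_SEC
    print(f"2^{e} ns = {seconds} s = {seconds / 3600:.2f} h")
```

The delta sits just below 2^42 ns and the two jumps match 2^44 and 2^43 ns to within a second, so all three numbers are consistent with a single high bit being lost or gained somewhere in a nanosecond counter. That is only an observation about the figures, not a diagnosis.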