From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zachary Amsden <zamsden@redhat.com>
Subject: Re: Clocksource tsc unstable (delta = -4398046474878 ns)
Date: Wed, 31 Mar 2010 09:32:18 -1000
Message-ID: <4BB3A342.5070201@redhat.com>
References: <20100328114635.401C730301D3@mail.linux-ag.de> <20100329103113.GP3910@miggy.org> <20100330080828.A92003030135@mail.linux-ag.de> <201003301904.21536.thomas.beinicke@fsd-web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Sebastian Hetze <s.hetze@linux-ag.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
To: "Beinicke, Thomas" <thomas.beinicke@fsd-web.de>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:37060 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751409Ab0C3Tdg (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 30 Mar 2010 15:33:36 -0400
In-Reply-To: <201003301904.21536.thomas.beinicke@fsd-web.de>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 03/30/10 07:04, Beinicke, Thomas wrote:
> On Tuesday 30 March 2010 10:08:28 Sebastian Hetze wrote:
>   
>> On Mon, Mar 29, 2010 at 11:31:13AM +0100, Athanasius wrote:
>>     
>>> On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
>>>       
>>>> this message appeared in the KVM guest kern.log last night:
>>>>
>>>> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable
>>>> (delta = -4398046474878 ns)
>>>>
>>>> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
>>>> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
>>>>
>>>> If I understand things correctly, in kernel/time/clocksource.c
>>>> clocksource_watchdog() checks all the
>>>> /sys/devices/system/clocksource/clocksource0/available_clocksource
>>>> every 0.5sec for a delta of more than 0.0625s. So the tsc must have
>>>> changed by more than one hour between two subsequent calls of
>>>> clocksource_watchdog. No event in the host nor anything in the
>>>> guest gives reasonable cause for this step.
>>>>
>>>> However, the number 4398046474878 is only 36226 ns away from
>>>> 4*1024*1024*1024*1024
>>>>
>>>>         
>>>   I didn't see any such messages but I've had a recent experience with
>>> the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
>>> two separate incidents.  Eerily the exact jumps, as best I can tell from
>>> logs are of 17592 and 8796 seconds, give or take a second or two.  If
>>> you look at these as nanoseconds then that's 'exactly' 2^44 and 2^43
>>> nanoseconds.
>>>
>>>   What I've done that seems to have avoided this happening again is drop
>>> the KVM_CLOCK kernel option from the kvm guests' kernel.
>>>       
>> To my understanding, kvm-clock is the best and most reliable clocksource
>> available, so I do not think it is a good idea to disable it.
>>
>> There is a lot of bit shift operation happening with the clocksources,
>> so there may be a real bug hidden somewhere in the code.
>> Somehow ntp adjustment is involved, can this cause such huge steps?
>> In my case, I actually have NTP running in the guest. However, the
>> statistics show a pretty stable timing here.
> I am having the same problem occasionally.
> It only occurs if the VM is under heavy IO or CPU load, but I can't reproduce
> it 100%. It just never occurs on VMs that only serve a few web pages though.
> I also noticed that on a machine which has this problem even an ssh shell is
> *very* laggy, so it's not just a cosmetic problem.
>
> Would removing the hrtimer from the kernel config solve it or is it necessary 
> for KVM?
>
> I remember this problem has been posted here before, though there wasn't any
> real conclusion or solution for it.
>   

Are you also running a 32-bit kernel?
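
The power-of-two figures quoted above are easy to double-check. The
following is only a sketch in Python, not kernel code; the helper name
nearest_power_of_two is made up for illustration, and the input values
are the ones reported in this thread.

```python
# Sanity check of the numbers quoted in this thread (not kernel code).
NSEC_PER_SEC = 10**9

# Magnitude of the delta reported by the clocksource watchdog in kern.log.
delta_ns = 4398046474878


def nearest_power_of_two(n):
    """Hypothetical helper: return (exponent, distance) for the power of
    two closest to the positive integer n."""
    k = n.bit_length()              # 2**(k-1) <= n < 2**k
    lo, hi = 1 << (k - 1), 1 << k
    if n - lo <= hi - n:
        return k - 1, n - lo
    return k, hi - n


exponent, distance = nearest_power_of_two(delta_ns)
print(f"delta is {distance} ns away from 2^{exponent} ns")
print(f"delta is about {delta_ns / NSEC_PER_SEC / 60:.1f} minutes")

# The forward jumps reported on the other host, in whole seconds.
print((1 << 44) // NSEC_PER_SEC, "s for 2^44 ns")
print((1 << 43) // NSEC_PER_SEC, "s for 2^43 ns")
```

This agrees with what was reported: the delta sits 36226 ns below 2^42 ns
(4*1024*1024*1024*1024), and the 17592 s and 8796 s jumps correspond to
2^44 and 2^43 ns.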

Thanks,

Zach