Re: Clock jumps


* Re: Clock jumps
       [not found] <loom.20100524T171038-56@post.gmane.org>
@ 2010-05-25  6:21 ` Gleb Natapov
  2010-05-26 17:10   ` Orion Poplawski
  0 siblings, 1 reply; 16+ messages in thread
From: Gleb Natapov @ 2010-05-25  6:21 UTC (permalink / raw)
  To: Orion Poplawski; +Cc: linux-kernel, kvm

Adding kvm to CC.

On Mon, May 24, 2010 at 04:06:32PM +0000, Orion Poplawski wrote:
> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
> host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
> host.  Is there any debugging I can enable to see what is jumping the clock?
> 
> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
> Switching to clocksource kvm-clock
> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
> 
> Thanks,
> 
>  Orion
> 

--
			Gleb.


* Re: Clock jumps
  2010-05-25  6:21 ` Clock jumps Gleb Natapov
@ 2010-05-26 17:10   ` Orion Poplawski
  2010-05-26 17:31     ` Alexander Graf
  0 siblings, 1 reply; 16+ messages in thread
From: Orion Poplawski @ 2010-05-26 17:10 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: linux-kernel, kvm

On 05/25/2010 12:21 AM, Gleb Natapov wrote:
> Adding kvm to CC.
>
> On Mon, May 24, 2010 at 04:06:32PM +0000, Orion Poplawski wrote:
>> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
>> host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
>> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
>> host.  Is there any debugging I can enable to see what is jumping the clock?
>>
>> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
>> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
>> Switching to clocksource kvm-clock
>> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)


Thanks, though I don't think it made it there.  I'm also not sure it's 
completely limited to KVM, though that is the only running system I am 
currently seeing the problem on.  I also see clock jumps during anaconda 
installs on physical hardware and apparently they have been present since at 
least F11.  Might be unrelated though.

I'm really at a loss as to how to debug this, though.
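
One low-tech way to start narrowing this down is to sample the wall clock and the monotonic clock side by side inside the guest. The sketch below is only an illustration (the function names and the one-second threshold are made up for this example). It rests on the assumption that a step applied by ntpd, chronyd or settimeofday() moves only the wall clock, while a warp in the underlying clocksource moves both clocks together. A jump that shows up in the first check therefore points at something setting the clock; a jump that shows up only in the second check points at the clocksource (or at the guest having been stalled).

```python
import time


def find_jumps(samples, threshold=1.0):
    """Return (index, offset_change) for every sample where the wall clock
    moved differently from the monotonic clock by more than `threshold` s.

    `samples` is a list of (realtime, monotonic) pairs taken back to back.
    """
    jumps = []
    for i in range(1, len(samples)):
        wall_delta = samples[i][0] - samples[i - 1][0]
        mono_delta = samples[i][1] - samples[i - 1][1]
        drift = wall_delta - mono_delta
        if abs(drift) > threshold:
            jumps.append((i, drift))
    return jumps


def watch(interval=1.0, threshold=1.0):
    """Poll both clocks forever and report disagreements (hypothetical helper)."""
    prev = (time.time(), time.monotonic())
    while True:
        time.sleep(interval)
        cur = (time.time(), time.monotonic())
        # Check 1: wall clock stepped relative to monotonic time.
        for _, drift in find_jumps([prev, cur], threshold):
            print(f"{time.ctime(cur[0])}: wall clock moved {drift:+.1f}s "
                  f"relative to monotonic")
        # Check 2: monotonic time itself advanced far more than we slept.
        mono_delta = cur[1] - prev[1]
        if mono_delta > interval + threshold:
            print(f"{time.ctime(cur[0])}: monotonic clock advanced "
                  f"{mono_delta:.1f}s during a {interval:.1f}s sleep")
        prev = cur
```

Calling watch() in a terminal or under nohup leaves a timestamped trail that can be lined up against ntpd/chronyd logs and dmesg.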

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com


* Re: Clock jumps
  2010-05-26 17:10   ` Orion Poplawski
@ 2010-05-26 17:31     ` Alexander Graf
  2010-05-26 17:50       ` Orion Poplawski
                         ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Alexander Graf @ 2010-05-26 17:31 UTC (permalink / raw)
  To: Orion Poplawski; +Cc: Gleb Natapov, linux-kernel, kvm


On 26.05.2010, at 19:10, Orion Poplawski wrote:

> On 05/25/2010 12:21 AM, Gleb Natapov wrote:
>> Adding kvm to CC.
>> 
>> On Mon, May 24, 2010 at 04:06:32PM +0000, Orion Poplawski wrote:
>>> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
>>> host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
>>> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
>>> host.  Is there any debugging I can enable to see what is jumping the clock?
>>> 
>>> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
>>> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
>>> Switching to clocksource kvm-clock
>>> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
> 
> 
> Thanks, though I don't think it made it there.  I'm also not sure it's completely limited to KVM, though that is the only running system I am currently seeing the problem on.  I also see clock jumps during anaconda installs on physical hardware and apparently they have been present since at least F11.  Might be unrelated though.
> 
> I'm really at a loss of how to debug this though.

Do you have ntpd running inside the guest? I have a bug report lying around about 2.6.33 with kvm-clock jumping in time when ntpd is used: https://bugzilla.novell.com/show_bug.cgi?id=582260

Alex


* Re: Clock jumps
  2010-05-26 17:31     ` Alexander Graf
@ 2010-05-26 17:50       ` Orion Poplawski
  2010-05-26 22:55       ` Orion Poplawski
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Orion Poplawski @ 2010-05-26 17:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Gleb Natapov, linux-kernel, kvm

On 05/26/2010 11:31 AM, Alexander Graf wrote:
>
> Do you have ntpd running inside the guest? I have a bug report lying around about 2.6.33 with kvm-clock jumping in time when ntpd is used: https://bugzilla.novell.com/show_bug.cgi?id=582260
>
> Alex
>

I've used ntpd and chronyd.  I haven't tried running without either.  I'll do 
that now...




* Re: Clock jumps
  2010-05-26 17:31     ` Alexander Graf
  2010-05-26 17:50       ` Orion Poplawski
@ 2010-05-26 22:55       ` Orion Poplawski
  2010-05-27 18:32       ` Bernhard Schmidt
  2010-06-02 22:54       ` Orion Poplawski
  3 siblings, 0 replies; 16+ messages in thread
From: Orion Poplawski @ 2010-05-26 22:55 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Gleb Natapov, linux-kernel, kvm

On 05/26/2010 11:31 AM, Alexander Graf wrote:
>
> On 26.05.2010, at 19:10, Orion Poplawski wrote:
>
>> On 05/25/2010 12:21 AM, Gleb Natapov wrote:
>>> Adding kvm to CC.
>>>
>>> On Mon, May 24, 2010 at 04:06:32PM +0000, Orion Poplawski wrote:
>>>> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
>>>> host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
>>>> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
>>>> host.  Is there any debugging I can enable to see what is jumping the clock?
>>>>
>>>> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
>>>> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
>>>> Switching to clocksource kvm-clock
>>>> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
>>
>>
>> Thanks, though I don't think it made it there.  I'm also not sure it's completely limited to KVM, though that is the only running system I am currently seeing the problem on.  I also see clock jumps during anaconda installs on physical hardware and apparently they have been present since at least F11.  Might be unrelated though.
>>
>> I'm really at a loss of how to debug this though.
>
> Do you have ntpd running inside the guest? I have a bug report lying around about 2.6.33 with kvm-clock jumping in time when ntpd is used: https://bugzilla.novell.com/show_bug.cgi?id=582260
>
> Alex
>

That bug looks just like what I'm seeing.  I even see the soft lockup messages 
sometimes as well.  May actually be seeing it with a Fedora 12 guest as well - 
but it results in a hard hang.



* Re: Clock jumps
  2010-05-26 17:31     ` Alexander Graf
  2010-05-26 17:50       ` Orion Poplawski
  2010-05-26 22:55       ` Orion Poplawski
@ 2010-05-27 18:32       ` Bernhard Schmidt
  2010-05-27 19:08         ` john stultz
  2010-05-27 21:53         ` Zachary Amsden
  2010-06-02 22:54       ` Orion Poplawski
  3 siblings, 2 replies; 16+ messages in thread
From: Bernhard Schmidt @ 2010-05-27 18:32 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

Alexander Graf <agraf@suse.de> wrote:

Hi,

> Do you have ntpd running inside the guest? I have a bug report lying
> around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
> https://bugzilla.novell.com/show_bug.cgi?id=582260

I want to chime in here: I have a very similar problem, but not with
ntpd in the guest.

The host was an HP ProLiant DL320 G5p with a dual-core Xeon 3075. The system
was Debian Lenny with a custom 2.6.33 host kernel and a custom
qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.

The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
(2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
kernels, but one system has i386 userland.

With this setup, once in a while (maybe every other week) one VM would
have a sudden clock jump, 6-12 hours into the future. There were no kernel
messages or other log entries, other than applications complaining about the
clock jump after the fact. Other VMs were unaffected.

Yesterday I did an upgrade to Debian Squeeze. This involved a new
qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
kernels from 2.6.33 to 2.6.33.4.

First of all, after the reboot the host clock was totally unreliable. I
had a constant skew of up to five seconds per minute in the host clock,
which of course affected the VMs as well.  This problem went away when I
switched the host clocksource from tsc to hpet. The host does CPU frequency
scaling, which is, as far as I know, known to affect TSC stability. I
think I remember messages about tsc being unstable in earlier boots;
maybe the detection just did not work this time.

Worse, the clock jump issues in the guest appeared much more often than
before. The more heavily loaded VMs did not survive ten minutes without
jumping several hours ahead.

The situation has stabilized after setting the clocksource to hpet in the
guest immediately after boot. So it seems kvm-clock has some issues here.
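
For reference, the clocksource switch described here can be scripted. The following is a minimal sketch, assuming the standard Linux sysfs files current_clocksource and available_clocksource under /sys/devices/system/clocksource/clocksource0; the helper names are invented for this example, and writing the file needs root.

```python
from pathlib import Path

# Standard sysfs location on Linux; passed as a parameter so it can be
# pointed elsewhere for testing.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, [available, ...]) as reported by the kernel."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def set_clocksource(name, base=SYSFS_DIR):
    """Switch to `name`, refusing names the kernel does not offer."""
    _, available = read_clocksources(base)
    if name not in available:
        raise ValueError(f"{name!r} not offered; available: {available}")
    (Path(base) / "current_clocksource").write_text(name + "\n")
```

Running set_clocksource("hpet") from an early boot script would approximate "immediately after boot"; whether hpet is offered at all depends on the guest configuration, which is why the helper checks first.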

I've seen a preliminary patch floating around on the ML by Zachary
Amsden, but I haven't tried it yet. It talks of backward warps, but so
far I've only seen forward warps (the VM time suddenly jumps into the 
future), so it might be unrelated.

Bernhard


* Re: Clock jumps
  2010-05-27 18:32       ` Bernhard Schmidt
@ 2010-05-27 19:08         ` john stultz
  2010-05-27 21:48           ` Bernhard Schmidt
  2010-05-27 21:53         ` Zachary Amsden
  1 sibling, 1 reply; 16+ messages in thread
From: john stultz @ 2010-05-27 19:08 UTC (permalink / raw)
  To: Bernhard Schmidt; +Cc: linux-kernel, kvm

On Thu, May 27, 2010 at 11:32 AM, Bernhard Schmidt <berni@birkenwald.de> wrote:
> Alexander Graf <agraf@suse.de> wrote:
>> Do you have ntpd running inside the guest? I have a bug report lying
>> around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
>> https://bugzilla.novell.com/show_bug.cgi?id=582260
>
> I want to chime in here, I have a very similar problem, but not with
> ntpd in the guest.
>
> The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
> was a Debian Lenny with a custom 2.6.33 host kernel and a custom
> qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.
>
> The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
> (2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
> kernels, but one system has i386 userland.
>
> With this setup once in a while (maybe every other week) one VM would
> have a sudden clock jump, 6-12 hours into the future. No kernel messages
> or other log entries than applications complaining about the clock jump
> after the fact. Other VMs were unaffected.
>
> Yesterday I did an upgrade to Debian Squeeze. This involved a new
> qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
> kernels from 2.6.33 to 2.6.33.4.
>
> First of all, after the reboot the host clock was totally unreliable. I
> had a constant skew of up to five seconds per minute in the host clock,
> which of course affected the VMs as well.  This problem went away when I
> changed from tsc into hpet on the host. The host does CPU frequency
> scaling which is, as far as I know, known to affect TSC stability. I
> think I remember messages about tsc being unstable in earlier boots,
> maybe the detection did just not work this time.

I'd be very interested in hearing more about the host side issue. So
this happened with the same kernel that you were using before, with no
trouble?

Could you also send dmesg output from this boot? And if you can find
any older dmesg logs to compare with, send those too?

thanks
-john


* Re: Clock jumps
  2010-05-27 19:08         ` john stultz
@ 2010-05-27 21:48           ` Bernhard Schmidt
  2010-05-28  0:00             ` john stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Bernhard Schmidt @ 2010-05-27 21:48 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, kvm

On 27.05.2010 21:08, john stultz wrote:

Hi John,

> I'd be very interested in hearing more about the host side issue. So
> this happened with the same kernel that you were using before, with no
> trouble?

Correct.

> Could you also send dmesg output from this boot? And if you can find
> any older dmesg logs to compare with, send those too?

See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
on the same kernel binary the initrd changed greatly when upgrading, so 
ordering/timing between those two is off.

Note that the dmesg output is captured right after boot. I think I 
remember seeing a "TSC unstable" message pretty soon after boot, but I 
might be mixing it up with my other AMD-based KVM server. I don't hold 
normal (non-boot) logs that long, so I can't tell for sure.

If you need any more info feel free to contact me.

Bernhard


* Re: Clock jumps
  2010-05-27 18:32       ` Bernhard Schmidt
  2010-05-27 19:08         ` john stultz
@ 2010-05-27 21:53         ` Zachary Amsden
  2010-05-27 22:12           ` Bernhard Schmidt
  1 sibling, 1 reply; 16+ messages in thread
From: Zachary Amsden @ 2010-05-27 21:53 UTC (permalink / raw)
  To: Bernhard Schmidt; +Cc: kvm, linux-kernel

On 05/27/2010 08:32 AM, Bernhard Schmidt wrote:
> Alexander Graf<agraf@suse.de>  wrote:
>
> Hi,
>
>    
>> Do you have ntpd running inside the guest? I have a bug report lying
>> around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
>> https://bugzilla.novell.com/show_bug.cgi?id=582260
>>      
> I want to chime in here, I have a very similar problem, but not with
> ntpd in the guest.
>
> The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
> was a Debian Lenny with a custom 2.6.33 host kernel and a custom
> qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.
>
> The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
> (2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
> kernels, but one system has i386 userland.
>
> With this setup once in a while (maybe every other week) one VM would
> have a sudden clock jump, 6-12 hours into the future. No kernel messages
> or other log entries than applications complaining about the clock jump
> after the fact. Other VMs were unaffected.
>
> Yesterday I did an upgrade to Debian Squeeze. This involved a new
> qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
> kernels from 2.6.33 to 2.6.33.4.
>
> First of all, after the reboot the host clock was totally unreliable. I
> had a constant skew of up to five seconds per minute in the host clock,
> which of course affected the VMs as well.  This problem went away when I
> changed from tsc into hpet on the host. The host does CPU frequency
> scaling which is, as far as I know, known to affect TSC stability. I
> think I remember messages about tsc being unstable in earlier boots,
> maybe the detection did just not work this time.
>
> Worse, the clock jump issues in the guest appeared much more often than
> before. The higher loaded VMs did not survive ten minutes without
> jumping several hours ahead.
>
> Situation has stabilized after setting clocksource hpet in the guest
> immediately after boot. So it seems kvm-clock has some issues here.
>
> I've seen a preliminary patch floating around on the ML by Zachary
> Amsden, but I haven't tried it yet. It talks of backward warps, but so
> far I've only seen forward warps (the VM time suddenly jumps into the
> future), so it might be unrelated.
>    

I have an AMD Turion TL-52 machine with unreliable TSC.  It varies with 
CPU frequency, which is okay, we can compensate for that, but worse, it 
has broken clocking when in C1E idle.  Apparently, it divides down the 
clock input to an idle core, so it only runs at 1/16 or whatever of the 
rate, and adds a multiplier to the TSC increment, so it scales by 16 
instead of by 1 (whatever the actual numbers are I don't know, but this 
illustrates the point).  When it wakes up to service a cache probe from 
another core, it now runs with full clock rate ... and still uses the 
multiplier for the TSC increment.

The effect is that idle CPUs have TSC which may increase faster than 
running CPUs.  Given time, this delta can add to a very large number (in 
theory, it's a random walk, but it can go very far off).  If a VM is 
running on this CPU and happens to match the idle pattern without 
switching CPUs, time can effectively run accelerated on that VM, and 
very rapidly things are going to get confused.
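
To put rough numbers on that, here is a small back-of-the-envelope sketch. It models only the mechanism described above: while the core is awake at full rate but the TSC still increments by the multiplier, the TSC runs `multiplier` times too fast. The factor of 16 is the illustrative figure from the text, not a measured value, and the function name and the 1% example are made up.

```python
def tsc_overcount_seconds(wall_seconds, probe_fraction, multiplier=16):
    """Extra seconds a TSC-derived clock gains over `wall_seconds` of real
    time if, for `probe_fraction` of that time, the core runs at full clock
    rate while the TSC still advances `multiplier` counts per cycle."""
    if not 0.0 <= probe_fraction <= 1.0:
        raise ValueError("probe_fraction must be between 0 and 1")
    # Normal operation counts 1x; the faulty state counts multiplier x,
    # so the excess is (multiplier - 1) for that share of the time.
    return wall_seconds * probe_fraction * (multiplier - 1)


# Example: one hour of real time with 1% of it spent in the faulty state.
extra = tsc_overcount_seconds(3600, 0.01)
print(f"TSC-based clock gains about {extra:.0f} s per hour")
```

Even a small share of time in the faulty state adds up to minutes per hour under this model, which is consistent with the delta growing "to a very large number" over time.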

Newer kernels should detect the host clock being unreliable quite 
quickly; my F13 machine detects it right away at boot.

I have server-side fixes for kvm-clock that seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest-side changes to kvmclock as well.  It is impossible to
guarantee a strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).

There is one remaining problem to fix: the reset of TSC on reboot in SMP
will destabilize the TSCs again. But now that I've actually got VMs running
again (different bug), that shouldn't take long.

Zach


* Re: Clock jumps
  2010-05-27 21:53         ` Zachary Amsden
@ 2010-05-27 22:12           ` Bernhard Schmidt
  2010-05-27 22:20             ` Zachary Amsden
  2010-05-27 22:22             ` Zachary Amsden
  0 siblings, 2 replies; 16+ messages in thread
From: Bernhard Schmidt @ 2010-05-27 22:12 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, linux-kernel

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,

> I have server side fixes for this kvm-clock which seem to give me a
> stable clock on this machine, but for true SMP stability, you will need
> Glauber's guest side changes to kvmclock as well. It is impossible to
> guarantee strictly monotonic clocksource across multiple CPUs when
> frequency is dynamically changing (and also because of the C1E idle
> problems).

Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using kvm-clock.

Anyway, can you send me both patches? I'd like to try it, but I have 
completely lost track where the up-to-date patches are.

Bernhard


* Re: Clock jumps
  2010-05-27 22:12           ` Bernhard Schmidt
@ 2010-05-27 22:20             ` Zachary Amsden
  2010-05-27 22:22             ` Zachary Amsden
  1 sibling, 0 replies; 16+ messages in thread
From: Zachary Amsden @ 2010-05-27 22:20 UTC (permalink / raw)
  To: Bernhard Schmidt; +Cc: kvm, linux-kernel

On 05/27/2010 12:12 PM, Bernhard Schmidt wrote:
> On 27.05.2010 23:53, Zachary Amsden wrote:
>
> Hello Zachary,
>
>> I have server side fixes for this kvm-clock which seem to give me a
>> stable clock on this machine, but for true SMP stability, you will need
>> Glauber's guest side changes to kvmclock as well. It is impossible to
>> guarantee strictly monotonic clocksource across multiple CPUs when
>> frequency is dynamically changing (and also because of the C1E idle
>> problems).
>
> Is all this relevant only when the host is on TSC? Because I have seen 
> these jumps when the host was on HPET and the guests were using 
> kvm-clock.
>
> Anyway, can you send me both patches? I'd like to try it, but I have 
> completely lost track where the up-to-date patches are.

There are more than two; there's quite a bit. I'll send it to the list soon.

Zach


* Re: Clock jumps
  2010-05-27 22:12           ` Bernhard Schmidt
  2010-05-27 22:20             ` Zachary Amsden
@ 2010-05-27 22:22             ` Zachary Amsden
  1 sibling, 0 replies; 16+ messages in thread
From: Zachary Amsden @ 2010-05-27 22:22 UTC (permalink / raw)
  To: Bernhard Schmidt; +Cc: kvm, linux-kernel

On 05/27/2010 12:12 PM, Bernhard Schmidt wrote:
> On 27.05.2010 23:53, Zachary Amsden wrote:
>
> Hello Zachary,
>
>> I have server side fixes for this kvm-clock which seem to give me a
>> stable clock on this machine, but for true SMP stability, you will need
>> Glauber's guest side changes to kvmclock as well. It is impossible to
>> guarantee strictly monotonic clocksource across multiple CPUs when
>> frequency is dynamically changing (and also because of the C1E idle
>> problems).
>
> Is all this relevant only when the host is on TSC? Because I have seen 
> these jumps when the host was on HPET and the guests were using 
> kvm-clock.

It doesn't matter what the host uses (although the host on TSC with 
unstable TSC can make things worse), tsc and kvmclock sources in the 
guest will be unstable regardless.


* Re: Clock jumps
  2010-05-27 21:48           ` Bernhard Schmidt
@ 2010-05-28  0:00             ` john stultz
  2010-05-28  0:33               ` Bernhard Schmidt
  0 siblings, 1 reply; 16+ messages in thread
From: john stultz @ 2010-05-28  0:00 UTC (permalink / raw)
  To: Bernhard Schmidt; +Cc: linux-kernel, kvm, Thomas Gleixner, Ingo Molnar

On Thu, 2010-05-27 at 23:48 +0200, Bernhard Schmidt wrote:
> On 27.05.2010 21:08, john stultz wrote:
> > I'd be very interested in hearing more about the host side issue. So
> > this happened with the same kernel that you were using before, with no
> > trouble?
> 
> Correct.
> 
> > Could you also send dmesg output from this boot? And if you can find
> > any older dmesg logs to compare with, send those too?
> 
> See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
> http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
> on the same kernel binary the initrd changed greatly when upgrading, so 
> ordering/timing between those two is off.
> 
> Note that the dmesg output is captured right after boot. I think I 
> remember seeing a "TSC unstable" message pretty soon after boot, but I 
> might be mixing it up with my other AMD-based KVM server. I don't hold 
> normal (non-boot) logs that long, so I can't tell for sure.

Looking at the diff:
--- dmesg-lenny 2010-05-27 16:45:33.000000000 -0700
+++ dmesg-squeeze       2010-05-27 16:46:14.000000000 -0700
@@ -132,8 +132,8 @@
 console [ttyS1] enabled
 hpet clockevent registered
 Fast TSC calibration using PIT
-Detected 2660.398 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
+Detected 2613.324 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
 Security Framework initialized
 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
 Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
@@ -160,7 +160,7 @@
 CPU0: Intel(R) Xeon(R) CPU            3075  @ 2.66GHz stepping 0b
 Booting Node   0, Processors  #1
 Brought up 2 CPUs
-Total of 2 processors activated (10640.79 BogoMIPS).
+Total of 2 processors activated (10546.63 BogoMIPS).
 NET: Registered protocol family 16
 ACPI: bus type pci registered
 PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)

So you can see in the above that during the second boot the TSC
calibration was badly mis-calculated. This was the cause of the skew.
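
As a quick sanity check on the size of that error, the two "Detected ... MHz" figures can be turned into a clock rate error. The sketch below assumes the 2660.398 MHz value from the earlier boot is the true TSC rate (it matches the 2.66GHz in the CPU model string); the function name is made up for this example.

```python
def skew_seconds_per_minute(calibrated_mhz, actual_mhz):
    """Seconds a TSC-based clock gains (+) or loses (-) per real minute when
    the kernel believes the TSC runs at `calibrated_mhz` but it really runs
    at `actual_mhz`."""
    # Elapsed time is computed as cycles / calibrated rate, so an
    # underestimated rate makes the clock run fast by actual / calibrated.
    return 60.0 * (actual_mhz / calibrated_mhz - 1.0)


skew = skew_seconds_per_minute(calibrated_mhz=2613.324, actual_mhz=2660.398)
print(f"{skew:.2f} s gained per minute")
```

On these two numbers alone the calibration is off by roughly 1.8%, which works out to a little over one second gained per minute.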

Not sure how that might be linked to the distro upgrade. It could have
been something like SMI damage during the calibration time, but I
thought the calibration loop watched for that.

Bernhard: I expect with all those VMs, this machine isn't rebooted
frequently. So could you look at the logs to see how much the "Detected
xxxx.yyy MHz processor." line varies across a few other boots (if
they still exist)?

Ingo/Thomas: Any thoughts, should we be considering dropping the
quick_pit_calibrate() code and always do the slower route?

thanks
-john


* Re: Clock jumps
  2010-05-28  0:00             ` john stultz
@ 2010-05-28  0:33               ` Bernhard Schmidt
  2010-05-28  0:46                 ` john stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Bernhard Schmidt @ 2010-05-28  0:33 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, kvm, Thomas Gleixner, Ingo Molnar

On 28.05.2010 02:00, john stultz wrote:

Hi John,

> Looking at the diff:
> --- dmesg-lenny 2010-05-27 16:45:33.000000000 -0700
> +++ dmesg-squeeze       2010-05-27 16:46:14.000000000 -0700
> @@ -132,8 +132,8 @@
>   console [ttyS1] enabled
>   hpet clockevent registered
>   Fast TSC calibration using PIT
> -Detected 2660.398 MHz processor.
> -Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
> +Detected 2613.324 MHz processor.
> +Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
>   Security Framework initialized
>   Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
>   Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> @@ -160,7 +160,7 @@
>   CPU0: Intel(R) Xeon(R) CPU            3075  @ 2.66GHz stepping 0b
>   Booting Node   0, Processors  #1
>   Brought up 2 CPUs
> -Total of 2 processors activated (10640.79 BogoMIPS).
> +Total of 2 processors activated (10546.63 BogoMIPS).
>   NET: Registered protocol family 16
>   ACPI: bus type pci registered
>   PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
>
> So you can see in the above the during the second boot the TSC
> calibration was badly mis-calculated. This was the cause of the skew.
>
> Not sure how that might be linked to the distro upgrade. It could have
> been something like SMI damage during the calibration time, but I
> thought the calibration loop watched for that.
>
> Bernhard: I expect with all those vms, this machine isn't rebooted
> frequently. So could you look at the logs to see how much the  "Detected
> xxxx.yyy MHz processor." line varies by across a few other boots (if
> they still exist?).

Correct, the box isn't rebooted often, but I do have a few dmesg outputs
lying around. lpj was always nearly the same until the very last boot,
which screwed up the clock.

dmesg:[    0.000000] Linux version 2.6.33 (root@svr02) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Sun Mar 7 23:01:45 CET 2010
dmesg:[    0.008005] Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
dmesg:[    0.288002] Total of 2 processors activated (10546.63 BogoMIPS).
dmesg.0:[    0.000000] Linux version 2.6.33 (root@svr02) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Sun Mar 7 23:01:45 CET 2010
dmesg.0:[    0.008005] Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
dmesg.0:[    0.274022] Total of 2 processors activated (10640.79 BogoMIPS).
dmesg.1.gz:[    0.000000] Linux version 2.6.32-rc7 (root@svr02) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.1.gz:[    0.012004] Calibrating delay loop (skipped), value calculated using timer frequency.. 5319.06 BogoMIPS (lpj=10638120)
dmesg.1.gz:[    0.016000] Calibrating delay using timer specific routine.. 5319.99 BogoMIPS (lpj=10639980)
dmesg.1.gz:[    0.260003] Total of 2 processors activated (10639.05 BogoMIPS).
dmesg.2.gz:[    0.000000] Linux version 2.6.32-rc7 (root@svr02) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.2.gz:[    0.012005] Calibrating delay loop (skipped), value calculated using timer frequency.. 5319.35 BogoMIPS (lpj=10638712)
dmesg.2.gz:[    0.016000] Calibrating delay using timer specific routine.. 5319.99 BogoMIPS (lpj=10639990)
dmesg.2.gz:[    0.261567] Total of 2 processors activated (10639.35 BogoMIPS).
dmesg.3.gz:[    0.000000] Linux version 2.6.32-rc7 (root@svr02) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.3.gz:[    0.012005] Calibrating delay loop (skipped), value calculated using timer frequency.. 5319.97 BogoMIPS (lpj=10639956)
dmesg.3.gz:[    0.016000] Calibrating delay using timer specific routine.. 5319.99 BogoMIPS (lpj=10639987)
dmesg.3.gz:[    0.257152] Total of 2 processors activated (10639.97 BogoMIPS).
dmesg.4.gz:[    0.000000] Linux version 2.6.32-rc7 (root@svr02) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.4.gz:[    0.012005] Calibrating delay loop (skipped), value calculated using timer frequency.. 5319.84 BogoMIPS (lpj=10639688)
dmesg.4.gz:[    0.016000] Calibrating delay using timer specific routine.. 5319.99 BogoMIPS (lpj=10639993)
dmesg.4.gz:[    0.253571] Total of 2 processors activated (10639.84 BogoMIPS).

If necessary I can reboot once more, but I'd like to avoid it.

Bernhard


* Re: Clock jumps
  2010-05-28  0:33               ` Bernhard Schmidt
@ 2010-05-28  0:46                 ` john stultz
  0 siblings, 0 replies; 16+ messages in thread
From: john stultz @ 2010-05-28  0:46 UTC (permalink / raw)
  To: Bernhard Schmidt; +Cc: linux-kernel, kvm, Thomas Gleixner, Ingo Molnar

On Fri, 2010-05-28 at 02:33 +0200, Bernhard Schmidt wrote:
> On 28.05.2010 02:00, john stultz wrote:
> > Looking at the diff:
> > --- dmesg-lenny 2010-05-27 16:45:33.000000000 -0700
> > +++ dmesg-squeeze       2010-05-27 16:46:14.000000000 -0700
> > @@ -132,8 +132,8 @@
> >   console [ttyS1] enabled
> >   hpet clockevent registered
> >   Fast TSC calibration using PIT
> > -Detected 2660.398 MHz processor.
> > -Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
> > +Detected 2613.324 MHz processor.
> > +Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
> >   Security Framework initialized
> >   Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> >   Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> > @@ -160,7 +160,7 @@
> >   CPU0: Intel(R) Xeon(R) CPU            3075  @ 2.66GHz stepping 0b
> >   Booting Node   0, Processors  #1
> >   Brought up 2 CPUs
> > -Total of 2 processors activated (10640.79 BogoMIPS).
> > +Total of 2 processors activated (10546.63 BogoMIPS).
> >   NET: Registered protocol family 16
> >   ACPI: bus type pci registered
> >   PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
> >
> > So you can see in the above that during the second boot the TSC
> > calibration was badly mis-calculated. This was the cause of the skew.
> >
> > Not sure how that might be linked to the distro upgrade. It could have
> > been something like SMI damage during the calibration time, but I
> > thought the calibration loop watched for that.
> >
> > Bernhard: I expect with all those vms, this machine isn't rebooted
> > frequently. So could you look at the logs to see how much the "Detected
> > xxxx.yyy MHz processor." line varies across a few other boots (if
> > they still exist)?
> 
> Correct, the box isn't rebooted often, but I do have a few dmesg outputs
> lying around. lpj was always almost the same until the very last boot,
> which screwed up the clock.
> 
> dmesg:[    0.000000] Linux version 2.6.33 (root@svr02) (gcc version 
> 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Sun Mar 7 23:01:45 CET 2010
> dmesg:[    0.008005] Calibrating delay loop (skipped), value calculated 
> using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
> dmesg:[    0.288002] Total of 2 processors activated (10546.63 BogoMIPS).

Yea. The bogomips/loops per jiffies are actually calculated with a
different chunk of code (although it's interesting it miscalculated in
both cases).

Could you send the "Detected xxxx.yyy MHz processor." lines as well?

thanks
-john
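
To put a number on the skew discussed above, here is a rough sketch of the arithmetic. It assumes the 2660.398 MHz calibration from the first boot is the correct one (it matches the CPU's nominal 2.66GHz), that the clocksource in use is driven by the TSC, and HZ=250; none of this is confirmed in the thread, and the function names are made up for illustration. It also checks that the logged lpj values in the diff are consistent with lpj = tsc_khz * 1000 / HZ.

```python
# Sketch: estimate the clock drift implied by the two "Detected ... MHz" lines.
# Assumptions (not confirmed in the thread): HZ = 250, the first-boot value is
# the true TSC frequency, and the running clocksource is based on the TSC.
HZ = 250
SECONDS_PER_DAY = 86400


def lpj_from_tsc_khz(tsc_khz, hz=HZ):
    """Loops-per-jiffy derived from the timer frequency (the "skipped" case)."""
    return tsc_khz * 1000 // hz


def drift_seconds_per_day(true_mhz, calibrated_mhz):
    """Seconds gained per day when the kernel believes calibrated_mhz
    but the counter really ticks at true_mhz (positive = clock runs fast)."""
    return (true_mhz / calibrated_mhz - 1.0) * SECONDS_PER_DAY


good_mhz = 2660.398   # "Detected" line from the first boot in the diff
bad_mhz = 2613.324    # "Detected" line from the second boot in the diff

print("lpj (first boot): ", lpj_from_tsc_khz(2660398))
print("lpj (second boot):", lpj_from_tsc_khz(2613324))
print("relative error: %.2f%%" % ((good_mhz / bad_mhz - 1.0) * 100))
print("drift: %.0f s/day" % drift_seconds_per_day(good_mhz, bad_mhz))
```

Under these assumptions the calibration is off by a little under 2%, which works out to roughly 26 minutes gained per day.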




* Re: Clock jumps
  2010-05-26 17:31     ` Alexander Graf
                         ` (2 preceding siblings ...)
  2010-05-27 18:32       ` Bernhard Schmidt
@ 2010-06-02 22:54       ` Orion Poplawski
  3 siblings, 0 replies; 16+ messages in thread
From: Orion Poplawski @ 2010-06-02 22:54 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Gleb Natapov, linux-kernel, kvm

On 05/26/2010 11:31 AM, Alexander Graf wrote:
>
> On 26.05.2010, at 19:10, Orion Poplawski wrote:
>
>> On 05/25/2010 12:21 AM, Gleb Natapov wrote:
>>> Adding kvm to CC.
>>>
>>> On Mon, May 24, 2010 at 04:06:32PM +0000, Orion Poplawski wrote:
>>>> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
>>>> host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
>>>> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
>>>> host.  Is there any debugging I can enable to see what is jumping the clock?
>>>>
>>>> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
>>>> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
>>>> Switching to clocksource kvm-clock
>>>> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
>>
>>
>> Thanks, though I don't think it made it there.  I'm also not sure it's completely limited to KVM, though that is the only running system I am currently seeing the problem on.  I also see clock jumps during anaconda installs on physical hardware and apparently they have been present since at least F11.  Might be unrelated though.
>>
>> I'm really at a loss as to how to debug this though.
>
> Do you have ntpd running inside the guest? I have a bug report lying around about 2.6.33 with kvm-clock jumping in time when ntpd is used: https://bugzilla.novell.com/show_bug.cgi?id=582260
>
> Alex
>

Turning off ntpd and chronyd did not help for me.


-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com


end of thread, other threads:[~2010-06-02 22:54 UTC | newest]

Thread overview: 16+ messages
     [not found] <loom.20100524T171038-56@post.gmane.org>
2010-05-25  6:21 ` Clock jumps Gleb Natapov
2010-05-26 17:10   ` Orion Poplawski
2010-05-26 17:31     ` Alexander Graf
2010-05-26 17:50       ` Orion Poplawski
2010-05-26 22:55       ` Orion Poplawski
2010-05-27 18:32       ` Bernhard Schmidt
2010-05-27 19:08         ` john stultz
2010-05-27 21:48           ` Bernhard Schmidt
2010-05-28  0:00             ` john stultz
2010-05-28  0:33               ` Bernhard Schmidt
2010-05-28  0:46                 ` john stultz
2010-05-27 21:53         ` Zachary Amsden
2010-05-27 22:12           ` Bernhard Schmidt
2010-05-27 22:20             ` Zachary Amsden
2010-05-27 22:22             ` Zachary Amsden
2010-06-02 22:54       ` Orion Poplawski
