public inbox for kvm@vger.kernel.org
* Timedrift in KVM guests after livemigration.
@ 2010-04-15  7:35 Espen Berg
  2010-04-17 19:52 ` Espen Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Espen Berg @ 2010-04-15  7:35 UTC (permalink / raw)
  To: kvm

We have three KVM hosts that support live migration between them, but
one of our problems is time drift.  The three frontends have different
CPU frequencies, and the KVM guests adopt the frequency of the host
machine on which they were first started.

Host1: cat /proc/cpuinfo
model name      : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
cpu MHz         : 2394.048

Host2: cat /proc/cpuinfo
model name      : Intel(R) Core(TM)2 CPU          6700  @ 2.66GHz
cpu MHz         : 2659.685

Host3: cat /proc/cpuinfo
model name      : Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
cpu MHz         : 2327.507


virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

Is there any solution to our problems, or is a reboot the only safe 
solution?

Regards
Espen





* Re: Timedrift in KVM guests after livemigration.
  2010-04-15  7:35 Timedrift in KVM guests after livemigration Espen Berg
@ 2010-04-17 19:52 ` Espen Berg
  2010-04-17 20:17   ` Michael Tokarev
  0 siblings, 1 reply; 11+ messages in thread
From: Espen Berg @ 2010-04-17 19:52 UTC (permalink / raw)
  To: kvm

On 15.04.2010 09:35, Espen Berg wrote:
> We have three KVM hosts that support live migration between them, but
> one of our problems is time drift. The three frontends have different
> CPU frequencies, and the KVM guests adopt the frequency of the host
> machine on which they were first started.
>
> Host1: cat /proc/cpuinfo
> model name : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
> cpu MHz : 2394.048
>
> Host2: cat /proc/cpuinfo
> model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
> cpu MHz : 2659.685
>
> Host3: cat /proc/cpuinfo
> model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
> cpu MHz : 2327.507
>
>
> virsh version
> Compiled against library: libvir 0.7.6
> Using library: libvir 0.7.6
> Using API: QEMU 0.7.6
> Running hypervisor: QEMU 0.11.0
>
> Is there any solution to our problems, or is a reboot the only safe
> solution?

Is there no one with similar problems here? :\  I guess I should file a
bug report or something if the same problems occur in the latest
version.  I can't see any changes in the changelog after 0.11.x that
relate to this problem.  We can't be the only ones who use different CPUs
in a migration environment.

Since this is a cluster in production, I'm not able to try the latest 
version either.

Espen.


* Re: Timedrift in KVM guests after livemigration.
  2010-04-17 19:52 ` Espen Berg
@ 2010-04-17 20:17   ` Michael Tokarev
  2010-04-17 23:21     ` Espen Berg
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2010-04-17 20:17 UTC (permalink / raw)
  To: Espen Berg; +Cc: kvm

17.04.2010 23:52, Espen Berg wrote:
> On 15.04.2010 09:35, Espen Berg wrote:
>> We have three KVM hosts that support live migration between them, but
>> one of our problems is time drift. The three frontends have different
>> CPU frequencies, and the KVM guests adopt the frequency of the host
>> machine on which they were first started.

What do you mean by "adopt"?  Note that the CPU frequency
means nothing to modern operating systems, at least not
since the days when MS-DOS, which relied on CPU frequency
for its time functions, was in common use.  All the interesting
things are now done using timers instead, and timers (which,
again, don't depend on CPU frequency) usually work quite well.

What complicates things is that the cheapest sufficiently accurate
time source is the TSC (the time stamp counter register in
the CPU), but it will definitely be different on each
machine.  To address that, kvm 0.12.3 and kernel 2.6.32 (I think)
introduced a compensation.  See, for example, the -tdf kvm option.

[]
>> Is there any solution to our problems, or is a reboot the only safe
>> solution?

Well, reboot is definitely a safe solution.

> Is there no one with similar problems here? :\ I guess I should file a bug
> report or something if the same problems occur in the latest version. I
> can't see any changes in the changelog after 0.11.x that relate to this
> problem. We can't be the only ones who use different CPUs in a
> migration environment.

Actually there is a difference in 0.12.

> Since this is a cluster in production, I'm not able to try the latest
> version either.

Well, that's a difficult one, no?  It either works or it doesn't.
If you can't try anything else, why ask? :)

/mjt


* Re: Timedrift in KVM guests after livemigration.
  2010-04-17 20:17   ` Michael Tokarev
@ 2010-04-17 23:21     ` Espen Berg
  2010-04-18  9:22       ` Dor Laor
  0 siblings, 1 reply; 11+ messages in thread
From: Espen Berg @ 2010-04-17 23:21 UTC (permalink / raw)
  To: kvm

On 17.04.2010 22:17, Michael Tokarev wrote:
>>> We have three KVM hosts that support live migration between them, but
>>> one of our problems is time drift. The three frontends have different
>>> CPU frequencies, and the KVM guests adopt the frequency of the host
>>> machine on which they were first started.
> What do you mean by "adopt"? Note that the CPU frequency
> means nothing to modern operating systems, at least not
> since the days when MS-DOS, which relied on CPU frequency
> for its time functions, was in common use. All the interesting
> things are now done using timers instead, and timers (which,
> again, don't depend on CPU frequency) usually work quite well.

The assumption that the tick frequency was derived from the host's
MHz was based on the fact that greater clock frequency differences
caused greater time drift.  A 60 MHz difference caused about 24 min of
drift, and a 332 MHz difference caused about 2 h 25 min.
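
As a rough sanity check on those figures, the sketch below uses a toy model in which the guest keeps converting TSC ticks to time with the frequency of the host it booted on, even after moving to a host with a different frequency. That model is an assumption for illustration only, not a description of what kvm-clock actually does, and the function names are hypothetical. The host frequencies are the /proc/cpuinfo values from the first message.

```python
# Toy model (assumption): after migration the guest still divides TSC ticks
# by the frequency of the host it booted on, so its clock runs fast or slow
# by the ratio of the two host frequencies.

HOST_MHZ = {"Host1": 2394.048, "Host2": 2659.685, "Host3": 2327.507}


def drift_seconds_per_hour(boot_mhz, current_mhz):
    """Seconds the guest clock gains (+) or loses (-) per real hour."""
    return (current_mhz / boot_mhz - 1.0) * 3600.0


def hours_to_accumulate(drift_minutes, boot_mhz, current_mhz):
    """Real hours needed to build up the given drift under the toy model."""
    rate = abs(drift_seconds_per_hour(boot_mhz, current_mhz))
    if rate == 0.0:
        raise ValueError("identical frequencies produce no drift in this model")
    return drift_minutes * 60.0 / rate


if __name__ == "__main__":
    for boot, cur in [("Host3", "Host1"), ("Host3", "Host2")]:
        rate = drift_seconds_per_hour(HOST_MHZ[boot], HOST_MHZ[cur])
        delta = HOST_MHZ[cur] - HOST_MHZ[boot]
        print(f"{boot} -> {cur}: {delta:.1f} MHz apart, {rate:+.1f} s/hour")
```

Under this model the drift rate scales linearly with the frequency difference, which is at least consistent with the observation that the larger difference produced the larger drift.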


> What complicates things is that the cheapest sufficiently accurate
> time source is the TSC (the time stamp counter register in
> the CPU), but it will definitely be different on each
> machine. To address that, kvm 0.12.3 and kernel 2.6.32 (I think)
> introduced a compensation. See, for example, the -tdf kvm option.

Ah, nice to know. :)

>> Since this is a cluster in production, I'm not able to try the latest
>> version either.
> Well, that's a difficult one, no? It either works or it doesn't.
> If you can't try anything else, why ask? :)

What I tried to say was that there are many important virtual servers
running on this cluster at the moment, so "trial and error" was not an
option.  The last time we tried 0.12.x (during the initial tests of the
cluster) there were a lot of stability issues, crashes during migration,
etc.

Regards, Espen



* Re: Timedrift in KVM guests after livemigration.
  2010-04-17 23:21     ` Espen Berg
@ 2010-04-18  9:22       ` Dor Laor
  2010-04-18  9:33         ` Espen Berg
  2010-04-18  9:56         ` Gleb Natapov
  0 siblings, 2 replies; 11+ messages in thread
From: Dor Laor @ 2010-04-18  9:22 UTC (permalink / raw)
  To: Espen Berg; +Cc: kvm

On 04/18/2010 02:21 AM, Espen Berg wrote:
> On 17.04.2010 22:17, Michael Tokarev wrote:
>>>> We have three KVM hosts that support live migration between them, but
>>>> one of our problems is time drift. The three frontends have different
>>>> CPU frequencies, and the KVM guests adopt the frequency of the host
>>>> machine on which they were first started.
>> What do you mean by "adopt"? Note that the CPU frequency
>> means nothing to modern operating systems, at least not
>> since the days when MS-DOS, which relied on CPU frequency
>> for its time functions, was in common use. All the interesting
>> things are now done using timers instead, and timers (which,
>> again, don't depend on CPU frequency) usually work quite well.
>
> The assumption that the tick frequency was derived from the host's
> MHz was based on the fact that greater clock frequency differences
> caused greater time drift. A 60 MHz difference caused about 24 min of
> drift, and a 332 MHz difference caused about 2 h 25 min.
>
>
>> What complicates things is that the cheapest sufficiently accurate
>> time source is the TSC (the time stamp counter register in
>> the CPU), but it will definitely be different on each
>> machine. To address that, kvm 0.12.3 and kernel 2.6.32 (I think)
>> introduced a compensation. See, for example, the -tdf kvm option.
>
> Ah, nice to know. :)

There are two different things here:
The issue that Espen is reporting is that the hosts have different
frequencies, and guests that rely on the TSC as a source clock will notice
that after migration. This is indeed a problem that -tdf does not solve.
-tdf only adds compensation for the RTC clock emulation.

What's the guest type and what's the guest's source clock?
Using the TSC directly as a source clock is not recommended because of this
migration issue (which is not solvable until we trap every rdtsc by the
guest). Using the pv kvmclock in Linux mitigates this issue since it exposes
both the TSC and the host clock so guests can adjust themselves.
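
To make the "guests can adjust themselves" part concrete, here is a minimal sketch of how a guest can turn the values published by a paravirtual clock into nanoseconds. The field names follow the Linux pvclock ABI as commonly documented, but treat the whole block as an illustration under that assumption rather than the kernel's code; `mul_for_frequency` and `guest_time_ns` are hypothetical helper names, and the multiplier helper is simplified to a shift of 0.

```python
from dataclasses import dataclass


@dataclass
class PvclockInfo:
    tsc_timestamp: int      # host TSC value when the host last updated the record
    system_time: int        # host time in nanoseconds at that same moment
    tsc_to_system_mul: int  # 32.32 fixed-point nanoseconds-per-tick multiplier
    tsc_shift: int          # shift applied to the tick delta before multiplying


def mul_for_frequency(tsc_hz):
    """Multiplier for tsc_shift == 0 (simplification, needs tsc_hz > 1 GHz)."""
    if tsc_hz <= 1_000_000_000:
        raise ValueError("this simplified helper only handles TSCs above 1 GHz")
    return (1_000_000_000 << 32) // tsc_hz


def guest_time_ns(info, tsc_now):
    """Host time plus the scaled number of TSC ticks since the last update."""
    delta = tsc_now - info.tsc_timestamp
    if info.tsc_shift >= 0:
        delta <<= info.tsc_shift
    else:
        delta >>= -info.tsc_shift
    return info.system_time + ((delta * info.tsc_to_system_mul) >> 32)


if __name__ == "__main__":
    src_hz, dst_hz = 2_327_507_000, 2_659_685_000
    fresh = PvclockInfo(0, 0, mul_for_frequency(dst_hz), 0)
    stale = PvclockInfo(0, 0, mul_for_frequency(src_hz), 0)
    # One real second of ticks on the destination host:
    print("fresh multiplier:", guest_time_ns(fresh, dst_hz), "ns")
    print("stale multiplier:", guest_time_ns(stale, dst_hz), "ns")
```

The point of the sketch is that the scaling factor travels with the record: if the destination host republishes it after migration, the guest's arithmetic stays correct, whereas a stale factor from the source host produces exactly the kind of frequency-proportional drift described earlier in the thread.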

Several months ago a pvclock migration fix was added to pass the pvclock
MSR readings to the destination: 1a03675db146dfc760b3b48b3448075189f142cc


>
>>> Since this is a cluster in production, I'm not able to try the latest
>>> version either.
>> Well, that's a difficult one, no? It either works or it doesn't.
>> If you can't try anything else, why ask? :)
>
> What I tried to say was that there are many important virtual servers
> running on this cluster at the moment, so "trial and error" was not an
> option. The last time we tried 0.12.x (during the initial tests of the
> cluster) there were a lot of stability issues, crashes during migration,
> etc.
>
> Regards, Espen
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html



* Re: Timedrift in KVM guests after livemigration.
  2010-04-18  9:22       ` Dor Laor
@ 2010-04-18  9:33         ` Espen Berg
  2010-04-22  7:40           ` Thomas Treutner
  2010-04-18  9:56         ` Gleb Natapov
  1 sibling, 1 reply; 11+ messages in thread
From: Espen Berg @ 2010-04-18  9:33 UTC (permalink / raw)
  To: kvm

On 18.04.2010 11:22, Dor Laor wrote:
>>> What do you mean by "adopt"? Note that the CPU frequency
>>> means nothing to modern operating systems, at least not
>>> since the days when MS-DOS, which relied on CPU frequency
>>> for its time functions, was in common use. All the interesting
>>> things are now done using timers instead, and timers (which,
>>> again, don't depend on CPU frequency) usually work quite well.
>> The assumption that the tick frequency was derived from the host's
>> MHz was based on the fact that greater clock frequency differences
>> caused greater time drift. A 60 MHz difference caused about 24 min of
>> drift, and a 332 MHz difference caused about 2 h 25 min.
>>> What complicates things is that the cheapest sufficiently accurate
>>> time source is the TSC (the time stamp counter register in
>>> the CPU), but it will definitely be different on each
>>> machine. To address that, kvm 0.12.3 and kernel 2.6.32 (I think)
>>> introduced a compensation. See, for example, the -tdf kvm option.
>> Ah, nice to know. :)
> There are two different things here:
> The issue that Espen is reporting is that the hosts have different
> frequencies, and guests that rely on the TSC as a source clock will notice
> that after migration. This is indeed a problem that -tdf does not solve.
> -tdf only adds compensation for the RTC clock emulation.
>
> What's the guest type and what's the guest's source clock?

All guests are Debian lenny with the latest upstream kernel, hvm/kvm.

We are using kvm-clock as the guest source clock.

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
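
The same check can be scripted so that it also reports which alternatives the guest kernel offers. This is a minimal sketch: the `current_clocksource` and `available_clocksource` files are standard Linux sysfs entries, while the function name and its `base` parameter are hypothetical and exist only to keep the code testable.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling `read_clocksources()` inside a guest returns the active clocksource together with the list the kernel would accept as a replacement.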


Regards
Espen


* Re: Timedrift in KVM guests after livemigration.
  2010-04-18  9:22       ` Dor Laor
  2010-04-18  9:33         ` Espen Berg
@ 2010-04-18  9:56         ` Gleb Natapov
  2010-04-19  9:21           ` Espen Berg
  1 sibling, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2010-04-18  9:56 UTC (permalink / raw)
  To: Dor Laor; +Cc: Espen Berg, kvm

On Sun, Apr 18, 2010 at 12:22:54PM +0300, Dor Laor wrote:
> On 04/18/2010 02:21 AM, Espen Berg wrote:
> >On 17.04.2010 22:17, Michael Tokarev wrote:
> >>>>We have three KVM hosts that support live migration between them, but
> >>>>one of our problems is time drift. The three frontends have different
> >>>>CPU frequencies, and the KVM guests adopt the frequency of the host
> >>>>machine on which they were first started.
> >>What do you mean by "adopt"? Note that the CPU frequency
> >>means nothing to modern operating systems, at least not
> >>since the days when MS-DOS, which relied on CPU frequency
> >>for its time functions, was in common use. All the interesting
> >>things are now done using timers instead, and timers (which,
> >>again, don't depend on CPU frequency) usually work quite well.
> >
> >The assumption that the tick frequency was derived from the host's
> >MHz was based on the fact that greater clock frequency differences
> >caused greater time drift. A 60 MHz difference caused about 24 min of
> >drift, and a 332 MHz difference caused about 2 h 25 min.
> >
> >
> >>What complicates things is that the cheapest sufficiently accurate
> >>time source is the TSC (the time stamp counter register in
> >>the CPU), but it will definitely be different on each
> >>machine. To address that, kvm 0.12.3 and kernel 2.6.32 (I think)
> >>introduced a compensation. See, for example, the -tdf kvm option.
> >
> >Ah, nice to know. :)
> 
> There are two different things here:
> The issue that Espen is reporting is that the hosts have different
> frequencies, and guests that rely on the TSC as a source clock will
> notice that after migration. This is indeed a problem that -tdf does
> not solve. -tdf only adds compensation for the RTC clock emulation.
> 
It's -rtc-td-hack. -tdf does PIT compensation, but since the kernel
PIT is usually used, it does nothing.

> What's the guest type and what's the guest's source clock?
> Using the TSC directly as a source clock is not recommended because of
> this migration issue (which is not solvable until we trap every
> rdtsc by the guest). Using the pv kvmclock in Linux mitigates this issue
> since it exposes both the TSC and the host clock so guests can
> adjust themselves.
> 
> Several months ago a pvclock migration fix was added to pass the
> pvclock MSR readings to the destination:
> 1a03675db146dfc760b3b48b3448075189f142cc
> 
> 
> >
> >>>Since this is a cluster in production, I'm not able to try the latest
> >>>version either.
> >>Well, that's a difficult one, no? It either works or it doesn't.
> >>If you can't try anything else, why ask? :)
> >
> >What I tried to say was that there are many important virtual servers
> >running on this cluster at the moment, so "trial and error" was not an
> >option. The last time we tried 0.12.x (during the initial tests of the
> >cluster) there were a lot of stability issues, crashes during migration,
> >etc.
> >
> >Regards, Espen
> >

--
			Gleb.


* Re: Timedrift in KVM guests after livemigration.
  2010-04-18  9:56         ` Gleb Natapov
@ 2010-04-19  9:21           ` Espen Berg
  2010-04-19  9:29             ` Gleb Natapov
  0 siblings, 1 reply; 11+ messages in thread
From: Espen Berg @ 2010-04-19  9:21 UTC (permalink / raw)
  To: kvm

On 18.04.2010 11:56, Gleb Natapov wrote:

>> There are two different things here:
>> The issue that Espen is reporting is that the hosts have different
>> frequencies, and guests that rely on the TSC as a source clock will
>> notice that after migration. This is indeed a problem that -tdf does
>> not solve. -tdf only adds compensation for the RTC clock emulation.
>>
> It's -rtc-td-hack. -tdf does PIT compensation, but since the kernel
> PIT is usually used, it does nothing.

So this "hack" will not solve our problem?

Espen




* Re: Timedrift in KVM guests after livemigration.
  2010-04-19  9:21           ` Espen Berg
@ 2010-04-19  9:29             ` Gleb Natapov
  2010-04-19 11:57               ` Dor Laor
  0 siblings, 1 reply; 11+ messages in thread
From: Gleb Natapov @ 2010-04-19  9:29 UTC (permalink / raw)
  To: Espen Berg; +Cc: kvm

On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote:
> On 18.04.2010 11:56, Gleb Natapov wrote:
> 
> >>There are two different things here:
> >>The issue that Espen is reporting is that the hosts have different
> >>frequencies, and guests that rely on the TSC as a source clock will
> >>notice that after migration. This is indeed a problem that -tdf does
> >>not solve. -tdf only adds compensation for the RTC clock emulation.
> >>
> >It's -rtc-td-hack. -tdf does PIT compensation, but since the kernel
> >PIT is usually used, it does nothing.
> 
> So this "hack" will not solve our problem?
> 
If your guest uses the RTC for timekeeping, it may help. Otherwise it does
nothing.

--
			Gleb.


* Re: Timedrift in KVM guests after livemigration.
  2010-04-19  9:29             ` Gleb Natapov
@ 2010-04-19 11:57               ` Dor Laor
  0 siblings, 0 replies; 11+ messages in thread
From: Dor Laor @ 2010-04-19 11:57 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Espen Berg, kvm

On 04/19/2010 12:29 PM, Gleb Natapov wrote:
> On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote:
>> On 18.04.2010 11:56, Gleb Natapov wrote:
>>
>>>> There are two different things here:
>>>> The issue that Espen is reporting is that the hosts have different
>>>> frequencies, and guests that rely on the TSC as a source clock will
>>>> notice that after migration. This is indeed a problem that -tdf does
>>>> not solve. -tdf only adds compensation for the RTC clock emulation.
>>>>
>>> It's -rtc-td-hack. -tdf does PIT compensation, but since the kernel
>>> PIT is usually used, it does nothing.
>>
>> So this "hack" will not solve our problem?

As I also stated, in the past the kvmclock MSRs were not synced upon live
migration. That was fixed in 1a03675db146dfc760b3b48b3448075189f142cc,
so it is best to check whether your code includes that fix.

>>
> If your guest uses the RTC for timekeeping, it may help. Otherwise it does
> nothing.
>
> --
> 			Gleb.



* Re: Timedrift in KVM guests after livemigration.
  2010-04-18  9:33         ` Espen Berg
@ 2010-04-22  7:40           ` Thomas Treutner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Treutner @ 2010-04-22  7:40 UTC (permalink / raw)
  To: Espen Berg; +Cc: kvm

On Sunday 18 April 2010 11:33:44 Espen Berg wrote:
> All guests are Debian lenny with the latest upstream kernel, hvm/kvm.
>
> We are using kvm-clock as the guest source clock.
>
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> kvm-clock

I had to deactivate C1E (AMD CPUs) and use the acpi clocksource (for both
servers and VMs, IIRC). If you can, you should give it a try. After that,
live migration was reasonably stable.
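
For anyone wanting to try that at runtime, a minimal sketch follows. It assumes that "acpi clocksource" refers to the Linux clocksource named `acpi_pm`, which is an assumption about the setup described here, and the function name and `base` parameter are hypothetical. Writing to `current_clocksource` needs root, and the change does not survive a reboot.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def switch_clocksource(name, base=SYSFS_DIR):
    """Switch the running kernel to `name` if the kernel offers it.

    Returns the clocksource that is active after the write.
    """
    base = Path(base)
    available = (base / "available_clocksource").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} not offered by this kernel: {available}")
    (base / "current_clocksource").write_text(name + "\n")
    return (base / "current_clocksource").read_text().strip()


# Example (as root, and only if acpi_pm is listed as available):
#     switch_clocksource("acpi_pm")
```

To make the choice persistent, the `clocksource=` kernel boot parameter can be set instead of writing to sysfs after every boot.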


regards, 
thomas

