Bug in KVM clock backwards compensation

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Bug in KVM clock backwards compensation
@ 2011-04-28  6:59 Zachary Amsden
  2011-04-28  7:06 ` Jan Kiszka
  2011-04-28  7:13 ` Roedel, Joerg
  0 siblings, 2 replies; 12+ messages in thread
From: Zachary Amsden @ 2011-04-28  6:59 UTC (permalink / raw)
  To: kvm, Jan Kiszka, Joerg Roedel

So I've been going over the new code changes to the TSC related code and 
I don't like one particular set of changes.  In particular, here:

         kvm_x86_ops->vcpu_load(vcpu, cpu);
         if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
                 /* Make sure TSC doesn't go backwards */
                 s64 tsc_delta;
                 u64 tsc;

                 kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
                 tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
                              tsc - vcpu->arch.last_guest_tsc;

                 if (tsc_delta < 0)
                         mark_tsc_unstable("KVM discovered backwards TSC");
                 if (check_tsc_unstable()) {
                         kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
                         vcpu->arch.tsc_catchup = 1;
                 }

The point of this code fragment is to test the host clock to see if it 
is stable, because we may have just come back from an idle phase which 
stopped the TSC, switched CPUs, or come back from a deep sleep state 
which reset the host TSC.

However, the above code is fetching the guest TSC instead of the host 
TSC, which isn't the way it is supposed to work.

I saw a patch floating around that touched this code recently, but I 
think there's a definite issue here that needs addressing.

Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28  6:59 Bug in KVM clock backwards compensation Zachary Amsden
@ 2011-04-28  7:06 ` Jan Kiszka
  2011-04-28  7:22   ` Roedel, Joerg
  2011-04-28 17:48   ` Zachary Amsden
  2011-04-28  7:13 ` Roedel, Joerg
  1 sibling, 2 replies; 12+ messages in thread
From: Jan Kiszka @ 2011-04-28  7:06 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 1869 bytes --]

On 2011-04-28 08:59, Zachary Amsden wrote:
> So I've been going over the new code changes to the TSC related code and
> I don't like one particular set of changes.  In particular, here:
> 
>         kvm_x86_ops->vcpu_load(vcpu, cpu);
>         if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>                 /* Make sure TSC doesn't go backwards */
>                 s64 tsc_delta;
>                 u64 tsc;
> 
>                 kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
>                 tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>                              tsc - vcpu->arch.last_guest_tsc;
> 
>                 if (tsc_delta < 0)
>                         mark_tsc_unstable("KVM discovered backwards TSC");
>                 if (check_tsc_unstable()) {
>                         kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>                         vcpu->arch.tsc_catchup = 1;
>                 }
> 
> 
> The point of this code fragment is to test the host clock to see if it
> is stable, because we may have just come back from an idle phase which
> stopped the TSC, switched CPUs, or come back from a deep sleep state
> which reset the host TSC.
> 
> However, the above code is fetching the guest TSC instead of the host
> TSC, which isn't the way it is supposed to work.
> 
> I saw a patch floating around that touched this code recently, but I
> think there's a definite issue here that needs addressing.

And /me still wonders (like I did when this first popped up) if the
proper place of determining TSC stability really have to be KVM.

If the Linux core fails to detect some instability and KVM has to jump
in, shouldn't we better improve the core's detection abilities and make
use of them in KVM? Conceptually this looks like we are currently just
working around a core deficit in KVM.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28  7:06 ` Jan Kiszka
@ 2011-04-28  7:22   ` Roedel, Joerg
  2011-04-28 19:06     ` Zachary Amsden
  2011-04-28 17:48   ` Zachary Amsden
  1 sibling, 1 reply; 12+ messages in thread
From: Roedel, Joerg @ 2011-04-28  7:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Zachary Amsden, kvm

On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote:
> And /me still wonders (like I did when this first popped up) if the
> proper place of determining TSC stability really have to be KVM.
> 
> If the Linux core fails to detect some instability and KVM has to jump
> in, shouldn't we better improve the core's detection abilities and make
> use of them in KVM? Conceptually this looks like we are currently just
> working around a core deficit in KVM.

Yes, good question. Has this ever triggered on a real machine (not
counting the suspend/resume issue in)?

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28  7:22   ` Roedel, Joerg
@ 2011-04-28 19:06     ` Zachary Amsden
  2011-04-28 22:38       ` Jan Kiszka
  0 siblings, 1 reply; 12+ messages in thread
From: Zachary Amsden @ 2011-04-28 19:06 UTC (permalink / raw)
  To: Roedel, Joerg; +Cc: Jan Kiszka, kvm

On 04/28/2011 12:22 AM, Roedel, Joerg wrote:
> On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote:
>    
>> And /me still wonders (like I did when this first popped up) if the
>> proper place of determining TSC stability really have to be KVM.
>>
>> If the Linux core fails to detect some instability and KVM has to jump
>> in, shouldn't we better improve the core's detection abilities and make
>> use of them in KVM? Conceptually this looks like we are currently just
>> working around a core deficit in KVM.
>>      
> Yes, good question. Has this ever triggered on a real machine (not
> counting the suspend/resume issue in)?
>    

Yes... some platforms don't go haywire until you start using power 
mangement, TSC is stable before that, but not afterwards, and depending 
on the version of the kernel, KVM might detect this before the kernel does.

Honestly, the code is obsolete, but still useful for those who build KVM 
as an external module on older kernels using the kvm-kmod system.

Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28 19:06     ` Zachary Amsden
@ 2011-04-28 22:38       ` Jan Kiszka
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kiszka @ 2011-04-28 22:38 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Roedel, Joerg, kvm

[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]

On 2011-04-28 21:06, Zachary Amsden wrote:
> On 04/28/2011 12:22 AM, Roedel, Joerg wrote:
>> On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote:
>>   
>>> And /me still wonders (like I did when this first popped up) if the
>>> proper place of determining TSC stability really have to be KVM.
>>>
>>> If the Linux core fails to detect some instability and KVM has to jump
>>> in, shouldn't we better improve the core's detection abilities and make
>>> use of them in KVM? Conceptually this looks like we are currently just
>>> working around a core deficit in KVM.
>>>      
>> Yes, good question. Has this ever triggered on a real machine (not
>> counting the suspend/resume issue in)?
>>    
> 
> Yes... some platforms don't go haywire until you start using power
> mangement, TSC is stable before that, but not afterwards, and depending
> on the version of the kernel, KVM might detect this before the kernel does.
> 
> Honestly, the code is obsolete, but still useful for those who build KVM
> as an external module on older kernels using the kvm-kmod system.

I'll happily accept patches that migrate any logic to kvm-kmod that the
current kernel does not need it anymore.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28  7:06 ` Jan Kiszka
  2011-04-28  7:22   ` Roedel, Joerg
@ 2011-04-28 17:48   ` Zachary Amsden
  1 sibling, 0 replies; 12+ messages in thread
From: Zachary Amsden @ 2011-04-28 17:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm, Joerg Roedel

On 04/28/2011 12:06 AM, Jan Kiszka wrote:
> On 2011-04-28 08:59, Zachary Amsden wrote:
>    
>> So I've been going over the new code changes to the TSC related code and
>> I don't like one particular set of changes.  In particular, here:
>>
>>          kvm_x86_ops->vcpu_load(vcpu, cpu);
>>          if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>>                  /* Make sure TSC doesn't go backwards */
>>                  s64 tsc_delta;
>>                  u64 tsc;
>>
>>                  kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc);
>>                  tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>>                               tsc - vcpu->arch.last_guest_tsc;
>>
>>                  if (tsc_delta<  0)
>>                          mark_tsc_unstable("KVM discovered backwards TSC");
>>                  if (check_tsc_unstable()) {
>>                          kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>>                          vcpu->arch.tsc_catchup = 1;
>>                  }
>>
>>
>> The point of this code fragment is to test the host clock to see if it
>> is stable, because we may have just come back from an idle phase which
>> stopped the TSC, switched CPUs, or come back from a deep sleep state
>> which reset the host TSC.
>>
>> However, the above code is fetching the guest TSC instead of the host
>> TSC, which isn't the way it is supposed to work.
>>
>> I saw a patch floating around that touched this code recently, but I
>> think there's a definite issue here that needs addressing.
>>      
> And /me still wonders (like I did when this first popped up) if the
> proper place of determining TSC stability really have to be KVM.
>    

No, it's not.  I have a patch which fixes that.  Was in the process of 
merging it into my local tree when I found this.

> If the Linux core fails to detect some instability and KVM has to jump
> in, shouldn't we better improve the core's detection abilities and make
> use of them in KVM? Conceptually this looks like we are currently just
> working around a core deficit in KVM.
>    

100% correct you are.

Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28  6:59 Bug in KVM clock backwards compensation Zachary Amsden
  2011-04-28  7:06 ` Jan Kiszka
@ 2011-04-28  7:13 ` Roedel, Joerg
  2011-04-28 18:34   ` Zachary Amsden
  1 sibling, 1 reply; 12+ messages in thread
From: Roedel, Joerg @ 2011-04-28  7:13 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Jan Kiszka

On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote:
> So I've been going over the new code changes to the TSC related code and 
> I don't like one particular set of changes.  In particular, here:
> 
>          kvm_x86_ops->vcpu_load(vcpu, cpu);
>          if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>                  /* Make sure TSC doesn't go backwards */
>                  s64 tsc_delta;
>                  u64 tsc;
> 
>                  kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
>                  tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>                               tsc - vcpu->arch.last_guest_tsc;
> 
>                  if (tsc_delta < 0)
>                          mark_tsc_unstable("KVM discovered backwards TSC");
>                  if (check_tsc_unstable()) {
>                          kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>                          vcpu->arch.tsc_catchup = 1;
>                  }
> 
> 
> The point of this code fragment is to test the host clock to see if it 
> is stable, because we may have just come back from an idle phase which 
> stopped the TSC, switched CPUs, or come back from a deep sleep state 
> which reset the host TSC.

I see it different. This code wants to check if the _guest_ tsc moves
forwared (or at least not backwards). So it is fully legitimate to just
do this by reading the guest-tsc and compare it to the last one the
guest had.

> I saw a patch floating around that touched this code recently, but I 
> think there's a definite issue here that needs addressing.

In fact, this change was done to address one of your concerns. You
mentioned that the values passed to adjust_tsc_offset() were in
unconsistent units in my first version of tsc-scaling. This was a right
objection because one call-site used guest-tsc-units while the other
used host-tsc-units. This change intended to fix that by using
guest-tsc-units always for adjust_tsc_offset().

Not that the guest and the host tsc have the same units on current
machines. But with tsc-scaling these units are different.

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28  7:13 ` Roedel, Joerg
@ 2011-04-28 18:34   ` Zachary Amsden
  2011-04-28 20:20     ` Joerg Roedel
  0 siblings, 1 reply; 12+ messages in thread
From: Zachary Amsden @ 2011-04-28 18:34 UTC (permalink / raw)
  To: Roedel, Joerg; +Cc: kvm, Jan Kiszka

On 04/28/2011 12:13 AM, Roedel, Joerg wrote:
> On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote:
>    
>> So I've been going over the new code changes to the TSC related code and
>> I don't like one particular set of changes.  In particular, here:
>>
>>           kvm_x86_ops->vcpu_load(vcpu, cpu);
>>           if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>>                   /* Make sure TSC doesn't go backwards */
>>                   s64 tsc_delta;
>>                   u64 tsc;
>>
>>                   kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc);
>>                   tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>>                                tsc - vcpu->arch.last_guest_tsc;
>>
>>                   if (tsc_delta<  0)
>>                           mark_tsc_unstable("KVM discovered backwards TSC");
>>                   if (check_tsc_unstable()) {
>>                           kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>>                           vcpu->arch.tsc_catchup = 1;
>>                   }
>>
>>
>> The point of this code fragment is to test the host clock to see if it
>> is stable, because we may have just come back from an idle phase which
>> stopped the TSC, switched CPUs, or come back from a deep sleep state
>> which reset the host TSC.
>>      
> I see it different. This code wants to check if the _guest_ tsc moves
> forwared (or at least not backwards). So it is fully legitimate to just
> do this by reading the guest-tsc and compare it to the last one the
> guest had.
>    

That wasn't the intention when I wrote that code.  It's simply there to 
detect backwards motion of the host TSC.  The guest TSC can legally go 
backwards whenever the guest decides to change it, so checking the guest 
TSC doesn't make sense here.

>> I saw a patch floating around that touched this code recently, but I
>> think there's a definite issue here that needs addressing.
>>      
> In fact, this change was done to address one of your concerns. You
> mentioned that the values passed to adjust_tsc_offset() were in
> unconsistent units in my first version of tsc-scaling. This was a right
> objection because one call-site used guest-tsc-units while the other
> used host-tsc-units. This change intended to fix that by using
> guest-tsc-units always for adjust_tsc_offset().
>
> Not that the guest and the host tsc have the same units on current
> machines. But with tsc-scaling these units are different.
>    

Yes, with tsc-scaling, the machines already have stable TSCs - the above 
test is for older hardware which could have problems, and can be 
reverted back to the original code without worrying about switching units.

Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28 18:34   ` Zachary Amsden
@ 2011-04-28 20:20     ` Joerg Roedel
  2011-04-29  3:00       ` Zachary Amsden
  0 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2011-04-28 20:20 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Roedel, Joerg, kvm, Jan Kiszka

On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote:
> On 04/28/2011 12:13 AM, Roedel, Joerg wrote:

>> I see it different. This code wants to check if the _guest_ tsc moves
>> forwared (or at least not backwards). So it is fully legitimate to just
>> do this by reading the guest-tsc and compare it to the last one the
>> guest had.
>
> That wasn't the intention when I wrote that code.  It's simply there to  
> detect backwards motion of the host TSC.  The guest TSC can legally go  
> backwards whenever the guest decides to change it, so checking the guest  
> TSC doesn't make sense here.

This code checks how many guest tsc cycles have passed since this vCPU
was de-scheduled last time (and before it is running again). So since
the vCPU hasn't run in the meantime it had no chance to change its TSC.
Further, the other parameters like the TSC offset and the scaling
multiplier havn't changed too, so the only variable in the guest-tsc
calculation is the host-tsc.
So this calculation using the guest-tsc can detect backwards going
host-tsc as good as the old one. The benefit here is that we can feed
consistent values into adjust_tsc_offset().

> Yes, with tsc-scaling, the machines already have stable TSCs - the above  
> test is for older hardware which could have problems, and can be  
> reverted back to the original code without worrying about switching 
> units.

This is the case pratically. But architecturally the tsc-scaling feature
does not depend on a constant tsc, so we can make no such assumtions.
Additionally, it may happen that Linux mis-detects an unstable tsc for
some reason (broken BIOS, bug in the code, ...).  Therefore I think it
is dangerous to assume that this code will never run on tsc-scaling
capable hosts. And if it does and we don't manage the tsc-offset units
right, we may see very weird behavior.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-28 20:20     ` Joerg Roedel
@ 2011-04-29  3:00       ` Zachary Amsden
  2011-04-29  8:40         ` Joerg Roedel
  0 siblings, 1 reply; 12+ messages in thread
From: Zachary Amsden @ 2011-04-29  3:00 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Roedel, Joerg, kvm, Jan Kiszka

On 04/28/2011 01:20 PM, Joerg Roedel wrote:
> On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote:
>    
>> On 04/28/2011 12:13 AM, Roedel, Joerg wrote:
>>      
>    
>>> I see it different. This code wants to check if the _guest_ tsc moves
>>> forwared (or at least not backwards). So it is fully legitimate to just
>>> do this by reading the guest-tsc and compare it to the last one the
>>> guest had.
>>>        
>> That wasn't the intention when I wrote that code.  It's simply there to
>> detect backwards motion of the host TSC.  The guest TSC can legally go
>> backwards whenever the guest decides to change it, so checking the guest
>> TSC doesn't make sense here.
>>      
> This code checks how many guest tsc cycles have passed since this vCPU
> was de-scheduled last time (and before it is running again). So since
> the vCPU hasn't run in the meantime it had no chance to change its TSC.
> Further, the other parameters like the TSC offset and the scaling
> multiplier havn't changed too, so the only variable in the guest-tsc
> calculation is the host-tsc.
> So this calculation using the guest-tsc can detect backwards going
> host-tsc as good as the old one. The benefit here is that we can feed
> consistent values into adjust_tsc_offset().
>    

While true, this is more complex than the original code.  The original 
code here doesn't try to actually compensate for the guest TSC 
difference - instead what it does is NULL any discovered host TSC delta:

         if (tsc_delta < 0)
             mark_tsc_unstable("KVM discovered backwards TSC");
         if (check_tsc_unstable()) {
             kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
             vcpu->arch.tsc_catchup = 1;
         }
         kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

Erasing that delta also erases elapsed time since the VCPU has last been 
run, which isn't desirable, so it then sets tsc_catchup mode, which will 
restore the proper TSC.  The request here triggers code which later 
updates the TSC offset again.

To avoid complexity, I think it's simplest to do the first computation 
in terms of the host TSC.

>> Yes, with tsc-scaling, the machines already have stable TSCs - the above
>> test is for older hardware which could have problems, and can be
>> reverted back to the original code without worrying about switching
>> units.
>>      
> This is the case pratically. But architecturally the tsc-scaling feature
> does not depend on a constant tsc, so we can make no such assumtions.
> Additionally, it may happen that Linux mis-detects an unstable tsc for
> some reason (broken BIOS, bug in the code, ...).  Therefore I think it
> is dangerous to assume that this code will never run on tsc-scaling
> capable hosts. And if it does and we don't manage the tsc-offset units
> right, we may see very weird behavior.
>    

I agree, it is best to handle this case - hardware can and will change - 
but the TSC adjustment in terms of guest rate should be done under the 
atomic protection right before entering hardware virtualized mode - here:

I left compute_guest_tsc in place to recompute time in guest units here, 
even if the underlying hardware rate changes.

         /*
          * We may have to catch up the TSC to match elapsed wall clock
          * time for two reasons, even if kvmclock is used.
          *   1) CPU could have been running below the maximum TSC rate
          *   2) Broken TSC compensation resets the base at each VCPU
          *      entry to avoid unknown leaps of TSC even when running
          *      again on the same CPU.  This may cause apparent elapsed
          *      time to disappear, and the guest to stand still or run
          *      very slowly.
          */
         if (vcpu->tsc_catchup) {
                 u64 tsc = compute_guest_tsc(v, kernel_ns);
                 if (tsc > tsc_timestamp) {
                         kvm_x86_ops->adjust_tsc_offset(v, tsc - 
tsc_timestamp);
                         tsc_timestamp = tsc;
                 }
         }

So yeah, the code is getting pretty complex but we'd like to avoid that 
as much as possible - so I would prefer to have the hardware backwards 
compensation separate from the guest rate computation by doing this:

step 1) remove any backwards hardware TSC delta (in hardware units)
step 2) recompute guest TSC from a stable clock (gotten from kernel_ns) 
and apply adjustment (in guest units)

So it appears you can just share most of the logic of guest TSC catchup 
mode.

What do you think?

Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-29  3:00       ` Zachary Amsden
@ 2011-04-29  8:40         ` Joerg Roedel
  2011-04-29 18:17           ` Zachary Amsden
  0 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2011-04-29  8:40 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Roedel, Joerg, kvm, Jan Kiszka

On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
> On 04/28/2011 01:20 PM, Joerg Roedel wrote:

>> This code checks how many guest tsc cycles have passed since this vCPU
>> was de-scheduled last time (and before it is running again). So since
>> the vCPU hasn't run in the meantime it had no chance to change its TSC.
>> Further, the other parameters like the TSC offset and the scaling
>> multiplier havn't changed too, so the only variable in the guest-tsc
>> calculation is the host-tsc.
>> So this calculation using the guest-tsc can detect backwards going
>> host-tsc as good as the old one. The benefit here is that we can feed
>> consistent values into adjust_tsc_offset().
>>    
>
> While true, this is more complex than the original code.  The original  
> code here doesn't try to actually compensate for the guest TSC  
> difference - instead what it does is NULL any discovered host TSC delta:

Why should KVM care about the host-tsc going backwards when the
guest-tsc moves forward? I think the bottom-line of this code is to make
sure the guest-tsc does not jump around (or even jumps backwards).

I also don't agree that this is additional complexity. With these
changes we were able to get rid of the last_host_tsc variable which is
actually a simplification.

As I see it, there are two situations here:

1) On hosts without tsc-scaling the value of tsc_delta calculated from
   the guest-tsc values is the same as when calculated with host-tsc
   values, because the guest-tsc only differs by an offset from the
   host-tsc.

2) On hosts with tsc-scaling these two tsc_delta values may be
   different. If the guest has a different tsc-freq as the host, we
   can't simply adjust the tsc by an offset calculated from the host-tsc
   values. So tsc_delta has to be calculated using the guest-tsc.
>
>         if (tsc_delta < 0)
>             mark_tsc_unstable("KVM discovered backwards TSC");
>         if (check_tsc_unstable()) {
>             kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>             vcpu->arch.tsc_catchup = 1;
>         }
>         kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
> Erasing that delta also erases elapsed time since the VCPU has last been  
> run, which isn't desirable, so it then sets tsc_catchup mode, which will  
> restore the proper TSC.  The request here triggers code which later  
> updates the TSC offset again.

I just looked again at the tsc_catchup thing. Since the clock-update is
forced in the code above, and the clock-update-code adjusts the tsc
itself if necessary, is it really necessary to do this here at all?

> I left compute_guest_tsc in place to recompute time in guest units here,  
> even if the underlying hardware rate changes.
>
>         /*
>          * We may have to catch up the TSC to match elapsed wall clock
>          * time for two reasons, even if kvmclock is used.
>          *   1) CPU could have been running below the maximum TSC rate
>          *   2) Broken TSC compensation resets the base at each VCPU
>          *      entry to avoid unknown leaps of TSC even when running
>          *      again on the same CPU.  This may cause apparent elapsed
>          *      time to disappear, and the guest to stand still or run
>          *      very slowly.
>          */
>         if (vcpu->tsc_catchup) {
>                 u64 tsc = compute_guest_tsc(v, kernel_ns);
>                 if (tsc > tsc_timestamp) {
>                         kvm_x86_ops->adjust_tsc_offset(v, tsc -  
> tsc_timestamp);
>                         tsc_timestamp = tsc;
>                 }
>         }
>
> So yeah, the code is getting pretty complex but we'd like to avoid that  
> as much as possible - so I would prefer to have the hardware backwards  
> compensation separate from the guest rate computation by doing this:
>
> step 1) remove any backwards hardware TSC delta (in hardware units)
> step 2) recompute guest TSC from a stable clock (gotten from kernel_ns)  
> and apply adjustment (in guest units)
>
> So it appears you can just share most of the logic of guest TSC catchup  
> mode.
>
> What do you think?

With the compensation done in kvm_guest_time_update I see no reason
anymore to adjust the tsc in vcpu_load at all. So if we can remove the
adjustment from there we can switch back to host-tsc there.

The question still is whether this needs to be done in KVM. For older
kernels Jan will take patches that handle this in kvm-kmod. So this code
can probably be removed from vcpu_load. The benefit is that we don't
need to re-introduce the last_host_tsc.

If we still need to call adjust_tsc_offset() in vcpu_load then we have
to do the calculation with the guest_tsc because this gives us a
tsc_delta in units that adjust_tsc_offset expects.


Regards,

	Joerg


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug in KVM clock backwards compensation
  2011-04-29  8:40         ` Joerg Roedel
@ 2011-04-29 18:17           ` Zachary Amsden
  0 siblings, 0 replies; 12+ messages in thread
From: Zachary Amsden @ 2011-04-29 18:17 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Roedel, Joerg, kvm, Jan Kiszka

On 04/29/2011 01:40 AM, Joerg Roedel wrote:
> On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
>    
>> On 04/28/2011 01:20 PM, Joerg Roedel wrote:
>>      
>    
>>> This code checks how many guest tsc cycles have passed since this vCPU
>>> was de-scheduled last time (and before it is running again). So since
>>> the vCPU hasn't run in the meantime it had no chance to change its TSC.
>>> Further, the other parameters like the TSC offset and the scaling
>>> multiplier havn't changed too, so the only variable in the guest-tsc
>>> calculation is the host-tsc.
>>> So this calculation using the guest-tsc can detect backwards going
>>> host-tsc as good as the old one. The benefit here is that we can feed
>>> consistent values into adjust_tsc_offset().
>>>
>>>        
>> While true, this is more complex than the original code.  The original
>> code here doesn't try to actually compensate for the guest TSC
>> difference - instead what it does is NULL any discovered host TSC delta:
>>      
> Why should KVM care about the host-tsc going backwards when the
> guest-tsc moves forward? I think the bottom-line of this code is to make
> sure the guest-tsc does not jump around (or even jumps backwards).
>
> I also don't agree that this is additional complexity. With these
> changes we were able to get rid of the last_host_tsc variable which is
> actually a simplification.
>
> As I see it, there are two situations here:
>
> 1) On hosts without tsc-scaling the value of tsc_delta calculated from
>     the guest-tsc values is the same as when calculated with host-tsc
>     values, because the guest-tsc only differs by an offset from the
>     host-tsc.
>
> 2) On hosts with tsc-scaling these two tsc_delta values may be
>     different. If the guest has a different tsc-freq as the host, we
>     can't simply adjust the tsc by an offset calculated from the host-tsc
>     values. So tsc_delta has to be calculated using the guest-tsc.
>    
>>          if (tsc_delta<  0)
>>              mark_tsc_unstable("KVM discovered backwards TSC");
>>          if (check_tsc_unstable()) {
>>              kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>>              vcpu->arch.tsc_catchup = 1;
>>          }
>>          kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>>
>> Erasing that delta also erases elapsed time since the VCPU has last been
>> run, which isn't desirable, so it then sets tsc_catchup mode, which will
>> restore the proper TSC.  The request here triggers code which later
>> updates the TSC offset again.
>>      
> I just looked again at the tsc_catchup thing. Since the clock-update is
> forced in the code above, and the clock-update-code adjusts the tsc
> itself if necessary, is it really necessary to do this here at all?
>    

Unfortunately, yes, it is.  The code in the second or catchup phase can 
only correct an under adjusted clock, or we risk setting the guest clock 
backwards from what has been observed.  So to guarantee there is not a 
huge jump due to over-adjustment, we must eliminate the TSC delta when 
switching CPUs, and that needs to happen in the preempt notifier, not 
the clock update handler.

I agree it is simpler to do everything in terms of guest clock rate, but 
there is one variable which is thrown in to complicate things here - the 
host clock rate may have changed during this whole process.

Let me look at the code again and see if it is possible to get rid of 
the first, host based adjustment entirely.  Ideally we would just track 
things in terms of the guest clock and do all the computation inside the 
clock update request handler instead of having one adjustment in the 
preempt notifier and a separate one later when the clock update happens.

I had originally tried something like that but ran into issues where the 
rate computation got exceedingly difficult.  Maybe the code has been 
restructured enough now that it will work (the clock update didn't used 
to happen unless you had KVM clock...)

Zach

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-04-29 18:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-28  6:59 Bug in KVM clock backwards compensation Zachary Amsden
2011-04-28  7:06 ` Jan Kiszka
2011-04-28  7:22   ` Roedel, Joerg
2011-04-28 19:06     ` Zachary Amsden
2011-04-28 22:38       ` Jan Kiszka
2011-04-28 17:48   ` Zachary Amsden
2011-04-28  7:13 ` Roedel, Joerg
2011-04-28 18:34   ` Zachary Amsden
2011-04-28 20:20     ` Joerg Roedel
2011-04-29  3:00       ` Zachary Amsden
2011-04-29  8:40         ` Joerg Roedel
2011-04-29 18:17           ` Zachary Amsden

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).