From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zachary Amsden <zamsden@redhat.com>
Subject: Re: Bug in KVM clock backwards compensation
Date: Fri, 29 Apr 2011 11:17:18 -0700
Message-ID: <4DBB00AE.6000602@redhat.com>
References: <4DB9106D.6040203@redhat.com> <20110428071316.GG20365@amd.com> <4DB9B344.9010301@redhat.com> <20110428202002.GF13402@8bytes.org> <4DBA29E9.9000905@redhat.com> <20110429084054.GG13402@8bytes.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "Roedel, Joerg" <Joerg.Roedel@amd.com>, kvm <kvm@vger.kernel.org>,
	Jan Kiszka <jan.kiszka@web.de>
To: Joerg Roedel <joro@8bytes.org>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:42553 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760760Ab1D2SRX (ORCPT <rfc822;kvm@vger.kernel.org>);
	Fri, 29 Apr 2011 14:17:23 -0400
In-Reply-To: <20110429084054.GG13402@8bytes.org>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 04/29/2011 01:40 AM, Joerg Roedel wrote:
> On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
>    
>> On 04/28/2011 01:20 PM, Joerg Roedel wrote:
>>      
>    
>>> This code checks how many guest tsc cycles have passed since this vCPU
>>> was de-scheduled last time (and before it is running again). So since
>>> the vCPU hasn't run in the meantime it had no chance to change its TSC.
>>> Further, the other parameters like the TSC offset and the scaling
>>> multiplier havn't changed too, so the only variable in the guest-tsc
>>> calculation is the host-tsc.
>>> So this calculation using the guest-tsc can detect backwards going
>>> host-tsc as good as the old one. The benefit here is that we can feed
>>> consistent values into adjust_tsc_offset().
>>>
>>>        
>> While true, this is more complex than the original code.  The original
>> code here doesn't try to actually compensate for the guest TSC
>> difference - instead what it does is NULL any discovered host TSC delta:
>>      
> Why should KVM care about the host-tsc going backwards when the
> guest-tsc moves forward? I think the bottom-line of this code is to make
> sure the guest-tsc does not jump around (or even jumps backwards).
>
> I also don't agree that this is additional complexity. With these
> changes we were able to get rid of the last_host_tsc variable which is
> actually a simplification.
>
> As I see it, there are two situations here:
>
> 1) On hosts without tsc-scaling the value of tsc_delta calculated from
>     the guest-tsc values is the same as when calculated with host-tsc
>     values, because the guest-tsc only differs by an offset from the
>     host-tsc.
>
> 2) On hosts with tsc-scaling these two tsc_delta values may be
>     different. If the guest has a different tsc-freq as the host, we
>     can't simply adjust the tsc by an offset calculated from the host-tsc
>     values. So tsc_delta has to be calculated using the guest-tsc.
>    
>>          if (tsc_delta<  0)
>>              mark_tsc_unstable("KVM discovered backwards TSC");
>>          if (check_tsc_unstable()) {
>>              kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>>              vcpu->arch.tsc_catchup = 1;
>>          }
>>          kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>>
>> Erasing that delta also erases elapsed time since the VCPU has last been
>> run, which isn't desirable, so it then sets tsc_catchup mode, which will
>> restore the proper TSC.  The request here triggers code which later
>> updates the TSC offset again.
>>      
> I just looked again at the tsc_catchup thing. Since the clock-update is
> forced in the code above, and the clock-update-code adjusts the tsc
> itself if necessary, is it really necessary to do this here at all?
>    

Unfortunately, yes, it is.  The code in the second or catchup phase can 
only correct an under adjusted clock, or we risk setting the guest clock 
backwards from what has been observed.  So to guarantee there is not a 
huge jump due to over-adjustment, we must eliminate the TSC delta when 
switching CPUs, and that needs to happen in the preempt notifier, not 
the clock update handler.

I agree it is simpler to do everything in terms of guest clock rate, but 
there is one variable which is thrown in to complicate things here - the 
host clock rate may have changed during this whole process.

Let me look at the code again and see if it is possible to get rid of 
the first, host based adjustment entirely.  Ideally we would just track 
things in terms of the guest clock and do all the computation inside the 
clock update request handler instead of having one adjustment in the 
preempt notifier and a separate one later when the clock update happens.

I had originally tried something like that but ran into issues where the 
rate computation got exceedingly difficult.  Maybe the code has been 
restructured enough now that it will work (the clock update didn't used 
to happen unless you had KVM clock...)

Zach