All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joerg Roedel <joro@8bytes.org>
To: Zachary Amsden <zamsden@redhat.com>
Cc: "Roedel, Joerg" <Joerg.Roedel@amd.com>, kvm <kvm@vger.kernel.org>,
	Jan Kiszka <jan.kiszka@web.de>
Subject: Re: Bug in KVM clock backwards compensation
Date: Fri, 29 Apr 2011 10:40:54 +0200	[thread overview]
Message-ID: <20110429084054.GG13402@8bytes.org> (raw)
In-Reply-To: <4DBA29E9.9000905@redhat.com>

On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
> On 04/28/2011 01:20 PM, Joerg Roedel wrote:

>> This code checks how many guest tsc cycles have passed since this vCPU
>> was de-scheduled last time (and before it is running again). So since
>> the vCPU hasn't run in the meantime it had no chance to change its TSC.
>> Further, the other parameters like the TSC offset and the scaling
>> multiplier havn't changed too, so the only variable in the guest-tsc
>> calculation is the host-tsc.
>> So this calculation using the guest-tsc can detect backwards going
>> host-tsc as good as the old one. The benefit here is that we can feed
>> consistent values into adjust_tsc_offset().
>>    
>
> While true, this is more complex than the original code.  The original  
> code here doesn't try to actually compensate for the guest TSC  
> difference - instead what it does is NULL any discovered host TSC delta:

Why should KVM care about the host-tsc going backwards when the
guest-tsc moves forward? I think the bottom-line of this code is to make
sure the guest-tsc does not jump around (or even jumps backwards).

I also don't agree that this is additional complexity. With these
changes we were able to get rid of the last_host_tsc variable which is
actually a simplification.

As I see it, there are two situations here:

1) On hosts without tsc-scaling the value of tsc_delta calculated from
   the guest-tsc values is the same as when calculated with host-tsc
   values, because the guest-tsc only differs by an offset from the
   host-tsc.

2) On hosts with tsc-scaling these two tsc_delta values may be
   different. If the guest has a different tsc-freq as the host, we
   can't simply adjust the tsc by an offset calculated from the host-tsc
   values. So tsc_delta has to be calculated using the guest-tsc.
>
>         if (tsc_delta < 0)
>             mark_tsc_unstable("KVM discovered backwards TSC");
>         if (check_tsc_unstable()) {
>             kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>             vcpu->arch.tsc_catchup = 1;
>         }
>         kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
> Erasing that delta also erases elapsed time since the VCPU has last been  
> run, which isn't desirable, so it then sets tsc_catchup mode, which will  
> restore the proper TSC.  The request here triggers code which later  
> updates the TSC offset again.

I just looked again at the tsc_catchup thing. Since the clock-update is
forced in the code above, and the clock-update-code adjusts the tsc
itself if necessary, is it really necessary to do this here at all?

> I left compute_guest_tsc in place to recompute time in guest units here,  
> even if the underlying hardware rate changes.
>
>         /*
>          * We may have to catch up the TSC to match elapsed wall clock
>          * time for two reasons, even if kvmclock is used.
>          *   1) CPU could have been running below the maximum TSC rate
>          *   2) Broken TSC compensation resets the base at each VCPU
>          *      entry to avoid unknown leaps of TSC even when running
>          *      again on the same CPU.  This may cause apparent elapsed
>          *      time to disappear, and the guest to stand still or run
>          *      very slowly.
>          */
>         if (vcpu->tsc_catchup) {
>                 u64 tsc = compute_guest_tsc(v, kernel_ns);
>                 if (tsc > tsc_timestamp) {
>                         kvm_x86_ops->adjust_tsc_offset(v, tsc -  
> tsc_timestamp);
>                         tsc_timestamp = tsc;
>                 }
>         }
>
> So yeah, the code is getting pretty complex but we'd like to avoid that  
> as much as possible - so I would prefer to have the hardware backwards  
> compensation separate from the guest rate computation by doing this:
>
> step 1) remove any backwards hardware TSC delta (in hardware units)
> step 2) recompute guest TSC from a stable clock (gotten from kernel_ns)  
> and apply adjustment (in guest units)
>
> So it appears you can just share most of the logic of guest TSC catchup  
> mode.
>
> What do you think?

With the compensation done in kvm_guest_time_update I see no reason
anymore to adjust the tsc in vcpu_load at all. So if we can remove the
adjustment from there we can switch back to host-tsc there.

The question still is whether this needs to be done in KVM. For older
kernels Jan will take patches that handle this in kvm-kmod. So this code
can probably be removed from vcpu_load. The benefit is that we don't
need to re-introduce the last_host_tsc.

If we still need to call adjust_tsc_offset() in vcpu_load then we have
to do the calculation with the guest_tsc because this gives us a
tsc_delta in units that adjust_tsc_offset expects.


Regards,

	Joerg


  reply	other threads:[~2011-04-29  8:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-28  6:59 Bug in KVM clock backwards compensation Zachary Amsden
2011-04-28  7:06 ` Jan Kiszka
2011-04-28  7:22   ` Roedel, Joerg
2011-04-28 19:06     ` Zachary Amsden
2011-04-28 22:38       ` Jan Kiszka
2011-04-28 17:48   ` Zachary Amsden
2011-04-28  7:13 ` Roedel, Joerg
2011-04-28 18:34   ` Zachary Amsden
2011-04-28 20:20     ` Joerg Roedel
2011-04-29  3:00       ` Zachary Amsden
2011-04-29  8:40         ` Joerg Roedel [this message]
2011-04-29 18:17           ` Zachary Amsden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110429084054.GG13402@8bytes.org \
    --to=joro@8bytes.org \
    --cc=Joerg.Roedel@amd.com \
    --cc=jan.kiszka@web.de \
    --cc=kvm@vger.kernel.org \
    --cc=zamsden@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.