From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zachary Amsden
Subject: Re: Bug in KVM clock backwards compensation
Date: Thu, 28 Apr 2011 11:34:44 -0700
Message-ID: <4DB9B344.9010301@redhat.com>
References: <4DB9106D.6040203@redhat.com> <20110428071316.GG20365@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm , Jan Kiszka
To: "Roedel, Joerg"
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:20947 "EHLO mx1.redhat.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1754846Ab1D1Ser (ORCPT ); Thu, 28 Apr 2011 14:34:47 -0400
In-Reply-To: <20110428071316.GG20365@amd.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 04/28/2011 12:13 AM, Roedel, Joerg wrote:
> On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote:
>
>> So I've been going over the new code changes to the TSC related code and
>> I don't like one particular set of changes. In particular, here:
>>
>>         kvm_x86_ops->vcpu_load(vcpu, cpu);
>>         if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>>                 /* Make sure TSC doesn't go backwards */
>>                 s64 tsc_delta;
>>                 u64 tsc;
>>
>>                 kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
>>                 tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>>                              tsc - vcpu->arch.last_guest_tsc;
>>
>>                 if (tsc_delta < 0)
>>                         mark_tsc_unstable("KVM discovered backwards TSC");
>>                 if (check_tsc_unstable()) {
>>                         kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>>                         vcpu->arch.tsc_catchup = 1;
>>                 }
>>
>>
>> The point of this code fragment is to test the host clock to see if it
>> is stable, because we may have just come back from an idle phase which
>> stopped the TSC, switched CPUs, or come back from a deep sleep state
>> which reset the host TSC.
>>
> I see it different. This code wants to check if the _guest_ tsc moves
> forwared (or at least not backwards). So it is fully legitimate to just
> do this by reading the guest-tsc and compare it to the last one the
> guest had.
>

That wasn't the intention when I wrote that code. It's simply there to
detect backwards motion of the host TSC. The guest TSC can legally go
backwards whenever the guest decides to change it, so checking the guest
TSC doesn't make sense here.

>> I saw a patch floating around that touched this code recently, but I
>> think there's a definite issue here that needs addressing.
>>
> In fact, this change was done to address one of your concerns. You
> mentioned that the values passed to adjust_tsc_offset() were in
> unconsistent units in my first version of tsc-scaling. This was a right
> objection because one call-site used guest-tsc-units while the other
> used host-tsc-units. This change intended to fix that by using
> guest-tsc-units always for adjust_tsc_offset().
>
> Not that the guest and the host tsc have the same units on current
> machines. But with tsc-scaling these units are different.
>

Yes, but machines with tsc-scaling already have stable TSCs. The above
test is for older hardware which could have problems, so it can be
reverted to the original code without worrying about switching units.

Zach
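
P.S. To make the host-vs-guest distinction concrete, here is a rough
userspace sketch of a check that looks only at the host TSC. It is not
the kernel code: struct vcpu_model, host_tsc_unstable, vcpu_put_model(),
vcpu_load_model() and guest_tsc() are all made-up names for
illustration, and the model assumes guest TSC = host TSC + offset with
no scaling. A guest write to its own TSC would only change tsc_offset,
so it could never trip this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-vCPU state; not the kernel's struct. */
struct vcpu_model {
    uint64_t last_host_tsc; /* host TSC saved at last unload; 0 = never */
    int64_t  tsc_offset;    /* guest TSC = host TSC + tsc_offset */
    bool     tsc_catchup;   /* set when compensation was applied */
};

/* Hypothetical global flag standing in for the host's "TSC unstable" state. */
static bool host_tsc_unstable;

/* Record the host TSC when the vCPU is descheduled. */
static void vcpu_put_model(struct vcpu_model *v, uint64_t host_tsc_now)
{
    v->last_host_tsc = host_tsc_now;
}

/* On load, compare host TSC readings only; guest writes are irrelevant. */
static void vcpu_load_model(struct vcpu_model *v, uint64_t host_tsc_now)
{
    int64_t delta = v->last_host_tsc
        ? (int64_t)(host_tsc_now - v->last_host_tsc)
        : 0;

    if (delta < 0)
        host_tsc_unstable = true; /* host TSC moved backwards */

    if (host_tsc_unstable) {
        /* Cancel the host-side jump so the guest view stays continuous. */
        v->tsc_offset -= delta;
        v->tsc_catchup = true;
    }
}

/* Guest-visible TSC under the no-scaling assumption. */
static uint64_t guest_tsc(const struct vcpu_model *v, uint64_t host_tsc_now)
{
    return host_tsc_now + (uint64_t)v->tsc_offset;
}
```

Because both samples are host readings, delta and the offset adjustment
are in host units, which is the case on the older hardware this test is
meant for.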