From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Bonzini
Subject: Re: KVM: x86: fix tsc catchup issue with tsc scaling
Date: Wed, 15 Jan 2014 12:43:02 +0100
Message-ID: <52D67446.3020403@redhat.com>
References: <20140106141859.GA27260@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Cc: kvm-devel
To: Marcelo Tosatti
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:31330 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751010AbaAOLnG (ORCPT ); Wed, 15 Jan 2014 06:43:06 -0500
Received: from int-mx11.intmail.prod.int.phx2.redhat.com
	(int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s0FBh57O025455
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for ; Wed, 15 Jan 2014 06:43:06 -0500
In-Reply-To: <20140106141859.GA27260@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 06/01/2014 15:18, Marcelo Tosatti wrote:
>
> To fix a problem related to the different resolution of the TSC and the
> system clock, the offset in TSC units is approximated by
>
> 	delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc
>
> 	        (Guest TSC value at                (Guest TSC value at last VM-exit)
> 	         the last kvm_guest_time_update
> 	         call)
>
> The delta is then later scaled using the mult,shift pair found in the
> hv_clock structure (which is correct against the tsc_timestamp in that
> structure).
>
> However, if a frequency change is performed between these two points,
> the delta is measured using different TSC frequencies, but scaled using
> the mult,shift pair for one frequency only.
>
> The end result is an incorrect delta.
>
> The bug which this code works around is not the only cause of
> clock-backwards events. The global accumulator is still necessary, so
> remove the max_kernel_ns fix and rely on the global accumulator to
> prevent clock-backwards events.
This is basically reverting commit 1d5f066 (KVM: x86: Fix a possible
backwards warp of kvmclock, 2010-08-19).  Your commit message basically
says that the guest-side commit 489fb49 (x86, paravirt: Add a global
synchronization point for pvclock, 2010-05-11) is the real solution to
the problem that the host-side commit 1d5f066 was trying to fix.  Right?

This patch makes vcpu->hv_clock.tsc_timestamp write-only.  Please provide
a follow-up that drops the field entirely, then I'll apply both.  In the
meanwhile:

Reviewed-by: Paolo Bonzini

Thanks,

Paolo

> Signed-off-by: Marcelo Tosatti
>
> Index: linux-2.6.git/arch/x86/kvm/x86.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kvm/x86.c
> +++ linux-2.6.git/arch/x86/kvm/x86.c
> @@ -1484,7 +1484,7 @@ static int kvm_guest_time_update(struct
>  	unsigned long flags, this_tsc_khz;
>  	struct kvm_vcpu_arch *vcpu = &v->arch;
>  	struct kvm_arch *ka = &v->kvm->arch;
> -	s64 kernel_ns, max_kernel_ns;
> +	s64 kernel_ns;
>  	u64 tsc_timestamp, host_tsc;
>  	struct pvclock_vcpu_time_info guest_hv_clock;
>  	u8 pvclock_flags;
> @@ -1543,37 +1543,6 @@ static int kvm_guest_time_update(struct
>  	if (!vcpu->pv_time_enabled)
>  		return 0;
>
> -	/*
> -	 * Time as measured by the TSC may go backwards when resetting the base
> -	 * tsc_timestamp.  The reason for this is that the TSC resolution is
> -	 * higher than the resolution of the other clock scales.  Thus, many
> -	 * possible measurments of the TSC correspond to one measurement of any
> -	 * other clock, and so a spread of values is possible.  This is not a
> -	 * problem for the computation of the nanosecond clock; with TSC rates
> -	 * around 1GHZ, there can only be a few cycles which correspond to one
> -	 * nanosecond value, and any path through this code will inevitably
> -	 * take longer than that.  However, with the kernel_ns value itself,
> -	 * the precision may be much lower, down to HZ granularity.  If the
> -	 * first sampling of TSC against kernel_ns ends in the low part of the
> -	 * range, and the second in the high end of the range, we can get:
> -	 *
> -	 * (TSC - offset_low) * S + kns_old > (TSC - offset_high) * S + kns_new
> -	 *
> -	 * As the sampling errors potentially range in the thousands of cycles,
> -	 * it is possible such a time value has already been observed by the
> -	 * guest.  To protect against this, we must compute the system time as
> -	 * observed by the guest and ensure the new system time is greater.
> -	 */
> -	max_kernel_ns = 0;
> -	if (vcpu->hv_clock.tsc_timestamp) {
> -		max_kernel_ns = vcpu->last_guest_tsc -
> -				vcpu->hv_clock.tsc_timestamp;
> -		max_kernel_ns = pvclock_scale_delta(max_kernel_ns,
> -				    vcpu->hv_clock.tsc_to_system_mul,
> -				    vcpu->hv_clock.tsc_shift);
> -		max_kernel_ns += vcpu->last_kernel_ns;
> -	}
> -
>  	if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
>  		kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
>  				   &vcpu->hv_clock.tsc_shift,
> @@ -1581,14 +1550,6 @@ static int kvm_guest_time_update(struct
>  		vcpu->hw_tsc_khz = this_tsc_khz;
>  	}
>
> -	/* with a master <monotonic time, tsc value> tuple,
> -	 * pvclock clock reads always increase at the (scaled) rate
> -	 * of guest TSC - no need to deal with sampling errors.
> -	 */
> -	if (!use_master_clock) {
> -		if (max_kernel_ns > kernel_ns)
> -			kernel_ns = max_kernel_ns;
> -	}
>  	/* With all the info we got, fill in the values */
>  	vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
>  	vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
>