Re: [PATCH 11/17] Fix a possible backwards warp of kvmclock

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Zachary Amsden <zamsden@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: avi@redhat.com, glommer@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 11/17] Fix a possible backwards warp of kvmclock
Date: Tue, 15 Jun 2010 14:21:17 -1000	[thread overview]
Message-ID: <4C1818FD.80603@redhat.com> (raw)
In-Reply-To: <20100615234725.GC3395@amt.cnet>

On 06/15/2010 01:47 PM, Marcelo Tosatti wrote:
> On Mon, Jun 14, 2010 at 09:34:13PM -1000, Zachary Amsden wrote:
>    
>> Kernel time, which advances in discrete steps may progress much slower
>> than TSC.  As a result, when kvmclock is adjusted to a new base, the
>> apparent time to the guest, which runs at a much higher, nsec scaled
>> rate based on the current TSC, may have already been observed to have
>> a larger value (kernel_ns + scaled tsc) than the value to which we are
>> setting it (kernel_ns + 0).
>>
>> We must instead compute the clock as potentially observed by the guest
>> for kernel_ns to make sure it does not go backwards.
>>
>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |    4 ++
>>   arch/x86/kvm/x86.c              |   79 +++++++++++++++++++++++++++++++++------
>>   2 files changed, 71 insertions(+), 12 deletions(-)
>>
>>      
>    
>> +	/*
>> +	 * The protection we require is simple: we must not be preempted from
>> +	 * the CPU between our read of the TSC khz and our read of the TSC.
>> +	 * Interrupt protection is not strictly required, but it does result in
>> +	 * greater accuracy for the TSC / kernel_ns measurement.
>> +	 */
>> +	local_irq_save(flags);
>> +	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
>> +	kvm_get_msr(v, MSR_IA32_TSC,&tsc_timestamp);
>> +	ktime_get_ts(&ts);
>> +	monotonic_to_bootbased(&ts);
>> +	kernel_ns = timespec_to_ns(&ts);
>> +	local_irq_restore(flags);
>> +
>>   	if (unlikely(this_tsc_khz == 0)) {
>>   		kvm_request_guest_time_update(v);
>>   		return 1;
>>   	}
>>
>> +	/*
>> +	 * Time as measured by the TSC may go backwards when resetting the base
>> +	 * tsc_timestamp.  The reason for this is that the TSC resolution is
>> +	 * higher than the resolution of the other clock scales.  Thus, many
>> +	 * possible measurments of the TSC correspond to one measurement of any
>> +	 * other clock, and so a spread of values is possible.  This is not a
>> +	 * problem for the computation of the nanosecond clock; with TSC rates
>> +	 * around 1GHZ, there can only be a few cycles which correspond to one
>> +	 * nanosecond value, and any path through this code will inevitably
>> +	 * take longer than that.  However, with the kernel_ns value itself,
>> +	 * the precision may be much lower, down to HZ granularity.  If the
>> +	 * first sampling of TSC against kernel_ns ends in the low part of the
>> +	 * range, and the second in the high end of the range, we can get:
>> +	 *
>> +	 * (TSC - offset_low) * S + kns_old>  (TSC - offset_high) * S + kns_new
>> +	 *
>> +	 * As the sampling errors potentially range in the thousands of cycles,
>> +	 * it is possible such a time value has already been observed by the
>> +	 * guest.  To protect against this, we must compute the system time as
>> +	 * observed by the guest and ensure the new system time is greater.
>> + 	 */
>> +	max_kernel_ns = 0;
>> +	if (vcpu->hv_clock.tsc_timestamp) {
>> +		max_kernel_ns = vcpu->last_guest_tsc -
>> +				vcpu->hv_clock.tsc_timestamp;
>> +		max_kernel_ns = pvclock_scale_delta(max_kernel_ns,
>> +				    vcpu->hv_clock.tsc_to_system_mul,
>> +				    vcpu->hv_clock.tsc_shift);
>> +		max_kernel_ns += vcpu->last_kernel_ns;
>> +	}
>> +
>>   	if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
>> -		kvm_set_time_scale(this_tsc_khz,&vcpu->hv_clock);
>> +		kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
>> +				&vcpu->hv_clock.tsc_shift,
>> +				&vcpu->hv_clock.tsc_to_system_mul);
>>   		vcpu->hw_tsc_khz = this_tsc_khz;
>>   	}
>>
>> -	/* Keep irq disabled to prevent changes to the clock */
>> -	local_irq_save(flags);
>> -	kvm_get_msr(v, MSR_IA32_TSC,&vcpu->hv_clock.tsc_timestamp);
>> -	ktime_get_ts(&ts);
>> -	monotonic_to_bootbased(&ts);
>> -	local_irq_restore(flags);
>> +	if (max_kernel_ns>  kernel_ns) {
>> +		s64 overshoot = max_kernel_ns - kernel_ns;
>> +		++v->stat.tsc_ahead;
>> +		if (overshoot>  NSEC_PER_SEC / HZ) {
>> +			++v->stat.tsc_overshoot;
>> +			if (printk_ratelimit())
>> +				pr_debug("ns overshoot: %lld\n", overshoot);
>> +		}
>> +		kernel_ns = max_kernel_ns;
>> +	}
>>
>>   	/* With all the info we got, fill in the values */
>> -
>> -	vcpu->hv_clock.system_time = ts.tv_nsec +
>> -				     (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
>> +	vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
>> +	vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
>> +	vcpu->last_kernel_ns = kernel_ns;
>>
>>   	vcpu->hv_clock.flags = 0;
>>
>> @@ -4836,6 +4889,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   	if (hw_breakpoint_active())
>>   		hw_breakpoint_restore();
>>
>> +	kvm_get_msr(vcpu, MSR_IA32_TSC,&vcpu->arch.last_guest_tsc);
>> +
>>   	atomic_set(&vcpu->guest_mode, 0);
>>   	smp_wmb();
>>   	local_irq_enable();
>>      
> Is this still needed with the guest side global counter fix?
>    

It's debatable.  Instrumentation showed this happen 100% of the time 
when measuring TSC in the compensation sequence.  When measuring TSC in 
the hot-path exit from hardware virt, before interrupts are enabled, the 
compensation rate drops to 0%.

That's with an HPET clocksource for kernel time.  Kernels with less 
accurate and more granular clocksources would have worse problems, of 
course.

If we're ever going to turn on the "kvmclock is reliable" bit, though, I 
think at least paying attention to the potential need for compensation 
is required - it technically is a backwards warp of time, and even if we 
spend so long getting out of and back into hardware virtualization that 
the guest can't notice it today, that might not be true on a faster 
processor.

Zach

next prev parent reply	other threads:[~2010-06-16  0:21 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-15  7:34 TSC cleanups, fixes, documentation for KVM Zachary Amsden
2010-06-15  7:34 ` [PATCH 01/17] Eliminate duplicated timer code Zachary Amsden
2010-06-16 13:07   ` Glauber Costa
2010-06-15  7:34 ` [PATCH 02/17] Make cpu_tsc_khz updates use local CPU Zachary Amsden
2010-06-15  8:02   ` Avi Kivity
2010-06-15  7:34 ` [PATCH 03/17] Unify vendor TSC logic Zachary Amsden
2010-06-16  8:10   ` Jason Wang
2010-06-16 13:22     ` Glauber Costa
2010-06-17  8:03       ` Jason Wang
2010-06-16 18:42     ` Zachary Amsden
2010-06-17  8:15       ` Jason Wang
2010-06-17 20:30         ` Zachary Amsden
2010-06-15  7:34 ` [PATCH 04/17] Fix deep C-state TSC desynchronization Zachary Amsden
2010-06-15  8:09   ` Avi Kivity
2010-06-15  8:14     ` Zachary Amsden
2010-06-16  8:10   ` Jason Wang
2010-06-16 18:43     ` Zachary Amsden
2010-06-16 13:24   ` Glauber Costa
2010-06-16 19:20     ` Zachary Amsden
2010-06-15  7:34 ` [PATCH 05/17] Keep SMP VMs more in sync on unstable TSC Zachary Amsden
2010-06-15  8:11   ` Avi Kivity
2010-06-16  8:11   ` Jason Wang
2010-06-16 13:32   ` Glauber Costa
2010-06-16 21:15     ` Zachary Amsden
2010-06-15  7:34 ` [PATCH 06/17] Rename KVM_REQ_KVMCLOCK_UPDATE Zachary Amsden
2010-06-15  7:34 ` [PATCH 07/17] Perform hardware_enable in CPU_STARTING callback Zachary Amsden
2010-06-15  7:34 ` [PATCH 08/17] Add clock sync request to hardware enable Zachary Amsden
2010-06-15  8:24   ` Avi Kivity
2010-06-15  7:34 ` [PATCH 09/17] Move scale_delta into common header Zachary Amsden
2010-06-15  7:34 ` [PATCH 10/17] Make KVM clock computation work for other scales Zachary Amsden
2010-06-15  7:34 ` [PATCH 11/17] Fix a possible backwards warp of kvmclock Zachary Amsden
2010-06-15  8:40   ` Avi Kivity
2010-06-15 20:37     ` Zachary Amsden
2010-06-15 23:47   ` Marcelo Tosatti
2010-06-16  0:21     ` Zachary Amsden [this message]
2010-06-16  0:39       ` Marcelo Tosatti
2010-06-16  8:11   ` Jason Wang
2010-06-16 13:58     ` Glauber Costa
2010-06-16 14:13       ` Avi Kivity
2010-06-16 14:58         ` Glauber Costa
2010-06-16 22:38       ` Zachary Amsden
2010-06-16 19:36     ` Zachary Amsden
2010-06-15  7:34 ` [PATCH 12/17] Add helper function get_kernel_ns Zachary Amsden
2010-06-15  8:41   ` Avi Kivity
2010-06-15 21:03     ` Zachary Amsden
2010-06-15 21:13       ` john stultz
2010-06-16  8:12       ` Jason Wang
2010-06-16 14:03         ` Glauber Costa
2010-06-15  7:34 ` [PATCH 13/17] Add TSC offset tracking Zachary Amsden
2010-06-15  8:44   ` Avi Kivity
2010-06-15  7:34 ` [PATCH 14/17] Fix SVM VMCB reset Zachary Amsden
2010-06-15  7:34 ` [PATCH 15/17] Fix AMD C1 TSC desynchronization Zachary Amsden
2010-06-15  8:47   ` Avi Kivity
2010-06-15  9:21     ` Zachary Amsden
2010-06-15 14:46       ` Roedel, Joerg
2010-06-15  7:34 ` [PATCH 16/17] TSC reset compensation Zachary Amsden
2010-06-15  8:51   ` Avi Kivity
2010-06-15 20:32     ` Zachary Amsden
2010-06-16  0:27   ` Marcelo Tosatti
2010-06-16  0:32     ` Zachary Amsden
2010-06-16 13:52   ` Glauber Costa
2010-06-16 22:36     ` Zachary Amsden
2010-06-15  7:34 ` [PATCH 17/17] Add timekeeping documentation Zachary Amsden
2010-06-15  8:51   ` Avi Kivity
2010-06-15 20:27   ` Randy Dunlap
2010-06-16 23:59     ` Zachary Amsden
2010-06-17  8:55   ` Andi Kleen
2010-06-17 21:14     ` Zachary Amsden
2010-06-18  7:49       ` Andi Kleen
2010-06-18 16:33         ` Zachary Amsden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C1818FD.80603@redhat.com \
    --to=zamsden@redhat.com \
    --cc=avi@redhat.com \
    --cc=glommer@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox