* [KVM Clock Synchronization 1/4] Make cyc_to_nsec conversions more reliable
2010-12-29 5:38 KVM clock synchronization Zachary Amsden
@ 2010-12-29 5:38 ` Zachary Amsden
2010-12-29 5:38 ` [KVM Clock Synchronization 2/4] Keep TSC synchronized across host suspend Zachary Amsden
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Zachary Amsden @ 2010-12-29 5:38 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
linux-kernel
Rather than use the host CPU TSC rate, which can change, perform
cycle-to-nanosecond conversions at the guest TSC rate, which is
fixed. This makes the math for TSC write compensation detection
more reliable.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/kvm/x86.c | 58 +++++++++++++++++++++------------------------------
1 files changed, 24 insertions(+), 34 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bfcf8fd..b9118f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -964,26 +964,10 @@ static inline u64 get_kernel_ns(void)
static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
unsigned long max_tsc_khz;
-static inline int kvm_tsc_changes_freq(void)
+static inline u64 nsec_to_cycles(struct kvm *kvm, u64 nsec)
{
- int cpu = get_cpu();
- int ret = !boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
- cpufreq_quick_get(cpu) != 0;
- put_cpu();
- return ret;
-}
-
-static inline u64 nsec_to_cycles(u64 nsec)
-{
- u64 ret;
-
- WARN_ON(preemptible());
- if (kvm_tsc_changes_freq())
- printk_once(KERN_WARNING
- "kvm: unreliable cycle conversion on adjustable rate TSC\n");
- ret = nsec * __get_cpu_var(cpu_tsc_khz);
- do_div(ret, USEC_PER_SEC);
- return ret;
+ return pvclock_scale_delta(nsec, kvm->arch.virtual_tsc_mult,
+ kvm->arch.virtual_tsc_shift);
}
static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
@@ -1010,6 +994,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
u64 offset, ns, elapsed;
unsigned long flags;
s64 sdiff;
+ u64 delta;
spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
offset = data - native_read_tsc();
@@ -1020,29 +1005,34 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
sdiff = -sdiff;
/*
- * Special case: close write to TSC within 5 seconds of
- * another CPU is interpreted as an attempt to synchronize
- * The 5 seconds is to accomodate host load / swapping as
- * well as any reset of TSC during the boot process.
+ * Special case: a TSC write that lands within a small delta
+ * (1 second of virtual cycle time) of the elapsed real time is
+ * interpreted as an attempt to synchronize the CPU.
*
- * In that case, for a reliable TSC, we can match TSC offsets,
- * or make a best guest using elapsed value.
+ * For a reliable TSC, we can match TSC offsets, and for an
+ * unstable TSC, we will write the update, but ignore elapsed time
+ * in this computation. The reason for this is that unstable TSC
+ * will be compensated by the catchup code, and guest loops which
+ * continually write the TSC could end up overshooting the TSC if
+ * the elapsed time is factored in.
*/
- if (sdiff < nsec_to_cycles(5ULL * NSEC_PER_SEC) &&
- elapsed < 5ULL * NSEC_PER_SEC) {
+ delta = nsec_to_cycles(kvm, elapsed);
+ sdiff -= delta;
+ if (sdiff < 0)
+ sdiff = -sdiff;
+ if (sdiff < nsec_to_cycles(kvm, NSEC_PER_SEC)) {
if (!check_tsc_unstable()) {
offset = kvm->arch.last_tsc_offset;
pr_debug("kvm: matched tsc offset for %llu\n", data);
} else {
- u64 delta = nsec_to_cycles(elapsed);
- offset += delta;
- pr_debug("kvm: adjusted tsc offset by %llu\n", delta);
+ /* Unstable write; don't add elapsed time */
+ pr_debug("kvm: matched write on unstable tsc\n");
}
- ns = kvm->arch.last_tsc_nsec;
+ } else {
+ kvm->arch.last_tsc_nsec = ns;
+ kvm->arch.last_tsc_write = data;
+ kvm->arch.last_tsc_offset = offset;
}
- kvm->arch.last_tsc_nsec = ns;
- kvm->arch.last_tsc_write = data;
- kvm->arch.last_tsc_offset = offset;
kvm_x86_ops->write_tsc_offset(vcpu, offset);
spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
--
1.7.1
* [KVM Clock Synchronization 2/4] Keep TSC synchronized across host suspend
From: Zachary Amsden @ 2010-12-29 5:38 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
linux-kernel
During a host suspend, TSC may go backwards, which KVM interprets
as an unstable TSC. Technically, KVM should not be marking the
TSC unstable, which causes the TSC clocksource to go bad, but
should be adjusting the TSC offsets in such a case.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 66 +++++++++++++++++++++++++++++++++++---
2 files changed, 61 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9e6fe39..63a82b0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -386,6 +386,7 @@ struct kvm_vcpu_arch {
u64 last_kernel_ns;
u64 last_tsc_nsec;
u64 last_tsc_write;
+ u64 tsc_offset_adjustment;
bool tsc_catchup;
bool nmi_pending;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9118f4..b509c01 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2042,12 +2042,20 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
kvm_x86_ops->vcpu_load(vcpu, cpu);
+
+ /* Apply any externally detected TSC adjustments (due to suspend) */
+ if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
+ kvm_x86_ops->adjust_tsc_offset(vcpu,
+ vcpu->arch.tsc_offset_adjustment);
+ vcpu->arch.tsc_offset_adjustment = 0;
+ kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+ }
if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
/* Make sure TSC doesn't go backwards */
s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
native_read_tsc() - vcpu->arch.last_host_tsc;
if (tsc_delta < 0)
- mark_tsc_unstable("KVM discovered backwards TSC");
+ WARN_ON(!check_tsc_unstable());
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
@@ -5778,14 +5786,60 @@ int kvm_arch_hardware_enable(void *garbage)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
- int i;
+ int i, ret;
+ u64 local_tsc;
+ u64 max_tsc = 0;
+ bool stable, backwards_tsc = false;
kvm_shared_msr_cpu_online();
- list_for_each_entry(kvm, &vm_list, vm_list)
- kvm_for_each_vcpu(i, vcpu, kvm)
- if (vcpu->cpu == smp_processor_id())
+ local_tsc = native_read_tsc();
+ stable = !check_tsc_unstable();
+ ret = kvm_x86_ops->hardware_enable(garbage);
+ if (ret)
+ return ret;
+
+ list_for_each_entry(kvm, &vm_list, vm_list) {
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!stable && vcpu->cpu == smp_processor_id())
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
- return kvm_x86_ops->hardware_enable(garbage);
+ if (stable && vcpu->arch.last_host_tsc > local_tsc) {
+ backwards_tsc = true;
+ if (vcpu->arch.last_host_tsc > max_tsc)
+ max_tsc = vcpu->arch.last_host_tsc;
+ }
+ }
+ }
+
+ /*
+ * Sometimes, reliable TSCs go backwards. This happens
+ * on platforms that reset TSC during suspend or hibernate
+ * actions, but maintain synchronization. We must compensate.
+ * Unfortunately, we can't bring the TSCs fully up to date
+ * with real time. The reason is that we aren't yet far
+ * enough into CPU bringup that we know how much real time
+ * has actually elapsed; our helper function, get_kernel_ns()
+ * will be using boot variables that haven't been updated yet.
+ * We simply find the maximum observed TSC above, then record
+ * the adjustment to TSC in each VCPU. When the VCPU later
+ * gets loaded, the adjustment will be applied. Note that we
+ * accumulate adjustments, in case multiple suspend cycles
+ * happen before the VCPU gets a chance to run again.
+ *
+ * Note that unreliable TSCs will be compensated already by
+ * the logic in vcpu_load, which sets the TSC to catchup mode.
+ * This will catchup all VCPUs to real time, but cannot
+ * guarantee synchronization.
+ */
+ if (backwards_tsc) {
+ u64 delta_cyc = max_tsc - local_tsc;
+ list_for_each_entry(kvm, &vm_list, vm_list)
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ vcpu->arch.tsc_offset_adjustment += delta_cyc;
+ vcpu->arch.last_host_tsc = 0;
+ }
+ }
+
+ return 0;
}
void kvm_arch_hardware_disable(void *garbage)
--
1.7.1
* Re: [KVM Clock Synchronization 2/4] Keep TSC synchronized across host suspend
From: Marcelo Tosatti @ 2011-01-04 15:36 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel
On Tue, Dec 28, 2010 at 07:38:18PM -1000, Zachary Amsden wrote:
> During a host suspend, TSC may go backwards, which KVM interprets
> as an unstable TSC. Technically, KVM should not be marking the
> TSC unstable, which causes the TSC clocksource to go bad, but
> should be adjusting the TSC offsets in such a case.
>
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 66 +++++++++++++++++++++++++++++++++++---
> 2 files changed, 61 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9e6fe39..63a82b0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -386,6 +386,7 @@ struct kvm_vcpu_arch {
> u64 last_kernel_ns;
> u64 last_tsc_nsec;
> u64 last_tsc_write;
> + u64 tsc_offset_adjustment;
> bool tsc_catchup;
>
> bool nmi_pending;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b9118f4..b509c01 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2042,12 +2042,20 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> }
>
> kvm_x86_ops->vcpu_load(vcpu, cpu);
> +
> + /* Apply any externally detected TSC adjustments (due to suspend) */
> + if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
> + kvm_x86_ops->adjust_tsc_offset(vcpu,
> + vcpu->arch.tsc_offset_adjustment);
> + vcpu->arch.tsc_offset_adjustment = 0;
> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> + }
> if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
> /* Make sure TSC doesn't go backwards */
> s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
> native_read_tsc() - vcpu->arch.last_host_tsc;
> if (tsc_delta < 0)
> - mark_tsc_unstable("KVM discovered backwards TSC");
> + WARN_ON(!check_tsc_unstable());
> if (check_tsc_unstable()) {
> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
> vcpu->arch.tsc_catchup = 1;
> @@ -5778,14 +5786,60 @@ int kvm_arch_hardware_enable(void *garbage)
> {
> struct kvm *kvm;
> struct kvm_vcpu *vcpu;
> - int i;
> + int i, ret;
> + u64 local_tsc;
> + u64 max_tsc = 0;
> + bool stable, backwards_tsc = false;
>
> kvm_shared_msr_cpu_online();
> - list_for_each_entry(kvm, &vm_list, vm_list)
> - kvm_for_each_vcpu(i, vcpu, kvm)
> - if (vcpu->cpu == smp_processor_id())
> + local_tsc = native_read_tsc();
> + stable = !check_tsc_unstable();
> + ret = kvm_x86_ops->hardware_enable(garbage);
> + if (ret)
> + return ret;
> +
> + list_for_each_entry(kvm, &vm_list, vm_list) {
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + if (!stable && vcpu->cpu == smp_processor_id())
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> - return kvm_x86_ops->hardware_enable(garbage);
> + if (stable && vcpu->arch.last_host_tsc > local_tsc) {
> + backwards_tsc = true;
> + if (vcpu->arch.last_host_tsc > max_tsc)
> + max_tsc = vcpu->arch.last_host_tsc;
> + }
> + }
> + }
> +
> + /*
> + * Sometimes, reliable TSCs go backwards. This happens
> + * on platforms that reset TSC during suspend or hibernate
> + * actions, but maintain synchronization. We must compensate.
> + * Unfortunately, we can't bring the TSCs fully up to date
> + * with real time. The reason is that we aren't yet far
> + * enough into CPU bringup that we know how much real time
> + * has actually elapsed; our helper function, get_kernel_ns()
> + * will be using boot variables that haven't been updated yet.
> + * We simply find the maximum observed TSC above, then record
> + * the adjustment to TSC in each VCPU. When the VCPU later
> + * gets loaded, the adjustment will be applied. Note that we
> + * accumulate adjustments, in case multiple suspend cycles
> + * happen before the VCPU gets a chance to run again.
> + *
> + * Note that unreliable TSCs will be compensated already by
> + * the logic in vcpu_load, which sets the TSC to catchup mode.
> + * This will catchup all VCPUs to real time, but cannot
> + * guarantee synchronization.
> + */
> + if (backwards_tsc) {
> + u64 delta_cyc = max_tsc - local_tsc;
> + list_for_each_entry(kvm, &vm_list, vm_list)
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + vcpu->arch.tsc_offset_adjustment += delta_cyc;
> + vcpu->arch.last_host_tsc = 0;
> + }
> + }
> +
> + return 0;
> }
Wouldn't it be simpler to use cyc2ns_offset (or something equivalent)? In
any case, you forgot to compare smp_processor_id.
* Re: [KVM Clock Synchronization 2/4] Keep TSC synchronized across host suspend
From: Zachary Amsden @ 2011-01-05 4:43 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel
On 01/04/2011 05:36 AM, Marcelo Tosatti wrote:
> On Tue, Dec 28, 2010 at 07:38:18PM -1000, Zachary Amsden wrote:
>
>> During a host suspend, TSC may go backwards, which KVM interprets
>> as an unstable TSC. Technically, KVM should not be marking the
>> TSC unstable, which causes the TSC clocksource to go bad, but
>> should be adjusting the TSC offsets in such a case.
>>
>> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/x86.c | 66 +++++++++++++++++++++++++++++++++++---
>> 2 files changed, 61 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 9e6fe39..63a82b0 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -386,6 +386,7 @@ struct kvm_vcpu_arch {
>> u64 last_kernel_ns;
>> u64 last_tsc_nsec;
>> u64 last_tsc_write;
>> + u64 tsc_offset_adjustment;
>> bool tsc_catchup;
>>
>> bool nmi_pending;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b9118f4..b509c01 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2042,12 +2042,20 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> }
>>
>> kvm_x86_ops->vcpu_load(vcpu, cpu);
>> +
>> + /* Apply any externally detected TSC adjustments (due to suspend) */
>> + if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
>> + kvm_x86_ops->adjust_tsc_offset(vcpu,
>> + vcpu->arch.tsc_offset_adjustment);
>> + vcpu->arch.tsc_offset_adjustment = 0;
>> + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> + }
>> if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>> /* Make sure TSC doesn't go backwards */
>> s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>> native_read_tsc() - vcpu->arch.last_host_tsc;
>> if (tsc_delta < 0)
>> - mark_tsc_unstable("KVM discovered backwards TSC");
>> + WARN_ON(!check_tsc_unstable());
>> if (check_tsc_unstable()) {
>> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>> vcpu->arch.tsc_catchup = 1;
>> @@ -5778,14 +5786,60 @@ int kvm_arch_hardware_enable(void *garbage)
>> {
>> struct kvm *kvm;
>> struct kvm_vcpu *vcpu;
>> - int i;
>> + int i, ret;
>> + u64 local_tsc;
>> + u64 max_tsc = 0;
>> + bool stable, backwards_tsc = false;
>>
>> kvm_shared_msr_cpu_online();
>> - list_for_each_entry(kvm, &vm_list, vm_list)
>> - kvm_for_each_vcpu(i, vcpu, kvm)
>> - if (vcpu->cpu == smp_processor_id())
>> + local_tsc = native_read_tsc();
>> + stable = !check_tsc_unstable();
>> + ret = kvm_x86_ops->hardware_enable(garbage);
>> + if (ret)
>> + return ret;
>> +
>> + list_for_each_entry(kvm, &vm_list, vm_list) {
>> + kvm_for_each_vcpu(i, vcpu, kvm) {
>> + if (!stable && vcpu->cpu == smp_processor_id())
>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> - return kvm_x86_ops->hardware_enable(garbage);
>> + if (stable && vcpu->arch.last_host_tsc > local_tsc) {
>> + backwards_tsc = true;
>> + if (vcpu->arch.last_host_tsc > max_tsc)
>> + max_tsc = vcpu->arch.last_host_tsc;
>> + }
>> + }
>> + }
>> +
>> + /*
>> + * Sometimes, reliable TSCs go backwards. This happens
>> + * on platforms that reset TSC during suspend or hibernate
>> + * actions, but maintain synchronization. We must compensate.
>> + * Unfortunately, we can't bring the TSCs fully up to date
>> + * with real time. The reason is that we aren't yet far
>> + * enough into CPU bringup that we know how much real time
>> + * has actually elapsed; our helper function, get_kernel_ns()
>> + * will be using boot variables that haven't been updated yet.
>> + * We simply find the maximum observed TSC above, then record
>> + * the adjustment to TSC in each VCPU. When the VCPU later
>> + * gets loaded, the adjustment will be applied. Note that we
>> + * accumulate adjustments, in case multiple suspend cycles
>> + * happen before the VCPU gets a chance to run again.
>> + *
>> + * Note that unreliable TSCs will be compensated already by
>> + * the logic in vcpu_load, which sets the TSC to catchup mode.
>> + * This will catchup all VCPUs to real time, but cannot
>> + * guarantee synchronization.
>> + */
>> + if (backwards_tsc) {
>> + u64 delta_cyc = max_tsc - local_tsc;
>> + list_for_each_entry(kvm, &vm_list, vm_list)
>> + kvm_for_each_vcpu(i, vcpu, kvm) {
>> + vcpu->arch.tsc_offset_adjustment += delta_cyc;
>> + vcpu->arch.last_host_tsc = 0;
>> + }
>> + }
>> +
>> + return 0;
>> }
>>
> Wouldn't it be simpler to use cyc2ns_offset (or something equivalent)? In
> any case, you forgot to compare smp_processor_id.
>
I don't think so: here we're already dealing in units of cycles, and
we don't want to just fix the kvmclock; we want to make sure the TSC
also does not go backwards.
And we deliberately do not check smp_processor_id. The reasoning is:
if the TSC has gone backwards, but we have a stable TSC, it means all
TSCs have gone backwards together, and so all should be adjusted
equally. Note that backwards_tsc is only set if the TSC is marked as stable.
Zach
* Re: [KVM Clock Synchronization 2/4] Keep TSC synchronized across host suspend
From: Marcelo Tosatti @ 2011-01-05 11:44 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel
On Tue, Jan 04, 2011 at 06:43:10PM -1000, Zachary Amsden wrote:
> On 01/04/2011 05:36 AM, Marcelo Tosatti wrote:
> >On Tue, Dec 28, 2010 at 07:38:18PM -1000, Zachary Amsden wrote:
> >>During a host suspend, TSC may go backwards, which KVM interprets
> >>as an unstable TSC. Technically, KVM should not be marking the
> >>TSC unstable, which causes the TSC clocksource to go bad, but
> >>should be adjusting the TSC offsets in such a case.
> >>
> >>Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> >>---
> >> arch/x86/include/asm/kvm_host.h | 1 +
> >> arch/x86/kvm/x86.c | 66 +++++++++++++++++++++++++++++++++++---
> >> 2 files changed, 61 insertions(+), 6 deletions(-)
> >>
> >>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>index 9e6fe39..63a82b0 100644
> >>--- a/arch/x86/include/asm/kvm_host.h
> >>+++ b/arch/x86/include/asm/kvm_host.h
> >>@@ -386,6 +386,7 @@ struct kvm_vcpu_arch {
> >> u64 last_kernel_ns;
> >> u64 last_tsc_nsec;
> >> u64 last_tsc_write;
> >>+ u64 tsc_offset_adjustment;
> >> bool tsc_catchup;
> >>
> >> bool nmi_pending;
> >>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>index b9118f4..b509c01 100644
> >>--- a/arch/x86/kvm/x86.c
> >>+++ b/arch/x86/kvm/x86.c
> >>@@ -2042,12 +2042,20 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >> }
> >>
> >> kvm_x86_ops->vcpu_load(vcpu, cpu);
> >>+
> >>+ /* Apply any externally detected TSC adjustments (due to suspend) */
> >>+ if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
> >>+ kvm_x86_ops->adjust_tsc_offset(vcpu,
> >>+ vcpu->arch.tsc_offset_adjustment);
> >>+ vcpu->arch.tsc_offset_adjustment = 0;
> >>+ kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> >>+ }
> >> if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
> >> /* Make sure TSC doesn't go backwards */
> >> s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
> >> native_read_tsc() - vcpu->arch.last_host_tsc;
> >> if (tsc_delta < 0)
> >>- mark_tsc_unstable("KVM discovered backwards TSC");
> >>+ WARN_ON(!check_tsc_unstable());
> >> if (check_tsc_unstable()) {
> >> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
> >> vcpu->arch.tsc_catchup = 1;
> >>@@ -5778,14 +5786,60 @@ int kvm_arch_hardware_enable(void *garbage)
> >> {
> >> struct kvm *kvm;
> >> struct kvm_vcpu *vcpu;
> >>- int i;
> >>+ int i, ret;
> >>+ u64 local_tsc;
> >>+ u64 max_tsc = 0;
> >>+ bool stable, backwards_tsc = false;
> >>
> >> kvm_shared_msr_cpu_online();
> >>- list_for_each_entry(kvm, &vm_list, vm_list)
> >>- kvm_for_each_vcpu(i, vcpu, kvm)
> >>- if (vcpu->cpu == smp_processor_id())
> >>+ local_tsc = native_read_tsc();
> >>+ stable = !check_tsc_unstable();
> >>+ ret = kvm_x86_ops->hardware_enable(garbage);
> >>+ if (ret)
> >>+ return ret;
> >>+
> >>+ list_for_each_entry(kvm, &vm_list, vm_list) {
> >>+ kvm_for_each_vcpu(i, vcpu, kvm) {
> >>+ if (!stable && vcpu->cpu == smp_processor_id())
> >> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> >>- return kvm_x86_ops->hardware_enable(garbage);
> >>+ if (stable && vcpu->arch.last_host_tsc > local_tsc) {
> >>+ backwards_tsc = true;
> >>+ if (vcpu->arch.last_host_tsc > max_tsc)
> >>+ max_tsc = vcpu->arch.last_host_tsc;
> >>+ }
> >>+ }
> >>+ }
> >>+
> >>+ /*
> >>+ * Sometimes, reliable TSCs go backwards. This happens
> >>+ * on platforms that reset TSC during suspend or hibernate
> >>+ * actions, but maintain synchronization. We must compensate.
> >>+ * Unfortunately, we can't bring the TSCs fully up to date
> >>+ * with real time. The reason is that we aren't yet far
> >>+ * enough into CPU bringup that we know how much real time
> >>+ * has actually elapsed; our helper function, get_kernel_ns()
> >>+ * will be using boot variables that haven't been updated yet.
> >>+ * We simply find the maximum observed TSC above, then record
> >>+ * the adjustment to TSC in each VCPU. When the VCPU later
> >>+ * gets loaded, the adjustment will be applied. Note that we
> >>+ * accumulate adjustments, in case multiple suspend cycles
> >>+ * happen before the VCPU gets a chance to run again.
> >>+ *
> >>+ * Note that unreliable TSCs will be compensated already by
> >>+ * the logic in vcpu_load, which sets the TSC to catchup mode.
> >>+ * This will catchup all VCPUs to real time, but cannot
> >>+ * guarantee synchronization.
> >>+ */
> >>+ if (backwards_tsc) {
> >>+ u64 delta_cyc = max_tsc - local_tsc;
> >>+ list_for_each_entry(kvm, &vm_list, vm_list)
> >>+ kvm_for_each_vcpu(i, vcpu, kvm) {
> >>+ vcpu->arch.tsc_offset_adjustment += delta_cyc;
> >>+ vcpu->arch.last_host_tsc = 0;
> >>+ }
> >>+ }
> >>+
> >>+ return 0;
> >> }
> >Wouldn't it be simpler to use cyc2ns_offset (or something equivalent)? In
> >any case, you forgot to compare smp_processor_id.
>
> I don't think so, as here, we're already dealing in units of cycles,
> and we don't want to just fix the kvmclock, we want to make sure the
> TSC also does not go backwards.
I meant calculating the backwards delta similarly to cyc2ns_offset, and
using that to adjust the offset on vcpu_load, instead of calculating it
manually via last_host_tsc.
> And we deliberately do not check smp_processor_id. The reasoning is
> - if the TSC has gone backwards, but we have a stable TSC, it means
> all TSCs have gone backwards together, and so should all be adjusted
> equally. Note that backwards_tsc is only set if TSC is marked as
> stable.
OK.
* [KVM Clock Synchronization 3/4] Refactor KVM clock update code
From: Zachary Amsden @ 2010-12-29 5:38 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
linux-kernel
Refactor this to make upcoming steps easier to follow.
This should be 100% code motion and renaming.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/x86.c | 77 +++++++++++++++++++++++----------------
2 files changed, 46 insertions(+), 33 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 63a82b0..8d829b8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -444,7 +444,7 @@ struct kvm_arch {
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
- spinlock_t tsc_write_lock;
+ spinlock_t clock_lock;
u64 last_tsc_nsec;
u64 last_tsc_offset;
u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b509c01..59d5999 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -996,7 +996,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
s64 sdiff;
u64 delta;
- spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+ spin_lock_irqsave(&kvm->arch.clock_lock, flags);
offset = data - native_read_tsc();
ns = get_kernel_ns();
elapsed = ns - kvm->arch.last_tsc_nsec;
@@ -1034,7 +1034,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
kvm->arch.last_tsc_offset = offset;
}
kvm_x86_ops->write_tsc_offset(vcpu, offset);
- spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+ spin_unlock_irqrestore(&kvm->arch.clock_lock, flags);
/* Reset of TSC must disable overshoot protection below */
vcpu->arch.hv_clock.tsc_timestamp = 0;
@@ -1043,22 +1043,50 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
}
EXPORT_SYMBOL_GPL(kvm_write_tsc);
+static void update_pvclock(struct kvm_vcpu *v,
+ struct pvclock_vcpu_time_info *pvclock,
+ u64 tsc_timestamp,
+ s64 kernel_ns,
+ unsigned long tsc_khz)
+{
+ if (unlikely(v->arch.hw_tsc_khz != tsc_khz)) {
+ kvm_get_time_scale(NSEC_PER_SEC / 1000, tsc_khz,
+ &pvclock->tsc_shift,
+ &pvclock->tsc_to_system_mul);
+ v->arch.hw_tsc_khz = tsc_khz;
+ }
+ pvclock->tsc_timestamp = tsc_timestamp;
+ pvclock->system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
+ pvclock->flags = 0;
+}
+
+static void update_user_kvmclock(struct kvm_vcpu *v,
+ struct pvclock_vcpu_time_info *pvclock)
+{
+ struct kvm_vcpu_arch *vcpu = &v->arch;
+ void *shared_kaddr;
+
+ shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
+ memcpy(shared_kaddr + vcpu->time_offset, pvclock, sizeof(*pvclock));
+ kunmap_atomic(shared_kaddr, KM_USER0);
+ mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
+}
+
static int kvm_guest_time_update(struct kvm_vcpu *v)
{
unsigned long flags;
struct kvm_vcpu_arch *vcpu = &v->arch;
- void *shared_kaddr;
- unsigned long this_tsc_khz;
- s64 kernel_ns, max_kernel_ns;
+ unsigned long tsc_khz;
+ s64 kernel_ns;
u64 tsc_timestamp;
/* Keep irq disabled to prevent changes to the clock */
local_irq_save(flags);
kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
kernel_ns = get_kernel_ns();
- this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
+ tsc_khz = __get_cpu_var(cpu_tsc_khz);
- if (unlikely(this_tsc_khz == 0)) {
+ if (unlikely(tsc_khz == 0)) {
local_irq_restore(flags);
kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
return 1;
@@ -1108,32 +1136,23 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
* guest. To protect against this, we must compute the system time as
* observed by the guest and ensure the new system time is greater.
*/
- max_kernel_ns = 0;
if (vcpu->hv_clock.tsc_timestamp && vcpu->last_guest_tsc) {
- max_kernel_ns = vcpu->last_guest_tsc -
- vcpu->hv_clock.tsc_timestamp;
+ s64 max_kernel_ns = vcpu->last_guest_tsc -
+ vcpu->hv_clock.tsc_timestamp;
max_kernel_ns = pvclock_scale_delta(max_kernel_ns,
vcpu->hv_clock.tsc_to_system_mul,
vcpu->hv_clock.tsc_shift);
max_kernel_ns += vcpu->last_kernel_ns;
+ if (max_kernel_ns > kernel_ns)
+ kernel_ns = max_kernel_ns;
}
- if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
- kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
- &vcpu->hv_clock.tsc_shift,
- &vcpu->hv_clock.tsc_to_system_mul);
- vcpu->hw_tsc_khz = this_tsc_khz;
- }
-
- if (max_kernel_ns > kernel_ns)
- kernel_ns = max_kernel_ns;
-
- /* With all the info we got, fill in the values */
- vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
- vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
+ /* Record the last values observed for next time */
vcpu->last_kernel_ns = kernel_ns;
vcpu->last_guest_tsc = tsc_timestamp;
- vcpu->hv_clock.flags = 0;
+
+ /* Compute new clock values */
+ update_pvclock(v, &vcpu->hv_clock, tsc_timestamp, kernel_ns, tsc_khz);
/*
* The interface expects us to write an even number signaling that the
@@ -1142,14 +1161,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
*/
vcpu->hv_clock.version += 2;
- shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
-
- memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
- sizeof(vcpu->hv_clock));
+ update_user_kvmclock(v, &vcpu->hv_clock);
- kunmap_atomic(shared_kaddr, KM_USER0);
-
- mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
return 0;
}
@@ -5951,7 +5964,7 @@ struct kvm *kvm_arch_create_vm(void)
/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
- spin_lock_init(&kvm->arch.tsc_write_lock);
+ spin_lock_init(&kvm->arch.clock_lock);
return kvm;
}
--
1.7.1
* [KVM Clock Synchronization 4/4] Add master clock for KVM clock
From: Zachary Amsden @ 2010-12-29 5:38 UTC (permalink / raw)
To: kvm
Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
linux-kernel
On systems with synchronized TSCs, we still have individual per-VCPU
KVM clocks, each with its own computed offset. Because these offsets
are computed at different times, the resulting KVM clock values can
vary across VCPUs, causing a globally visible backwards clock.
Currently this is protected against by an atomic compare that ensures
time does not move backwards. This change should remove the need for
that protection.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 42 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8d829b8..ff651b7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -445,6 +445,7 @@ struct kvm_arch {
unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
spinlock_t clock_lock;
+ struct pvclock_vcpu_time_info master_clock;
u64 last_tsc_nsec;
u64 last_tsc_offset;
u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59d5999..a339e50 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1116,6 +1116,38 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
return 0;
/*
+ * If there is a stable TSC, we use a master reference clock for
+ * the KVM clock; otherwise, individual computations for each VCPU
+ * would exhibit slight drift relative to each other, which could
+ * cause global time to go backwards.
+ *
+ * If the master clock has no TSC timestamp, that means we must
+ * recompute the clock as either some real time has elapsed during
+ * a suspend cycle, or we are measuring the clock for the first time
+ * during VM creation (or following a migration). Since master clock
+ * changes should happen only on rare occasions, we can ignore
+ * the precautions below.
+ */
+ if (!check_tsc_unstable()) {
+ struct pvclock_vcpu_time_info *master =
+ &v->kvm->arch.master_clock;
+ if (vcpu->hv_clock.version != master->version) {
+ spin_lock(&v->kvm->arch.clock_lock);
+ WARN_ON(master->version < vcpu->hv_clock.version);
+ if (!master->tsc_timestamp) {
+ pr_debug("KVM: computing new master clock\n");
+ update_pvclock(v, master, tsc_timestamp,
+ kernel_ns, tsc_khz);
+ }
+ memcpy(&vcpu->hv_clock, master, sizeof(*master));
+ spin_unlock(&v->kvm->arch.clock_lock);
+ update_user_kvmclock(v, &vcpu->hv_clock);
+ } else
+ pr_debug("KVM: ignoring spurious KVM clock update\n");
+ return 0;
+ }
+
+ /*
* Time as measured by the TSC may go backwards when resetting the base
* tsc_timestamp. The reason for this is that the TSC resolution is
* higher than the resolution of the other clock scales. Thus, many
@@ -3482,7 +3514,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
now_ns = get_kernel_ns();
delta = user_ns.clock - now_ns;
+ spin_lock(&kvm->arch.clock_lock);
+ kvm->arch.master_clock.version += 2;
+ kvm->arch.master_clock.tsc_timestamp = 0;
kvm->arch.kvmclock_offset = delta;
+ spin_unlock(&kvm->arch.clock_lock);
break;
}
case KVM_GET_CLOCK: {
@@ -5845,11 +5881,14 @@ int kvm_arch_hardware_enable(void *garbage)
*/
if (backwards_tsc) {
u64 delta_cyc = max_tsc - local_tsc;
- list_for_each_entry(kvm, &vm_list, vm_list)
+ list_for_each_entry(kvm, &vm_list, vm_list) {
kvm_for_each_vcpu(i, vcpu, kvm) {
vcpu->arch.tsc_offset_adjustment += delta_cyc;
vcpu->arch.last_host_tsc = 0;
}
+ kvm->arch.master_clock.tsc_timestamp = 0;
+ kvm->arch.master_clock.version += 2;
+ }
}
return 0;
@@ -5965,6 +6004,7 @@ struct kvm *kvm_arch_create_vm(void)
set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
spin_lock_init(&kvm->arch.clock_lock);
+ kvm->arch.master_clock.version = 1000;
return kvm;
}
--
1.7.1
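The invalidation and versioning scheme the patch relies on can be sketched in miniature. The structure and function names below are illustrative stand-ins, not the kernel's (the real code uses pvclock_vcpu_time_info and guards the recompute with kvm->arch.clock_lock):

```c
#include <stdint.h>
#include <string.h>

struct master_clock {
	uint32_t version;
	uint64_t tsc_timestamp;	/* 0 means "stale, recompute me" */
	uint64_t system_time;
};

struct vcpu_clock {
	struct master_clock hv_clock;
};

/* Invalidation path (KVM_SET_CLOCK, backwards-TSC recovery): bump the
 * version by 2, preserving its parity, and clear the TSC timestamp so
 * the next VCPU update recomputes the clock. */
static void master_clock_invalidate(struct master_clock *m)
{
	m->version += 2;
	m->tsc_timestamp = 0;
}

/* Per-VCPU update mirroring the patch: only the first VCPU to notice a
 * stale master recomputes it; later VCPUs copy the same values, so all
 * VCPUs publish an identical clock.  Returns 1 if hv_clock changed. */
static int vcpu_clock_update(struct vcpu_clock *v, struct master_clock *m,
			     uint64_t now_tsc, uint64_t now_ns)
{
	if (v->hv_clock.version == m->version)
		return 0;		/* spurious update: already current */
	if (!m->tsc_timestamp) {	/* stale: recompute the master clock */
		m->tsc_timestamp = now_tsc;
		m->system_time = now_ns;
	}
	memcpy(&v->hv_clock, m, sizeof(*m));
	return 1;
}
```

The version-bump-by-2 matches the patch's KVM_SET_CLOCK and backwards-TSC paths, and the initial version of 1000 in kvm_arch_create_vm guarantees every VCPU (starting at version 0) sees a mismatch and pulls in the master clock on its first update.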
* Re: [KVM Clock Synchronization 4/4] Add master clock for KVM clock
From: Marcelo Tosatti @ 2011-01-04 18:20 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel
On Tue, Dec 28, 2010 at 07:38:20PM -1000, Zachary Amsden wrote:
> On systems with synchronized TSCs, we still have individual per-VCPU
> KVM clocks, each with its own computed offset. As these offsets are
> computed at different times, they can vary slightly, causing a
> globally visible backwards clock. Currently this is protected against
> by using an atomic compare to ensure time does not go backwards.
>
> This change should remove that requirement.
>
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 42 ++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 42 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8d829b8..ff651b7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -445,6 +445,7 @@ struct kvm_arch {
> unsigned long irq_sources_bitmap;
> s64 kvmclock_offset;
> spinlock_t clock_lock;
> + struct pvclock_vcpu_time_info master_clock;
> u64 last_tsc_nsec;
> u64 last_tsc_offset;
> u64 last_tsc_write;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 59d5999..a339e50 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1116,6 +1116,38 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> return 0;
>
> /*
> + * If there is a stable TSC, we use a master reference clock for
> + * the KVM clock; otherwise, individual computations for each VCPU
> + * would exhibit slight drift relative to each other, which could
> + * cause global time to go backwards.
> + *
> + * If the master clock has no TSC timestamp, that means we must
> + * recompute the clock as either some real time has elapsed during
> + * a suspend cycle, or we are measuring the clock for the first time
> + * during VM creation (or following a migration). Since master clock
> + * changes should happen only on rare occasions, we can ignore
> + * the precautions below.
> + */
> + if (!check_tsc_unstable()) {
> + struct pvclock_vcpu_time_info *master =
> + &v->kvm->arch.master_clock;
> + if (vcpu->hv_clock.version != master->version) {
> + spin_lock(&v->kvm->arch.clock_lock);
> + WARN_ON(master->version < vcpu->hv_clock.version);
> + if (!master->tsc_timestamp) {
> + pr_debug("KVM: computing new master clock\n");
> + update_pvclock(v, master, tsc_timestamp,
> + kernel_ns, tsc_khz);
> + }
> + memcpy(&vcpu->hv_clock, master, sizeof(*master));
> + spin_unlock(&v->kvm->arch.clock_lock);
> + update_user_kvmclock(v, &vcpu->hv_clock);
> + } else
> + pr_debug("ignoring spurious KVM clock update");
> + return 0;
> + }
This assumes guest TSC is synchronized across vcpus... Is this always
true?
Also, for stable TSC hosts, kvmclock update is performed only on VM
creation / host resume these days... Can you describe the problem in
more detail?
* Re: [KVM Clock Synchronization 4/4] Add master clock for KVM clock
From: Zachary Amsden @ 2011-01-04 21:50 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel
On 01/04/2011 08:20 AM, Marcelo Tosatti wrote:
> On Tue, Dec 28, 2010 at 07:38:20PM -1000, Zachary Amsden wrote:
>
>> On systems with synchronized TSCs, we still have individual per-VCPU
>> KVM clocks, each with its own computed offset. As these offsets are
>> computed at different times, they can vary slightly, causing a
>> globally visible backwards clock. Currently this is protected against
>> by using an atomic compare to ensure time does not go backwards.
>>
>> This change should remove that requirement.
>>
>> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/x86.c | 42 ++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 42 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 8d829b8..ff651b7 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -445,6 +445,7 @@ struct kvm_arch {
>> unsigned long irq_sources_bitmap;
>> s64 kvmclock_offset;
>> spinlock_t clock_lock;
>> + struct pvclock_vcpu_time_info master_clock;
>> u64 last_tsc_nsec;
>> u64 last_tsc_offset;
>> u64 last_tsc_write;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 59d5999..a339e50 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1116,6 +1116,38 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>> return 0;
>>
>> /*
>> + * If there is a stable TSC, we use a master reference clock for
>> + * the KVM clock; otherwise, individual computations for each VCPU
>> + * would exhibit slight drift relative to each other, which could
>> + * cause global time to go backwards.
>> + *
>> + * If the master clock has no TSC timestamp, that means we must
>> + * recompute the clock as either some real time has elapsed during
>> + * a suspend cycle, or we are measuring the clock for the first time
>> + * during VM creation (or following a migration). Since master clock
>> + * changes should happen only on rare occasions, we can ignore
>> + * the precautions below.
>> + */
>> + if (!check_tsc_unstable()) {
>> + struct pvclock_vcpu_time_info *master =
>> + &v->kvm->arch.master_clock;
>> + if (vcpu->hv_clock.version != master->version) {
>> + spin_lock(&v->kvm->arch.clock_lock);
>> + WARN_ON(master->version < vcpu->hv_clock.version);
>> + if (!master->tsc_timestamp) {
>> + pr_debug("KVM: computing new master clock\n");
>> + update_pvclock(v, master, tsc_timestamp,
>> + kernel_ns, tsc_khz);
>> + }
>> + memcpy(&vcpu->hv_clock, master, sizeof(*master));
>> + spin_unlock(&v->kvm->arch.clock_lock);
>> + update_user_kvmclock(v, &vcpu->hv_clock);
>> + } else
>> + pr_debug("ignoring spurious KVM clock update");
>> + return 0;
>> + }
>>
> This assumes guest TSC is synchronized across vcpus... Is this always
> true?
>
By the kernel definition of stable TSC, yes.
> Also, for stable TSC hosts, kvmclock update is performed only on VM
> creation / host resume these days... Can you describe the problem in
> more detail?
>
The problem is that even if it is done only once, all the VCPUs perform
the kvmclock update at different times, so they measure different
kernel_ns values and hardware TSC values. The measurement spread is
only +/- a few hundred cycles, but because the values differ, the
global view of the KVM clock across multiple VCPUs can go backwards.
Zach
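The effect Zachary describes can be reproduced with a toy model, assuming for simplicity a 1 GHz TSC (1 cycle == 1 ns); the names are illustrative. Each VCPU's clock is the kernel_ns captured at its update plus the TSC cycles elapsed since that capture, so a small skew between the two VCPUs' (kernel_ns, tsc) samples shifts their clocks relative to each other:

```c
#include <stdint.h>

/* Illustrative per-VCPU kvmclock parameters: the kernel_ns and TSC
 * values captured when that VCPU's clock was last updated. */
struct vcpu_kvmclock {
	uint64_t ns_base;	/* kernel_ns at sample time */
	uint64_t tsc_base;	/* TSC at sample time */
};

/* Guest-visible clock: base time plus cycles elapsed since the sample. */
static uint64_t kvmclock_read(const struct vcpu_kvmclock *c, uint64_t tsc_now)
{
	return c->ns_base + (tsc_now - c->tsc_base);
}
```

With VCPU 0 sampled as (1000 ns, tsc 1000) and VCPU 1 sampled a moment later but carrying 200 ns of measurement skew (1700 ns, tsc 1500), a task that reads the clock on VCPU 1 and then migrates to VCPU 0 sees time jump backwards, even though each per-VCPU clock is individually monotonic.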
* Re: KVM clock synchronization
From: Marcelo Tosatti @ 2011-01-05 12:17 UTC (permalink / raw)
To: Zachary Amsden; +Cc: kvm
On Tue, Dec 28, 2010 at 07:38:16PM -1000, Zachary Amsden wrote:
> A further set of improvements to KVM clock. Now it actually can
> stay synchronized on SMP systems with a stable TSC, does not get
> destabilized by host suspend, and is resistant to migration.
>
> I did look a bit into the second to last remaining migration issue;
> that is, we read the TSCs for all VCPUs at different times, then
> write them back at different times as well. We have compensation
> for this already (see patch 1), but there is a possibility that
> ordering issues create a backwards condition:
>
> read VTSC-0
> stop VCPU-0
> read VTSC-1
> stop VCPU-1
>
> In that case, what the compensation code does is reset VTSC-1
> back to VTSC-0. With the above ordering, this is broken.
>
> However, in QEMU, the migration order is:
>
> stop VCPU-0
> stop VCPU-1
> read VTSC-0
> read VTSC-1
>
> So even if a higher TSC value is read for VTSC-1, no backwards
> condition can be generated, as VCPU-1 was not running at the time
> and thus could not have observed the higher TSC.
>
> This brings us close to having a perfect KVM clock.
>
> Next steps will be testing to see if this in practice allows us
> to drop the atomic backwards protection for KVM clock, and if so,
> to implement vread so we have fast system calls for time.
>
> Zach
Patchset looks good to me.
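The ordering argument in the quoted cover letter can be checked with a toy model (illustrative names): host time advances in ticks, a VCPU's last-observed value is the highest TSC it could have seen while still running, and after the write compensation resets every VTSC to the VTSC-0 reading, the resume is safe only if the guest never saw a larger value.

```c
#include <stdint.h>

/* A resumed TSC value is safe iff the guest never observed a larger
 * value while running; otherwise its TSC appears to go backwards. */
static int resume_is_safe(uint64_t resume_value, uint64_t last_observed)
{
	return resume_value >= last_observed;
}
```

With the broken ordering (read VTSC-0 at t=1, stop VCPU-0 at t=2, read VTSC-1 at t=3, stop VCPU-1 at t=4), both VCPUs are resumed with the t=1 reading even though VCPU-1 ran until t=4. With QEMU's ordering (stop at t=1 and t=2, read at t=3 and t=4), the t=3 resume value exceeds everything either VCPU observed while running.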