[KVM timekeeping fixes 1/4] Fix kvmclock bug

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [KVM timekeeping fixes 1/4] Fix kvmclock bug
@ 2010-09-19  0:38 Zachary Amsden
  2010-09-19  0:38 ` [KVM timekeeping fixes 2/4] Make math work for other scales Zachary Amsden
  2010-09-20 15:41 ` [KVM timekeeping fixes 1/4] Fix kvmclock bug Marcelo Tosatti
  0 siblings, 2 replies; 10+ messages in thread
From: Zachary Amsden @ 2010-09-19  0:38 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
	linux-kernel

If preempted after kvmclock values are updated, but before hardware
virtualization is entered, the last tsc time as read by the guest is
never set.  It underflows the next time kvmclock is updated if there
has not yet been a successful entry / exit into hardware virt.

Fix this by simply setting last_tsc to the newly read tsc value so
that any computed nsec advance of kvmclock is nulled.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kvm/x86.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a51635e..0b021e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1095,6 +1095,7 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
 	vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
 	vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
 	vcpu->last_kernel_ns = kernel_ns;
+	vcpu->last_guest_tsc = tsc_timestamp;
 	vcpu->hv_clock.flags = 0;
 
 	/*
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [KVM timekeeping fixes 2/4] Make math work for other scales
  2010-09-19  0:38 [KVM timekeeping fixes 1/4] Fix kvmclock bug Zachary Amsden
@ 2010-09-19  0:38 ` Zachary Amsden
  2010-09-19  0:38   ` [KVM timekeeping fixes 3/4] Rename timer function Zachary Amsden
  2010-09-20 15:41 ` [KVM timekeeping fixes 1/4] Fix kvmclock bug Marcelo Tosatti
  1 sibling, 1 reply; 10+ messages in thread
From: Zachary Amsden @ 2010-09-19  0:38 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
	linux-kernel

The math in kvm_get_time_scale relies on the fact that
NSEC_PER_SEC < 2^32.  To use the same function to compute
arbitrary time scales, we must extend the first reduction
step to shrink the base rate to a 32-bit value, and
possibly reduce the scaled rate into a 32-bit as well.

Note we must take care to avoid an arithmetic overflow
when scaling up the tps32 value (this could not happen
with the fixed scaled value of NSEC_PER_SEC, but can
happen with scaled rates above 2^31.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kvm/x86.c |   30 ++++++++++++++++++------------
 1 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b021e1..f92401f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -920,31 +920,35 @@ static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
 	return quotient;
 }
 
-static void kvm_set_time_scale(uint32_t tsc_khz, struct pvclock_vcpu_time_info *hv_clock)
+static void kvm_get_time_scale(uint32_t scaled_khz, uint32_t base_khz,
+			       s8 *pshift, u32 *pmultiplier)
 {
-	uint64_t nsecs = 1000000000LL;
+	uint64_t scaled64;
 	int32_t  shift = 0;
 	uint64_t tps64;
 	uint32_t tps32;
 
-	tps64 = tsc_khz * 1000LL;
-	while (tps64 > nsecs*2) {
+	tps64 = base_khz * 1000LL;
+	scaled64 = scaled_khz * 1000LL;
+	while (tps64 > scaled64*2 || tps64 & 0xffffffff00000000UL) {
 		tps64 >>= 1;
 		shift--;
 	}
 
 	tps32 = (uint32_t)tps64;
-	while (tps32 <= (uint32_t)nsecs) {
-		tps32 <<= 1;
+	while (tps32 <= scaled64 || scaled64 & 0xffffffff00000000UL) {
+		if (scaled64 & 0xffffffff00000000UL || tps32 & 0x80000000)
+			scaled64 >>= 1;
+		else
+			tps32 <<= 1;
 		shift++;
 	}
 
-	hv_clock->tsc_shift = shift;
-	hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32);
+	*pshift = shift;
+	*pmultiplier = div_frac(scaled64, tps32);
 
-	pr_debug("%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n",
-		 __func__, tsc_khz, hv_clock->tsc_shift,
-		 hv_clock->tsc_to_system_mul);
+	pr_debug("%s: base_khz %u => %u, shift %d, mul %u\n",
+		 __func__, base_khz, scaled_khz, shift, *pmultiplier);
 }
 
 static inline u64 get_kernel_ns(void)
@@ -1084,7 +1088,9 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
 	}
 
 	if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
-		kvm_set_time_scale(this_tsc_khz, &vcpu->hv_clock);
+		kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
+				   &vcpu->hv_clock.tsc_shift,
+				   &vcpu->hv_clock.tsc_to_system_mul);
 		vcpu->hw_tsc_khz = this_tsc_khz;
 	}
 
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [KVM timekeeping fixes 3/4] Rename timer function
  2010-09-19  0:38 ` [KVM timekeeping fixes 2/4] Make math work for other scales Zachary Amsden
@ 2010-09-19  0:38   ` Zachary Amsden
  2010-09-19  0:38     ` [KVM timekeeping fixes 4/4] TSC catchup mode Zachary Amsden
  0 siblings, 1 reply; 10+ messages in thread
From: Zachary Amsden @ 2010-09-19  0:38 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
	linux-kernel

This just changes some names to better reflect the usage they
will be given.  Separated out to keep confusion to a minimum.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/kvm/x86.c       |   12 ++++++------
 include/linux/kvm_host.h |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f92401f..09f468a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -892,7 +892,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 
 	/*
 	 * The guest calculates current wall clock time by adding
-	 * system time (updated by kvm_write_guest_time below) to the
+	 * system time (updated by kvm_guest_time_update below) to the
 	 * wall clock specified here.  guest system time equals host
 	 * system time for us, thus we must fill in host boot time here.
 	 */
@@ -1032,7 +1032,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
-static int kvm_write_guest_time(struct kvm_vcpu *v)
+static int kvm_guest_time_update(struct kvm_vcpu *v)
 {
 	unsigned long flags;
 	struct kvm_vcpu_arch *vcpu = &v->arch;
@@ -1052,7 +1052,7 @@ static int kvm_write_guest_time(struct kvm_vcpu *v)
 	local_irq_restore(flags);
 
 	if (unlikely(this_tsc_khz == 0)) {
-		kvm_make_request(KVM_REQ_KVMCLOCK_UPDATE, v);
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		return 1;
 	}
 
@@ -1128,7 +1128,7 @@ static int kvm_request_guest_time_update(struct kvm_vcpu *v)
 
 	if (!vcpu->time_page)
 		return 0;
-	kvm_make_request(KVM_REQ_KVMCLOCK_UPDATE, v);
+	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 	return 1;
 }
 
@@ -5012,8 +5012,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
 			__kvm_migrate_timers(vcpu);
-		if (kvm_check_request(KVM_REQ_KVMCLOCK_UPDATE, vcpu)) {
-			r = kvm_write_guest_time(vcpu);
+		if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
+			r = kvm_guest_time_update(vcpu);
 			if (unlikely(r))
 				goto out;
 		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6022da1..0b89d00 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -36,7 +36,7 @@
 #define KVM_REQ_PENDING_TIMER      5
 #define KVM_REQ_UNHALT             6
 #define KVM_REQ_MMU_SYNC           7
-#define KVM_REQ_KVMCLOCK_UPDATE    8
+#define KVM_REQ_CLOCK_UPDATE       8
 #define KVM_REQ_KICK               9
 #define KVM_REQ_DEACTIVATE_FPU    10
 #define KVM_REQ_EVENT             11
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [KVM timekeeping fixes 4/4] TSC catchup mode
  2010-09-19  0:38   ` [KVM timekeeping fixes 3/4] Rename timer function Zachary Amsden
@ 2010-09-19  0:38     ` Zachary Amsden
  2010-09-20 15:38       ` Marcelo Tosatti
  0 siblings, 1 reply; 10+ messages in thread
From: Zachary Amsden @ 2010-09-19  0:38 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Avi Kivity, Marcelo Tosatti, Glauber Costa,
	linux-kernel

Negate the effects of AN TYM spell while kvm thread is preempted by tracking
conversion factor to the highest TSC rate and catching the TSC up when it has
fallen behind the kernel view of time.  Note that once triggered, we don't
turn off catchup mode.

A slightly more clever version of this is possible, which only does catchup
when TSC rate drops, and which specifically targets only CPUs with broken
TSC, but since these all are considered unstable_tsc(), this patch covers
all necessary cases.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    6 +++
 arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++++++---------
 2 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8c5779d..e209078 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -384,6 +384,9 @@ struct kvm_vcpu_arch {
 	u64 last_host_tsc;
 	u64 last_guest_tsc;
 	u64 last_kernel_ns;
+	u64 last_tsc_nsec;
+	u64 last_tsc_write;
+	bool tsc_catchup;
 
 	bool nmi_pending;
 	bool nmi_injected;
@@ -444,6 +447,9 @@ struct kvm_arch {
 	u64 last_tsc_nsec;
 	u64 last_tsc_offset;
 	u64 last_tsc_write;
+	u32 virtual_tsc_khz;
+	u32 virtual_tsc_mult;
+	s8 virtual_tsc_shift;
 
 	struct kvm_xen_hvm_config xen_hvm_config;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09f468a..9152156 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void)
 }
 
 static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
+unsigned long max_tsc_khz;
 
 static inline int kvm_tsc_changes_freq(void)
 {
@@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec)
 	return ret;
 }
 
+static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
+{
+	/* Compute a scale to convert nanoseconds in TSC cycles */
+	kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
+			   &kvm->arch.virtual_tsc_shift,
+			   &kvm->arch.virtual_tsc_mult);
+	kvm->arch.virtual_tsc_khz = this_tsc_khz;
+}
+
+static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
+{
+	u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
+				      vcpu->kvm->arch.virtual_tsc_mult,
+				      vcpu->kvm->arch.virtual_tsc_shift);
+	tsc += vcpu->arch.last_tsc_write;
+	return tsc;
+}
+
 void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 
 	/* Reset of TSC must disable overshoot protection below */
 	vcpu->arch.hv_clock.tsc_timestamp = 0;
+	vcpu->arch.last_tsc_write = data;
+	vcpu->arch.last_tsc_nsec = ns;
 }
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
@@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	s64 kernel_ns, max_kernel_ns;
 	u64 tsc_timestamp;
 
-	if ((!vcpu->time_page))
-		return 0;
-
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
 	kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
 	kernel_ns = get_kernel_ns();
 	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
-	local_irq_restore(flags);
 
 	if (unlikely(this_tsc_khz == 0)) {
+		local_irq_restore(flags);
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		return 1;
 	}
 
 	/*
+	 * We may have to catch up the TSC to match elapsed wall clock
+	 * time for two reasons, even if kvmclock is used.
+	 *   1) CPU could have been running below the maximum TSC rate
+	 *   2) Broken TSC compensation resets the base at each VCPU
+	 *      entry to avoid unknown leaps of TSC even when running
+	 *      again on the same CPU.  This may cause apparent elapsed
+	 *      time to disappear, and the guest to stand still or run
+	 *	very slowly.
+	 */
+	if (vcpu->tsc_catchup) {
+		u64 tsc = compute_guest_tsc(v, kernel_ns);
+		if (tsc > tsc_timestamp) {
+			kvm_x86_ops->adjust_tsc_offset(v, tsc - tsc_timestamp);
+			tsc_timestamp = tsc;
+		}
+	}
+	
+	local_irq_restore(flags);
+
+	if (!vcpu->time_page)
+		return 0;
+
+	/*
 	 * Time as measured by the TSC may go backwards when resetting the base
 	 * tsc_timestamp.  The reason for this is that the TSC resolution is
 	 * higher than the resolution of the other clock scales.  Thus, many
@@ -1122,16 +1163,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	return 0;
 }
 
-static int kvm_request_guest_time_update(struct kvm_vcpu *v)
-{
-	struct kvm_vcpu_arch *vcpu = &v->arch;
-
-	if (!vcpu->time_page)
-		return 0;
-	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
-	return 1;
-}
-
 static bool msr_mtrr_valid(unsigned msr)
 {
 	switch (msr) {
@@ -1455,6 +1486,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		}
 
 		vcpu->arch.time = data;
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
 		/* we verify if the enable bit is set... */
 		if (!(data & 1))
@@ -1470,8 +1502,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 			kvm_release_page_clean(vcpu->arch.time_page);
 			vcpu->arch.time_page = NULL;
 		}
-
-		kvm_request_guest_time_update(vcpu);
 		break;
 	}
 	case MSR_IA32_MCG_CTL:
@@ -2028,9 +2058,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 				native_read_tsc() - vcpu->arch.last_host_tsc;
 		if (tsc_delta < 0)
 			mark_tsc_unstable("KVM discovered backwards TSC");
-		if (check_tsc_unstable())
+		if (check_tsc_unstable()) {
 			kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
-		kvm_migrate_timers(vcpu);
+			vcpu->arch.tsc_catchup = 1;
+			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+		}
+		if (vcpu->cpu != cpu)
+			kvm_migrate_timers(vcpu);
 		vcpu->cpu = cpu;
 	}
 }
@@ -4432,8 +4466,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (vcpu->cpu != freq->cpu)
 				continue;
-			if (!kvm_request_guest_time_update(vcpu))
-				continue;
+			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 			if (vcpu->cpu != smp_processor_id())
 				send_ipi = 1;
 		}
@@ -4488,11 +4521,20 @@ static void kvm_timer_init(void)
 {
 	int cpu;
 
+	max_tsc_khz = tsc_khz;
 	register_hotcpu_notifier(&kvmclock_cpu_notifier_block);
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+#ifdef CONFIG_CPU_FREQ
+		struct cpufreq_policy policy;
+		memset(&policy, 0, sizeof(policy));
+		cpufreq_get_policy(&policy, get_cpu());
+		if (policy.cpuinfo.max_freq)
+			max_tsc_khz = policy.cpuinfo.max_freq;
+#endif
 		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
 					  CPUFREQ_TRANSITION_NOTIFIER);
 	}
+	pr_debug("kvm: max_tsc_khz = %ld\n", max_tsc_khz);
 	for_each_online_cpu(cpu)
 		smp_call_function_single(cpu, tsc_khz_changed, NULL, 1);
 }
@@ -5723,7 +5765,7 @@ int kvm_arch_hardware_enable(void *garbage)
 	list_for_each_entry(kvm, &vm_list, vm_list)
 		kvm_for_each_vcpu(i, vcpu, kvm)
 			if (vcpu->cpu == smp_processor_id())
-				kvm_request_guest_time_update(vcpu);
+				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 	return kvm_x86_ops->hardware_enable(garbage);
 }
 
@@ -5774,6 +5816,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	}
 	vcpu->arch.pio_data = page_address(page);
 
+	if (!kvm->arch.virtual_tsc_khz)
+		kvm_arch_set_tsc_khz(kvm, max_tsc_khz);
+
 	r = kvm_mmu_create(vcpu);
 	if (r < 0)
 		goto fail_free_pio_data;
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [KVM timekeeping fixes 4/4] TSC catchup mode
  2010-09-19  0:38     ` [KVM timekeeping fixes 4/4] TSC catchup mode Zachary Amsden
@ 2010-09-20 15:38       ` Marcelo Tosatti
  2010-09-21  1:11         ` Zachary Amsden
  0 siblings, 1 reply; 10+ messages in thread
From: Marcelo Tosatti @ 2010-09-20 15:38 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel

On Sat, Sep 18, 2010 at 02:38:15PM -1000, Zachary Amsden wrote:
> Negate the effects of AN TYM spell while kvm thread is preempted by tracking
> conversion factor to the highest TSC rate and catching the TSC up when it has
> fallen behind the kernel view of time.  Note that once triggered, we don't
> turn off catchup mode.
> 
> A slightly more clever version of this is possible, which only does catchup
> when TSC rate drops, and which specifically targets only CPUs with broken
> TSC, but since these all are considered unstable_tsc(), this patch covers
> all necessary cases.
> 
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    6 +++
>  arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++++++---------
>  2 files changed, 72 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8c5779d..e209078 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -384,6 +384,9 @@ struct kvm_vcpu_arch {
>  	u64 last_host_tsc;
>  	u64 last_guest_tsc;
>  	u64 last_kernel_ns;
> +	u64 last_tsc_nsec;
> +	u64 last_tsc_write;
> +	bool tsc_catchup;
>  
>  	bool nmi_pending;
>  	bool nmi_injected;
> @@ -444,6 +447,9 @@ struct kvm_arch {
>  	u64 last_tsc_nsec;
>  	u64 last_tsc_offset;
>  	u64 last_tsc_write;
> +	u32 virtual_tsc_khz;
> +	u32 virtual_tsc_mult;
> +	s8 virtual_tsc_shift;
>  
>  	struct kvm_xen_hvm_config xen_hvm_config;
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 09f468a..9152156 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void)
>  }
>  
>  static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
> +unsigned long max_tsc_khz;
>  
>  static inline int kvm_tsc_changes_freq(void)
>  {
> @@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec)
>  	return ret;
>  }
>  
> +static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
> +{
> +	/* Compute a scale to convert nanoseconds in TSC cycles */
> +	kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
> +			   &kvm->arch.virtual_tsc_shift,
> +			   &kvm->arch.virtual_tsc_mult);
> +	kvm->arch.virtual_tsc_khz = this_tsc_khz;
> +}
> +
> +static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
> +{
> +	u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
> +				      vcpu->kvm->arch.virtual_tsc_mult,
> +				      vcpu->kvm->arch.virtual_tsc_shift);
> +	tsc += vcpu->arch.last_tsc_write;
> +	return tsc;
> +}
> +
>  void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>  {
>  	struct kvm *kvm = vcpu->kvm;
> @@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>  
>  	/* Reset of TSC must disable overshoot protection below */
>  	vcpu->arch.hv_clock.tsc_timestamp = 0;
> +	vcpu->arch.last_tsc_write = data;
> +	vcpu->arch.last_tsc_nsec = ns;
>  }
>  EXPORT_SYMBOL_GPL(kvm_write_tsc);
>  
> @@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  	s64 kernel_ns, max_kernel_ns;
>  	u64 tsc_timestamp;
>  
> -	if ((!vcpu->time_page))
> -		return 0;
> -
>  	/* Keep irq disabled to prevent changes to the clock */
>  	local_irq_save(flags);
>  	kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
>  	kernel_ns = get_kernel_ns();
>  	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
> -	local_irq_restore(flags);
>  
>  	if (unlikely(this_tsc_khz == 0)) {
> +		local_irq_restore(flags);
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>  		return 1;
>  	}
>  
>  	/*
> +	 * We may have to catch up the TSC to match elapsed wall clock
> +	 * time for two reasons, even if kvmclock is used.
> +	 *   1) CPU could have been running below the maximum TSC rate

kvmclock handles frequency changes? 

> +	 *   2) Broken TSC compensation resets the base at each VCPU
> +	 *      entry to avoid unknown leaps of TSC even when running
> +	 *      again on the same CPU.  This may cause apparent elapsed
> +	 *      time to disappear, and the guest to stand still or run
> +	 *	very slowly.

I don't get this. Please explain.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [KVM timekeeping fixes 4/4] TSC catchup mode
  2010-09-20 15:38       ` Marcelo Tosatti
@ 2010-09-21  1:11         ` Zachary Amsden
  2010-09-21 18:18           ` Marcelo Tosatti
  0 siblings, 1 reply; 10+ messages in thread
From: Zachary Amsden @ 2010-09-21  1:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel

On 09/20/2010 05:38 AM, Marcelo Tosatti wrote:
> On Sat, Sep 18, 2010 at 02:38:15PM -1000, Zachary Amsden wrote:
>    
>> Negate the effects of AN TYM spell while kvm thread is preempted by tracking
>> conversion factor to the highest TSC rate and catching the TSC up when it has
>> fallen behind the kernel view of time.  Note that once triggered, we don't
>> turn off catchup mode.
>>
>> A slightly more clever version of this is possible, which only does catchup
>> when TSC rate drops, and which specifically targets only CPUs with broken
>> TSC, but since these all are considered unstable_tsc(), this patch covers
>> all necessary cases.
>>
>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |    6 +++
>>   arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++++++---------
>>   2 files changed, 72 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 8c5779d..e209078 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -384,6 +384,9 @@ struct kvm_vcpu_arch {
>>   	u64 last_host_tsc;
>>   	u64 last_guest_tsc;
>>   	u64 last_kernel_ns;
>> +	u64 last_tsc_nsec;
>> +	u64 last_tsc_write;
>> +	bool tsc_catchup;
>>
>>   	bool nmi_pending;
>>   	bool nmi_injected;
>> @@ -444,6 +447,9 @@ struct kvm_arch {
>>   	u64 last_tsc_nsec;
>>   	u64 last_tsc_offset;
>>   	u64 last_tsc_write;
>> +	u32 virtual_tsc_khz;
>> +	u32 virtual_tsc_mult;
>> +	s8 virtual_tsc_shift;
>>
>>   	struct kvm_xen_hvm_config xen_hvm_config;
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 09f468a..9152156 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void)
>>   }
>>
>>   static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
>> +unsigned long max_tsc_khz;
>>
>>   static inline int kvm_tsc_changes_freq(void)
>>   {
>> @@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec)
>>   	return ret;
>>   }
>>
>> +static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
>> +{
>> +	/* Compute a scale to convert nanoseconds in TSC cycles */
>> +	kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
>> +			&kvm->arch.virtual_tsc_shift,
>> +			&kvm->arch.virtual_tsc_mult);
>> +	kvm->arch.virtual_tsc_khz = this_tsc_khz;
>> +}
>> +
>> +static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
>> +{
>> +	u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
>> +				      vcpu->kvm->arch.virtual_tsc_mult,
>> +				      vcpu->kvm->arch.virtual_tsc_shift);
>> +	tsc += vcpu->arch.last_tsc_write;
>> +	return tsc;
>> +}
>> +
>>   void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>>   {
>>   	struct kvm *kvm = vcpu->kvm;
>> @@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>>
>>   	/* Reset of TSC must disable overshoot protection below */
>>   	vcpu->arch.hv_clock.tsc_timestamp = 0;
>> +	vcpu->arch.last_tsc_write = data;
>> +	vcpu->arch.last_tsc_nsec = ns;
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_write_tsc);
>>
>> @@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>   	s64 kernel_ns, max_kernel_ns;
>>   	u64 tsc_timestamp;
>>
>> -	if ((!vcpu->time_page))
>> -		return 0;
>> -
>>   	/* Keep irq disabled to prevent changes to the clock */
>>   	local_irq_save(flags);
>>   	kvm_get_msr(v, MSR_IA32_TSC,&tsc_timestamp);
>>   	kernel_ns = get_kernel_ns();
>>   	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
>> -	local_irq_restore(flags);
>>
>>   	if (unlikely(this_tsc_khz == 0)) {
>> +		local_irq_restore(flags);
>>   		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>>   		return 1;
>>   	}
>>
>>   	/*
>> +	 * We may have to catch up the TSC to match elapsed wall clock
>> +	 * time for two reasons, even if kvmclock is used.
>> +	 *   1) CPU could have been running below the maximum TSC rate
>>      
> kvmclock handles frequency changes?
>
>    
>> +	 *   2) Broken TSC compensation resets the base at each VCPU
>> +	 *      entry to avoid unknown leaps of TSC even when running
>> +	 *      again on the same CPU.  This may cause apparent elapsed
>> +	 *      time to disappear, and the guest to stand still or run
>> +	 *	very slowly.
>>      
> I don't get this. Please explain.
>    

This compensation in arch_vcpu_load, for unstable TSC case, causes time 
while preempted to disappear from the TSC by adjusting the TSC back to 
match the last observed TSC.

         if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
                 /* Make sure TSC doesn't go backwards */
                 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
                                 native_read_tsc() - 
vcpu->arch.last_host_tsc;
                 if (tsc_delta < 0)
                         mark_tsc_unstable("KVM discovered backwards TSC");
                 if (check_tsc_unstable())
                         kvm_x86_ops->adjust_tsc_offset(vcpu, 
-tsc_delta); <<<<<

Note that this is the correct thing to do if there are cross-CPU deltas, 
when switching CPUs, or if the TSC becomes inherently unpredictable 
while preempted (CPU bugs, kernel resets TSC).

However, all the time that elapsed while not running disappears from the 
TSC (and thus even from kvmclock, without recalibration, as it is based 
off the TSC).  Since we've got to recalibrate the kvmclock anyways, we 
might as well catch the TSC up to the proper value.

And if kvmclock is not in use, we must catch the tsc up to the proper value.

Zach

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [KVM timekeeping fixes 4/4] TSC catchup mode
  2010-09-21  1:11         ` Zachary Amsden
@ 2010-09-21 18:18           ` Marcelo Tosatti
  2010-09-22 19:25             ` Zachary Amsden
  0 siblings, 1 reply; 10+ messages in thread
From: Marcelo Tosatti @ 2010-09-21 18:18 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel

On Mon, Sep 20, 2010 at 03:11:30PM -1000, Zachary Amsden wrote:
> On 09/20/2010 05:38 AM, Marcelo Tosatti wrote:
> >On Sat, Sep 18, 2010 at 02:38:15PM -1000, Zachary Amsden wrote:
> >>Negate the effects of AN TYM spell while kvm thread is preempted by tracking
> >>conversion factor to the highest TSC rate and catching the TSC up when it has
> >>fallen behind the kernel view of time.  Note that once triggered, we don't
> >>turn off catchup mode.
> >>
> >>A slightly more clever version of this is possible, which only does catchup
> >>when TSC rate drops, and which specifically targets only CPUs with broken
> >>TSC, but since these all are considered unstable_tsc(), this patch covers
> >>all necessary cases.
> >>
> >>Signed-off-by: Zachary Amsden<zamsden@redhat.com>
> >>---
> >>  arch/x86/include/asm/kvm_host.h |    6 +++
> >>  arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++++++---------
> >>  2 files changed, 72 insertions(+), 21 deletions(-)
> >>
> >>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>index 8c5779d..e209078 100644
> >>--- a/arch/x86/include/asm/kvm_host.h
> >>+++ b/arch/x86/include/asm/kvm_host.h
> >>@@ -384,6 +384,9 @@ struct kvm_vcpu_arch {
> >>  	u64 last_host_tsc;
> >>  	u64 last_guest_tsc;
> >>  	u64 last_kernel_ns;
> >>+	u64 last_tsc_nsec;
> >>+	u64 last_tsc_write;
> >>+	bool tsc_catchup;
> >>
> >>  	bool nmi_pending;
> >>  	bool nmi_injected;
> >>@@ -444,6 +447,9 @@ struct kvm_arch {
> >>  	u64 last_tsc_nsec;
> >>  	u64 last_tsc_offset;
> >>  	u64 last_tsc_write;
> >>+	u32 virtual_tsc_khz;
> >>+	u32 virtual_tsc_mult;
> >>+	s8 virtual_tsc_shift;
> >>
> >>  	struct kvm_xen_hvm_config xen_hvm_config;
> >>
> >>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>index 09f468a..9152156 100644
> >>--- a/arch/x86/kvm/x86.c
> >>+++ b/arch/x86/kvm/x86.c
> >>@@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void)
> >>  }
> >>
> >>  static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
> >>+unsigned long max_tsc_khz;
> >>
> >>  static inline int kvm_tsc_changes_freq(void)
> >>  {
> >>@@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec)
> >>  	return ret;
> >>  }
> >>
> >>+static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
> >>+{
> >>+	/* Compute a scale to convert nanoseconds in TSC cycles */
> >>+	kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
> >>+			&kvm->arch.virtual_tsc_shift,
> >>+			&kvm->arch.virtual_tsc_mult);
> >>+	kvm->arch.virtual_tsc_khz = this_tsc_khz;
> >>+}
> >>+
> >>+static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
> >>+{
> >>+	u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
> >>+				      vcpu->kvm->arch.virtual_tsc_mult,
> >>+				      vcpu->kvm->arch.virtual_tsc_shift);
> >>+	tsc += vcpu->arch.last_tsc_write;
> >>+	return tsc;
> >>+}
> >>+
> >>  void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
> >>  {
> >>  	struct kvm *kvm = vcpu->kvm;
> >>@@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
> >>
> >>  	/* Reset of TSC must disable overshoot protection below */
> >>  	vcpu->arch.hv_clock.tsc_timestamp = 0;
> >>+	vcpu->arch.last_tsc_write = data;
> >>+	vcpu->arch.last_tsc_nsec = ns;
> >>  }
> >>  EXPORT_SYMBOL_GPL(kvm_write_tsc);
> >>
> >>@@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >>  	s64 kernel_ns, max_kernel_ns;
> >>  	u64 tsc_timestamp;
> >>
> >>-	if ((!vcpu->time_page))
> >>-		return 0;
> >>-
> >>  	/* Keep irq disabled to prevent changes to the clock */
> >>  	local_irq_save(flags);
> >>  	kvm_get_msr(v, MSR_IA32_TSC,&tsc_timestamp);
> >>  	kernel_ns = get_kernel_ns();
> >>  	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
> >>-	local_irq_restore(flags);
> >>
> >>  	if (unlikely(this_tsc_khz == 0)) {
> >>+		local_irq_restore(flags);
> >>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> >>  		return 1;
> >>  	}
> >>
> >>  	/*
> >>+	 * We may have to catch up the TSC to match elapsed wall clock
> >>+	 * time for two reasons, even if kvmclock is used.
> >>+	 *   1) CPU could have been running below the maximum TSC rate
> >kvmclock handles frequency changes?
> >
> >>+	 *   2) Broken TSC compensation resets the base at each VCPU
> >>+	 *      entry to avoid unknown leaps of TSC even when running
> >>+	 *      again on the same CPU.  This may cause apparent elapsed
> >>+	 *      time to disappear, and the guest to stand still or run
> >>+	 *	very slowly.
> >I don't get this. Please explain.
> 
> This compensation in arch_vcpu_load, for unstable TSC case, causes
> time while preempted to disappear from the TSC by adjusting the TSC
> back to match the last observed TSC.
> 
>         if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>                 /* Make sure TSC doesn't go backwards */
>                 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>                                 native_read_tsc() -
> vcpu->arch.last_host_tsc;
>                 if (tsc_delta < 0)
>                         mark_tsc_unstable("KVM discovered backwards TSC");
>                 if (check_tsc_unstable())
>                         kvm_x86_ops->adjust_tsc_offset(vcpu,
> -tsc_delta); <<<<<
> 
> Note that this is the correct thing to do if there are cross-CPU
> deltas, when switching CPUs, or if the TSC becomes inherently
> unpredictable while preempted (CPU bugs, kernel resets TSC).
> 
> However, all the time that elapsed while not running disappears from
> the TSC (and thus even from kvmclock, without recalibration, as it
> is based off the TSC).  Since we've got to recalibrate the kvmclock
> anyways, we might as well catch the TSC up to the proper value.

Updating kvmclock's tsc_timestamp and system_time should be enough then,
to fix this particular issue?

The problem is you're sneaking in parts of trap mode (virtual_tsc_khz),
without dealing with the issues raised in the past iteration. The
interactions between catch and trap mode are not clear, migration is not
handled, etc.

> And if kvmclock is not in use, we must catch the tsc up to the proper value.
> 
> Zach

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [KVM timekeeping fixes 4/4] TSC catchup mode
  2010-09-21 18:18           ` Marcelo Tosatti
@ 2010-09-22 19:25             ` Zachary Amsden
  2010-09-23 19:57               ` Marcelo Tosatti
  0 siblings, 1 reply; 10+ messages in thread
From: Zachary Amsden @ 2010-09-22 19:25 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel

On 09/21/2010 08:18 AM, Marcelo Tosatti wrote:
> On Mon, Sep 20, 2010 at 03:11:30PM -1000, Zachary Amsden wrote:
>    
>> On 09/20/2010 05:38 AM, Marcelo Tosatti wrote:
>>      
>>> On Sat, Sep 18, 2010 at 02:38:15PM -1000, Zachary Amsden wrote:
>>>        
>>>> Negate the effects of AN TYM spell while kvm thread is preempted by tracking
>>>> conversion factor to the highest TSC rate and catching the TSC up when it has
>>>> fallen behind the kernel view of time.  Note that once triggered, we don't
>>>> turn off catchup mode.
>>>>
>>>> A slightly more clever version of this is possible, which only does catchup
>>>> when TSC rate drops, and which specifically targets only CPUs with broken
>>>> TSC, but since these all are considered unstable_tsc(), this patch covers
>>>> all necessary cases.
>>>>
>>>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>>>> ---
>>>>   arch/x86/include/asm/kvm_host.h |    6 +++
>>>>   arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++++++---------
>>>>   2 files changed, 72 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 8c5779d..e209078 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -384,6 +384,9 @@ struct kvm_vcpu_arch {
>>>>   	u64 last_host_tsc;
>>>>   	u64 last_guest_tsc;
>>>>   	u64 last_kernel_ns;
>>>> +	u64 last_tsc_nsec;
>>>> +	u64 last_tsc_write;
>>>> +	bool tsc_catchup;
>>>>
>>>>   	bool nmi_pending;
>>>>   	bool nmi_injected;
>>>> @@ -444,6 +447,9 @@ struct kvm_arch {
>>>>   	u64 last_tsc_nsec;
>>>>   	u64 last_tsc_offset;
>>>>   	u64 last_tsc_write;
>>>> +	u32 virtual_tsc_khz;
>>>> +	u32 virtual_tsc_mult;
>>>> +	s8 virtual_tsc_shift;
>>>>
>>>>   	struct kvm_xen_hvm_config xen_hvm_config;
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 09f468a..9152156 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void)
>>>>   }
>>>>
>>>>   static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
>>>> +unsigned long max_tsc_khz;
>>>>
>>>>   static inline int kvm_tsc_changes_freq(void)
>>>>   {
>>>> @@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec)
>>>>   	return ret;
>>>>   }
>>>>
>>>> +static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
>>>> +{
>>>> +	/* Compute a scale to convert nanoseconds in TSC cycles */
>>>> +	kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
>>>> +			&kvm->arch.virtual_tsc_shift,
>>>> +			&kvm->arch.virtual_tsc_mult);
>>>> +	kvm->arch.virtual_tsc_khz = this_tsc_khz;
>>>> +}
>>>> +
>>>> +static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
>>>> +{
>>>> +	u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
>>>> +				      vcpu->kvm->arch.virtual_tsc_mult,
>>>> +				      vcpu->kvm->arch.virtual_tsc_shift);
>>>> +	tsc += vcpu->arch.last_tsc_write;
>>>> +	return tsc;
>>>> +}
>>>> +
>>>>   void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>>>>   {
>>>>   	struct kvm *kvm = vcpu->kvm;
>>>> @@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>>>>
>>>>   	/* Reset of TSC must disable overshoot protection below */
>>>>   	vcpu->arch.hv_clock.tsc_timestamp = 0;
>>>> +	vcpu->arch.last_tsc_write = data;
>>>> +	vcpu->arch.last_tsc_nsec = ns;
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(kvm_write_tsc);
>>>>
>>>> @@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>>>>   	s64 kernel_ns, max_kernel_ns;
>>>>   	u64 tsc_timestamp;
>>>>
>>>> -	if ((!vcpu->time_page))
>>>> -		return 0;
>>>> -
>>>>   	/* Keep irq disabled to prevent changes to the clock */
>>>>   	local_irq_save(flags);
>>>>   	kvm_get_msr(v, MSR_IA32_TSC,&tsc_timestamp);
>>>>   	kernel_ns = get_kernel_ns();
>>>>   	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
>>>> -	local_irq_restore(flags);
>>>>
>>>>   	if (unlikely(this_tsc_khz == 0)) {
>>>> +		local_irq_restore(flags);
>>>>   		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>>>>   		return 1;
>>>>   	}
>>>>
>>>>   	/*
>>>> +	 * We may have to catch up the TSC to match elapsed wall clock
>>>> +	 * time for two reasons, even if kvmclock is used.
>>>> +	 *   1) CPU could have been running below the maximum TSC rate
>>>>          
>>> kvmclock handles frequency changes?
>>>
>>>        
>>>> +	 *   2) Broken TSC compensation resets the base at each VCPU
>>>> +	 *      entry to avoid unknown leaps of TSC even when running
>>>> +	 *      again on the same CPU.  This may cause apparent elapsed
>>>> +	 *      time to disappear, and the guest to stand still or run
>>>> +	 *	very slowly.
>>>>          
>>> I don't get this. Please explain.
>>>        
>> This compensation in arch_vcpu_load, for unstable TSC case, causes
>> time while preempted to disappear from the TSC by adjusting the TSC
>> back to match the last observed TSC.
>>
>>          if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>>                  /* Make sure TSC doesn't go backwards */
>>                  s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
>>                                  native_read_tsc() -
>> vcpu->arch.last_host_tsc;
>>                  if (tsc_delta<  0)
>>                          mark_tsc_unstable("KVM discovered backwards TSC");
>>                  if (check_tsc_unstable())
>>                          kvm_x86_ops->adjust_tsc_offset(vcpu,
>> -tsc_delta);<<<<<
>>
>> Note that this is the correct thing to do if there are cross-CPU
>> deltas, when switching CPUs, or if the TSC becomes inherently
>> unpredictable while preempted (CPU bugs, kernel resets TSC).
>>
>> However, all the time that elapsed while not running disappears from
>> the TSC (and thus even from kvmclock, without recalibration, as it
>> is based off the TSC).  Since we've got to recalibrate the kvmclock
>> anyways, we might as well catch the TSC up to the proper value.
>>      
> Updating kvmclock's tsc_timestamp and system_time should be enough then,
> to fix this particular issue?
>    

Yes, it is, for kvmclock guests.  For TSC based kernels (RHEL < 5.5, 
FreeBSD, Darwin?), and guests which have userspace TSC, we still need this.

> The problem is you're sneaking in parts of trap mode (virtual_tsc_khz),
> without dealing with the issues raised in the past iteration. The
> interactions between catch and trap mode are not clear, migration is not
> handled, etc.
>    

Yes, I am :)

While I haven't yet resolved those issues to a successful conclusion, 
the situation is at least improved, and incremental progress is better 
than nothing.

I do believe that the catchup mode is at least clean and easy to 
understand, it is the transition to and from trap mode that raised a lot 
of problems, and that is what I'm reworking.  Regardless of how that 
turns out, it should integrate smoothly on top of the catchup mode, at 
least, that is the design goal I'm shooting for, so I'd like to get 
these pieces upstream now as I don't expect them to change much.

Zach

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [KVM timekeeping fixes 4/4] TSC catchup mode
  2010-09-22 19:25             ` Zachary Amsden
@ 2010-09-23 19:57               ` Marcelo Tosatti
  0 siblings, 0 replies; 10+ messages in thread
From: Marcelo Tosatti @ 2010-09-23 19:57 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel

On Wed, Sep 22, 2010 at 09:25:49AM -1000, Zachary Amsden wrote:
> On 09/21/2010 08:18 AM, Marcelo Tosatti wrote:
> >On Mon, Sep 20, 2010 at 03:11:30PM -1000, Zachary Amsden wrote:
> >>On 09/20/2010 05:38 AM, Marcelo Tosatti wrote:
> >>>On Sat, Sep 18, 2010 at 02:38:15PM -1000, Zachary Amsden wrote:
> >>>>Negate the effects of AN TYM spell while kvm thread is preempted by tracking
> >>>>conversion factor to the highest TSC rate and catching the TSC up when it has
> >>>>fallen behind the kernel view of time.  Note that once triggered, we don't
> >>>>turn off catchup mode.
> >>>>
> >>>>A slightly more clever version of this is possible, which only does catchup
> >>>>when TSC rate drops, and which specifically targets only CPUs with broken
> >>>>TSC, but since these all are considered unstable_tsc(), this patch covers
> >>>>all necessary cases.
> >>>>
> >>>>Signed-off-by: Zachary Amsden<zamsden@redhat.com>
> >>>>---
> >>>>  arch/x86/include/asm/kvm_host.h |    6 +++
> >>>>  arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++++++---------
> >>>>  2 files changed, 72 insertions(+), 21 deletions(-)
> >>>>
> >>>>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>>index 8c5779d..e209078 100644
> >>>>--- a/arch/x86/include/asm/kvm_host.h
> >>>>+++ b/arch/x86/include/asm/kvm_host.h
> >>>>@@ -384,6 +384,9 @@ struct kvm_vcpu_arch {
> >>>>  	u64 last_host_tsc;
> >>>>  	u64 last_guest_tsc;
> >>>>  	u64 last_kernel_ns;
> >>>>+	u64 last_tsc_nsec;
> >>>>+	u64 last_tsc_write;
> >>>>+	bool tsc_catchup;
> >>>>
> >>>>  	bool nmi_pending;
> >>>>  	bool nmi_injected;
> >>>>@@ -444,6 +447,9 @@ struct kvm_arch {
> >>>>  	u64 last_tsc_nsec;
> >>>>  	u64 last_tsc_offset;
> >>>>  	u64 last_tsc_write;
> >>>>+	u32 virtual_tsc_khz;
> >>>>+	u32 virtual_tsc_mult;
> >>>>+	s8 virtual_tsc_shift;
> >>>>
> >>>>  	struct kvm_xen_hvm_config xen_hvm_config;
> >>>>
> >>>>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>>index 09f468a..9152156 100644
> >>>>--- a/arch/x86/kvm/x86.c
> >>>>+++ b/arch/x86/kvm/x86.c
> >>>>@@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void)
> >>>>  }
> >>>>
> >>>>  static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
> >>>>+unsigned long max_tsc_khz;
> >>>>
> >>>>  static inline int kvm_tsc_changes_freq(void)
> >>>>  {
> >>>>@@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec)
> >>>>  	return ret;
> >>>>  }
> >>>>
> >>>>+static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
> >>>>+{
> >>>>+	/* Compute a scale to convert nanoseconds in TSC cycles */
> >>>>+	kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
> >>>>+			&kvm->arch.virtual_tsc_shift,
> >>>>+			&kvm->arch.virtual_tsc_mult);
> >>>>+	kvm->arch.virtual_tsc_khz = this_tsc_khz;
> >>>>+}
> >>>>+
> >>>>+static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
> >>>>+{
> >>>>+	u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
> >>>>+				      vcpu->kvm->arch.virtual_tsc_mult,
> >>>>+				      vcpu->kvm->arch.virtual_tsc_shift);
> >>>>+	tsc += vcpu->arch.last_tsc_write;
> >>>>+	return tsc;
> >>>>+}
> >>>>+
> >>>>  void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
> >>>>  {
> >>>>  	struct kvm *kvm = vcpu->kvm;
> >>>>@@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
> >>>>
> >>>>  	/* Reset of TSC must disable overshoot protection below */
> >>>>  	vcpu->arch.hv_clock.tsc_timestamp = 0;
> >>>>+	vcpu->arch.last_tsc_write = data;
> >>>>+	vcpu->arch.last_tsc_nsec = ns;
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(kvm_write_tsc);
> >>>>
> >>>>@@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >>>>  	s64 kernel_ns, max_kernel_ns;
> >>>>  	u64 tsc_timestamp;
> >>>>
> >>>>-	if ((!vcpu->time_page))
> >>>>-		return 0;
> >>>>-
> >>>>  	/* Keep irq disabled to prevent changes to the clock */
> >>>>  	local_irq_save(flags);
> >>>>  	kvm_get_msr(v, MSR_IA32_TSC,&tsc_timestamp);
> >>>>  	kernel_ns = get_kernel_ns();
> >>>>  	this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
> >>>>-	local_irq_restore(flags);
> >>>>
> >>>>  	if (unlikely(this_tsc_khz == 0)) {
> >>>>+		local_irq_restore(flags);
> >>>>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> >>>>  		return 1;
> >>>>  	}
> >>>>
> >>>>  	/*
> >>>>+	 * We may have to catch up the TSC to match elapsed wall clock
> >>>>+	 * time for two reasons, even if kvmclock is used.
> >>>>+	 *   1) CPU could have been running below the maximum TSC rate
> >>>kvmclock handles frequency changes?
> >>>
> >>>>+	 *   2) Broken TSC compensation resets the base at each VCPU
> >>>>+	 *      entry to avoid unknown leaps of TSC even when running
> >>>>+	 *      again on the same CPU.  This may cause apparent elapsed
> >>>>+	 *      time to disappear, and the guest to stand still or run
> >>>>+	 *	very slowly.
> >>>I don't get this. Please explain.
> >>This compensation in arch_vcpu_load, for unstable TSC case, causes
> >>time while preempted to disappear from the TSC by adjusting the TSC
> >>back to match the last observed TSC.
> >>
> >>         if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
> >>                 /* Make sure TSC doesn't go backwards */
> >>                 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
> >>                                 native_read_tsc() -
> >>vcpu->arch.last_host_tsc;
> >>                 if (tsc_delta<  0)
> >>                         mark_tsc_unstable("KVM discovered backwards TSC");
> >>                 if (check_tsc_unstable())
> >>                         kvm_x86_ops->adjust_tsc_offset(vcpu,
> >>-tsc_delta);<<<<<
> >>
> >>Note that this is the correct thing to do if there are cross-CPU
> >>deltas, when switching CPUs, or if the TSC becomes inherently
> >>unpredictable while preempted (CPU bugs, kernel resets TSC).
> >>
> >>However, all the time that elapsed while not running disappears from
> >>the TSC (and thus even from kvmclock, without recalibration, as it
> >>is based off the TSC).  Since we've got to recalibrate the kvmclock
> >>anyways, we might as well catch the TSC up to the proper value.
> >Updating kvmclock's tsc_timestamp and system_time should be enough then,
> >to fix this particular issue?
> 
> Yes, it is, for kvmclock guests.  For TSC based kernels (RHEL < 5.5,
> FreeBSD, Darwin?), and guests which have userspace TSC, we still
> need this.
> 
> >The problem is you're sneaking in parts of trap mode (virtual_tsc_khz),
> >without dealing with the issues raised in the past iteration. The
> >interactions between catch and trap mode are not clear, migration is not
> >handled, etc.
> 
> Yes, I am :)
> 
> While I haven't yet resolved those issues to a successful
> conclusion, the situation is at least improved, and incremental
> progress is better than nothing.
> 
> I do believe that the catchup mode is at least clean and easy to
> understand, it is the transition to and from trap mode that raised a
> lot of problems, and that is what I'm reworking.  Regardless of how
> that turns out, it should integrate smoothly on top of the catchup
> mode, at least, that is the design goal I'm shooting for, so I'd
> like to get these pieces upstream now as I don't expect them to
> change much.
> 
> Zach

OK, applied. I don't like catchup mode being used to fix a bug in
last_host_tsc logic, hopefully it can be killed with trap mode.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [KVM timekeeping fixes 1/4] Fix kvmclock bug
  2010-09-19  0:38 [KVM timekeeping fixes 1/4] Fix kvmclock bug Zachary Amsden
  2010-09-19  0:38 ` [KVM timekeeping fixes 2/4] Make math work for other scales Zachary Amsden
@ 2010-09-20 15:41 ` Marcelo Tosatti
  1 sibling, 0 replies; 10+ messages in thread
From: Marcelo Tosatti @ 2010-09-20 15:41 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: kvm, Avi Kivity, Glauber Costa, linux-kernel

On Sat, Sep 18, 2010 at 02:38:12PM -1000, Zachary Amsden wrote:
> If preempted after kvmclock values are updated, but before hardware
> virtualization is entered, the last tsc time as read by the guest is
> never set.  It underflows the next time kvmclock is updated if there
> has not yet been a successful entry / exit into hardware virt.
> 
> Fix this by simply setting last_tsc to the newly read tsc value so
> that any computed nsec advance of kvmclock is nulled.
> 
> Signed-off-by: Zachary Amsden <zamsden@redhat.com>

Applied, thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-09-24  0:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-19  0:38 [KVM timekeeping fixes 1/4] Fix kvmclock bug Zachary Amsden
2010-09-19  0:38 ` [KVM timekeeping fixes 2/4] Make math work for other scales Zachary Amsden
2010-09-19  0:38   ` [KVM timekeeping fixes 3/4] Rename timer function Zachary Amsden
2010-09-19  0:38     ` [KVM timekeeping fixes 4/4] TSC catchup mode Zachary Amsden
2010-09-20 15:38       ` Marcelo Tosatti
2010-09-21  1:11         ` Zachary Amsden
2010-09-21 18:18           ` Marcelo Tosatti
2010-09-22 19:25             ` Zachary Amsden
2010-09-23 19:57               ` Marcelo Tosatti
2010-09-20 15:41 ` [KVM timekeeping fixes 1/4] Fix kvmclock bug Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox