public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/2] x86: kvm: global clock updates
@ 2014-02-26 18:15 Andrew Jones
  2014-02-26 18:15 ` [PATCH 1/2] x86: kvm: rate-limit " Andrew Jones
  2014-02-26 18:15 ` [PATCH 2/2] x86: kvm: introduce periodic " Andrew Jones
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Jones @ 2014-02-26 18:15 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: mtosatti, pbonzini

This patch series addresses two issues with global clock updates.
The first fixes a bug found on hosts that have a tsc marked as
unstable. As global clock updates get triggered on every vcpu load
in these cases, guests with a large number of vcpus have their
progress nearly halted. The fix for that bug should also go to
stable. The second patch in this series ensures that NTP corrections
on the host, as well as on guests with all vcpus pinned, are
propagated periodically. That patch improves things but doesn't fix
a bug, so it can be merged later than the first. I've posted the two
together as a series, since the second builds on the first.

Andrew Jones (2):
  x86: kvm: rate-limit global clock updates
  x86: kvm: introduce periodic global clock updates

 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/x86.c              | 92 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 90 insertions(+), 4 deletions(-)

-- 
1.8.1.4



* [PATCH 1/2] x86: kvm: rate-limit global clock updates
  2014-02-26 18:15 [PATCH 0/2] x86: kvm: global clock updates Andrew Jones
@ 2014-02-26 18:15 ` Andrew Jones
  2014-02-26 20:25   ` Marcelo Tosatti
  2014-02-26 18:15 ` [PATCH 2/2] x86: kvm: introduce periodic " Andrew Jones
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Jones @ 2014-02-26 18:15 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: mtosatti, pbonzini

When we update a vcpu's local clock it may pick up an NTP correction.
We can't wait an indeterminate amount of time for other vcpus to pick
up that correction, so commit 0061d53daf26f introduced a global clock
update. However, we can't request a global clock update on every vcpu
load either (which is what happens if the tsc is marked as unstable).
The solution is to rate-limit the global clock updates. Marcelo
calculated that we should delay the global clock updates no more
than 0.1s as follows:

Assume an NTP correction c is applied to one vcpu, but not the other,
then in n seconds the delta of the vcpu system_timestamps will be
c * n. If we assume a correction of 500ppm (worst-case), then the two
vcpus will diverge 100us in 0.1s, which is a considerable amount.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 33 +++++++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e714f8c08ccf2..9aa09d330a4b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -598,6 +598,7 @@ struct kvm_arch {
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	cycle_t master_cycle_now;
+	struct delayed_work kvmclock_update_work;
 
 	struct kvm_xen_hvm_config xen_hvm_config;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4cca45853dfeb..a2d30de597b7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1628,20 +1628,43 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
  * the others.
  *
  * So in those cases, request a kvmclock update for all vcpus.
- * The worst case for a remote vcpu to update its kvmclock
- * is then bounded by maximum nohz sleep latency.
+ * We need to rate-limit these requests though, as they can
+ * considerably slow guests that have a large number of vcpus.
+ * The time for a remote vcpu to update its kvmclock is bound
+ * by the delay we use to rate-limit the updates.
  */
 
-static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
+#define KVMCLOCK_UPDATE_DELAY msecs_to_jiffies(100)
+
+static void kvmclock_update_fn(struct work_struct *work)
 {
 	int i;
-	struct kvm *kvm = v->kvm;
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
+					   kvmclock_update_work);
+	struct kvm *kvm = container_of(ka, struct kvm, arch);
 	struct kvm_vcpu *vcpu;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
 		kvm_vcpu_kick(vcpu);
 	}
+	kvm_put_kvm(kvm);
+}
+
+static void kvm_schedule_kvmclock_update(struct kvm *kvm)
+{
+	kvm_get_kvm(kvm);
+	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
+					KVMCLOCK_UPDATE_DELAY);
+}
+
+static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
+{
+	struct kvm *kvm = v->kvm;
+
+	set_bit(KVM_REQ_CLOCK_UPDATE, &v->requests);
+	kvm_schedule_kvmclock_update(kvm);
 }
 
 static bool msr_mtrr_valid(unsigned msr)
@@ -7019,6 +7042,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	pvclock_update_vm_gtod_copy(kvm);
 
+	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
+
 	return 0;
 }
 
-- 
1.8.1.4
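The coalescing behaviour this patch relies on — schedule_delayed_work() is a no-op while the work item is already queued, so any number of triggers within the 100ms window produce a single global update — can be sketched outside the kernel. The names below are invented for illustration; this is a model of the rate-limit, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

#define UPDATE_DELAY_MS 100

struct clock_state {
	bool update_pending;	/* a delayed update is already queued */
	long fire_at_ms;	/* when the queued update will run */
	int updates_executed;	/* global updates that actually ran */
};

/* Called on every trigger (e.g. vcpu load with an unstable tsc). */
static void schedule_update(struct clock_state *s, long now_ms)
{
	if (s->update_pending)
		return;		/* coalesce: one queued update is enough */
	s->update_pending = true;
	s->fire_at_ms = now_ms + UPDATE_DELAY_MS;
}

/* Called as time advances; runs the update once the delay expires. */
static void tick(struct clock_state *s, long now_ms)
{
	if (s->update_pending && now_ms >= s->fire_at_ms) {
		s->update_pending = false;
		s->updates_executed++;	/* one update covers all vcpus */
	}
}
```

With this model, a burst of triggers on every millisecond of the window still results in exactly one executed update, which is the point of the patch.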



* [PATCH 2/2] x86: kvm: introduce periodic global clock updates
  2014-02-26 18:15 [PATCH 0/2] x86: kvm: global clock updates Andrew Jones
  2014-02-26 18:15 ` [PATCH 1/2] x86: kvm: rate-limit " Andrew Jones
@ 2014-02-26 18:15 ` Andrew Jones
  2014-02-26 20:28   ` Marcelo Tosatti
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Jones @ 2014-02-26 18:15 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: mtosatti, pbonzini

commit 0061d53daf26f introduced a mechanism to execute a global clock
update for a vm. We can apply this periodically in order to propagate
host NTP corrections. Also, if all vcpus of a vm are pinned, then
without an additional trigger, no guest NTP corrections can propagate
either, as the current trigger is only vcpu cpu migration.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 65 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9aa09d330a4b5..77c69aa4756f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -599,6 +599,7 @@ struct kvm_arch {
 	u64 master_kernel_ns;
 	cycle_t master_cycle_now;
 	struct delayed_work kvmclock_update_work;
+	bool clocks_synced;
 
 	struct kvm_xen_hvm_config xen_hvm_config;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2d30de597b7d..5cba20b446aac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1620,6 +1620,60 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	return 0;
 }
 
+static void kvm_schedule_kvmclock_update(struct kvm *kvm, bool now);
+static void clock_sync_fn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(clock_sync_work, clock_sync_fn);
+
+#define CLOCK_SYNC_PERIOD_SECS	300
+#define CLOCK_SYNC_BUMP_SECS	30
+#define CLOCK_SYNC_STEP_MSECS	100
+
+#define __steps(s) (((s) * MSEC_PER_SEC) / CLOCK_SYNC_STEP_MSECS)
+
+static void clock_sync_fn(struct work_struct *work)
+{
+	static unsigned reset_step = __steps(CLOCK_SYNC_PERIOD_SECS);
+	static unsigned step = 0;
+	struct kvm *kvm;
+	bool sync = false;
+
+	spin_lock(&kvm_lock);
+
+	if (step == 0)
+		list_for_each_entry(kvm, &vm_list, vm_list)
+			kvm->arch.clocks_synced = false;
+
+	list_for_each_entry(kvm, &vm_list, vm_list) {
+		if (!kvm->arch.clocks_synced) {
+			kvm_get_kvm(kvm);
+			sync = true;
+			break;
+		}
+	}
+
+	spin_unlock(&kvm_lock);
+
+	if (sync) {
+		kvm_schedule_kvmclock_update(kvm, true);
+		kvm_put_kvm(kvm);
+
+		if (++step == reset_step) {
+			reset_step += __steps(CLOCK_SYNC_BUMP_SECS);
+			pr_warn("kvmclock: reducing VM clock sync frequency "
+				"to every %ld seconds.\n", (reset_step
+					* CLOCK_SYNC_STEP_MSECS)/MSEC_PER_SEC);
+		}
+
+		schedule_delayed_work(&clock_sync_work,
+				msecs_to_jiffies(CLOCK_SYNC_STEP_MSECS));
+	} else {
+		unsigned s = reset_step - step;
+		step = 0;
+		schedule_delayed_work(&clock_sync_work,
+				msecs_to_jiffies(s * CLOCK_SYNC_STEP_MSECS));
+	}
+}
+
 /*
  * kvmclock updates which are isolated to a given vcpu, such as
  * vcpu->cpu migration, should not allow system_timestamp from
@@ -1652,11 +1706,12 @@ static void kvmclock_update_fn(struct work_struct *work)
 	kvm_put_kvm(kvm);
 }
 
-static void kvm_schedule_kvmclock_update(struct kvm *kvm)
+static void kvm_schedule_kvmclock_update(struct kvm *kvm, bool now)
 {
 	kvm_get_kvm(kvm);
+	kvm->arch.clocks_synced = true;
 	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
-					KVMCLOCK_UPDATE_DELAY);
+				now ? 0 : KVMCLOCK_UPDATE_DELAY);
 }
 
 static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
@@ -1664,7 +1719,7 @@ static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
 	struct kvm *kvm = v->kvm;
 
 	set_bit(KVM_REQ_CLOCK_UPDATE, &v->requests);
-	kvm_schedule_kvmclock_update(kvm);
+	kvm_schedule_kvmclock_update(kvm, false);
 }
 
 static bool msr_mtrr_valid(unsigned msr)
@@ -5584,6 +5639,8 @@ int kvm_arch_init(void *opaque)
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
 #endif
 
+	schedule_delayed_work(&clock_sync_work, CLOCK_SYNC_PERIOD_SECS * HZ);
+
 	return 0;
 
 out_free_percpu:
@@ -5594,6 +5651,8 @@ out:
 
 void kvm_arch_exit(void)
 {
+	cancel_delayed_work_sync(&clock_sync_work);
+
 	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
 
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
-- 
1.8.1.4
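To see how the reset_step bumping in clock_sync_fn() stretches the effective period when there are more VMs than 100ms steps in one pass, here is a hedged model (the function name is invented; the work item and locking are deliberately not reproduced):

```c
#include <assert.h>

#define CLOCK_SYNC_PERIOD_SECS	300
#define CLOCK_SYNC_BUMP_SECS	30
#define CLOCK_SYNC_STEP_MSECS	100

/* Steps in one nominal period: 300s / 100ms = 3000. */
#define STEPS(s) (((s) * 1000L) / CLOCK_SYNC_STEP_MSECS)

/*
 * Model of the effective sync period: one VM is updated per 100ms
 * step, and each time the step counter catches up with reset_step,
 * reset_step grows by 30s worth of steps (300), so a host with more
 * than 3000 VMs still visits every VM before the flags are cleared.
 */
static long effective_period_secs(long nvms)
{
	long reset_step = STEPS(CLOCK_SYNC_PERIOD_SECS);

	while (nvms >= reset_step)
		reset_step += STEPS(CLOCK_SYNC_BUMP_SECS);

	return reset_step * CLOCK_SYNC_STEP_MSECS / 1000;
}
```

Under this model a host with up to 2999 VMs keeps the nominal 300s period; 3000 VMs push it to 330s, 3500 VMs to 360s, matching the pr_warn message the patch emits when the frequency is reduced.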



* Re: [PATCH 1/2] x86: kvm: rate-limit global clock updates
  2014-02-26 18:15 ` [PATCH 1/2] x86: kvm: rate-limit " Andrew Jones
@ 2014-02-26 20:25   ` Marcelo Tosatti
  2014-02-27 13:11     ` Andrew Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Marcelo Tosatti @ 2014-02-26 20:25 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, linux-kernel, pbonzini

On Wed, Feb 26, 2014 at 07:15:11PM +0100, Andrew Jones wrote:
> When we update a vcpu's local clock it may pick up an NTP correction.
> We can't wait an indeterminate amount of time for other vcpus to pick
> up that correction, so commit 0061d53daf26f introduced a global clock
> update. However, we can't request a global clock update on every vcpu
> load either (which is what happens if the tsc is marked as unstable).
> The solution is to rate-limit the global clock updates. Marcelo
> calculated that we should delay the global clock updates no more
> than 0.1s as follows:
> 
> Assume an NTP correction c is applied to one vcpu, but not the other,
> then in n seconds the delta of the vcpu system_timestamps will be
> c * n. If we assume a correction of 500ppm (worst-case), then the two
> vcpus will diverge 100us in 0.1s, which is a considerable amount.

100us -> 50us.
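The corrected figure is just the product of the two numbers in the commit message; a trivial check (illustrative only, not part of the patch):

```c
#include <assert.h>

/*
 * delta = c * n: a correction of `ppm` parts per million applied for
 * `delay_ms` milliseconds diverges the two clocks by
 * ppm * delay_ms / 1000 microseconds. 500ppm over 0.1s is 50us,
 * not 100us.
 */
static long divergence_us(long ppm, long delay_ms)
{
	return ppm * delay_ms / 1000;
}
```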

> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 33 +++++++++++++++++++++++++++++----
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e714f8c08ccf2..9aa09d330a4b5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -598,6 +598,7 @@ struct kvm_arch {
>  	bool use_master_clock;
>  	u64 master_kernel_ns;
>  	cycle_t master_cycle_now;
> +	struct delayed_work kvmclock_update_work;
>  
>  	struct kvm_xen_hvm_config xen_hvm_config;
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4cca45853dfeb..a2d30de597b7d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1628,20 +1628,43 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>   * the others.
>   *
>   * So in those cases, request a kvmclock update for all vcpus.
> - * The worst case for a remote vcpu to update its kvmclock
> - * is then bounded by maximum nohz sleep latency.
> + * We need to rate-limit these requests though, as they can
> + * considerably slow guests that have a large number of vcpus.
> + * The time for a remote vcpu to update its kvmclock is bound
> + * by the delay we use to rate-limit the updates.
>   */
>  
> -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> +#define KVMCLOCK_UPDATE_DELAY msecs_to_jiffies(100)
> +
> +static void kvmclock_update_fn(struct work_struct *work)
>  {
>  	int i;
> -	struct kvm *kvm = v->kvm;
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> +					   kvmclock_update_work);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
>  	struct kvm_vcpu *vcpu;
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
>  		kvm_vcpu_kick(vcpu);
>  	}
> +	kvm_put_kvm(kvm);
> +}

Can cancel_work_sync on vm shutdown instead of get/put kvm ?

(somewhat annoying for the vm to not go down immediately).

> +static void kvm_schedule_kvmclock_update(struct kvm *kvm)
> +{
> +	kvm_get_kvm(kvm);
> +	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> +					KVMCLOCK_UPDATE_DELAY);
> +}
> +
> +static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> +{
> +	struct kvm *kvm = v->kvm;
> +
> +	set_bit(KVM_REQ_CLOCK_UPDATE, &v->requests);
> +	kvm_schedule_kvmclock_update(kvm);
>  }
>  
>  static bool msr_mtrr_valid(unsigned msr)
> @@ -7019,6 +7042,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>  	pvclock_update_vm_gtod_copy(kvm);
>  
> +	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
> +
>  	return 0;
>  }
>  
> -- 
> 1.8.1.4


* Re: [PATCH 2/2] x86: kvm: introduce periodic global clock updates
  2014-02-26 18:15 ` [PATCH 2/2] x86: kvm: introduce periodic " Andrew Jones
@ 2014-02-26 20:28   ` Marcelo Tosatti
  2014-02-27 14:12     ` Andrew Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Marcelo Tosatti @ 2014-02-26 20:28 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, linux-kernel, pbonzini

On Wed, Feb 26, 2014 at 07:15:12PM +0100, Andrew Jones wrote:
> commit 0061d53daf26f introduced a mechanism to execute a global clock
> update for a vm. We can apply this periodically in order to propagate
> host NTP corrections. Also, if all vcpus of a vm are pinned, then
> without an additional trigger, no guest NTP corrections can propagate
> either, as the current trigger is only vcpu cpu migration.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 65 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9aa09d330a4b5..77c69aa4756f9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -599,6 +599,7 @@ struct kvm_arch {
>  	u64 master_kernel_ns;
>  	cycle_t master_cycle_now;
>  	struct delayed_work kvmclock_update_work;
> +	bool clocks_synced;
>  
>  	struct kvm_xen_hvm_config xen_hvm_config;
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a2d30de597b7d..5cba20b446aac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1620,6 +1620,60 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  	return 0;
>  }
>  
> +static void kvm_schedule_kvmclock_update(struct kvm *kvm, bool now);
> +static void clock_sync_fn(struct work_struct *work);
> +static DECLARE_DELAYED_WORK(clock_sync_work, clock_sync_fn);
> +
> +#define CLOCK_SYNC_PERIOD_SECS	300
> +#define CLOCK_SYNC_BUMP_SECS	30
> +#define CLOCK_SYNC_STEP_MSECS	100
> +
> +#define __steps(s) (((s) * MSEC_PER_SEC) / CLOCK_SYNC_STEP_MSECS)
> +
> +static void clock_sync_fn(struct work_struct *work)
> +{
> +	static unsigned reset_step = __steps(CLOCK_SYNC_PERIOD_SECS);
> +	static unsigned step = 0;
> +	struct kvm *kvm;
> +	bool sync = false;
> +
> +	spin_lock(&kvm_lock);

Better have it per vm to avoid kvm_lock?

> +	if (step == 0)
> +		list_for_each_entry(kvm, &vm_list, vm_list)
> +			kvm->arch.clocks_synced = false;
> +
> +	list_for_each_entry(kvm, &vm_list, vm_list) {
> +		if (!kvm->arch.clocks_synced) {
> +			kvm_get_kvm(kvm);
> +			sync = true;
> +			break;
> +		}
> +	}
> +
> +	spin_unlock(&kvm_lock);
> +
> +	if (sync) {
> +		kvm_schedule_kvmclock_update(kvm, true);
> +		kvm_put_kvm(kvm);
> +
> +		if (++step == reset_step) {
> +			reset_step += __steps(CLOCK_SYNC_BUMP_SECS);
> +			pr_warn("kvmclock: reducing VM clock sync frequency "
> +				"to every %ld seconds.\n", (reset_step
> +					* CLOCK_SYNC_STEP_MSECS)/MSEC_PER_SEC);
> +		}

Can you explain the variable sync frequency? To be based on NTP
frequency correction?



* Re: [PATCH 1/2] x86: kvm: rate-limit global clock updates
  2014-02-26 20:25   ` Marcelo Tosatti
@ 2014-02-27 13:11     ` Andrew Jones
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Jones @ 2014-02-27 13:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, linux-kernel, pbonzini

On Wed, Feb 26, 2014 at 05:25:26PM -0300, Marcelo Tosatti wrote:
> On Wed, Feb 26, 2014 at 07:15:11PM +0100, Andrew Jones wrote:
> > When we update a vcpu's local clock it may pick up an NTP correction.
> > We can't wait an indeterminate amount of time for other vcpus to pick
> > up that correction, so commit 0061d53daf26f introduced a global clock
> > update. However, we can't request a global clock update on every vcpu
> > load either (which is what happens if the tsc is marked as unstable).
> > The solution is to rate-limit the global clock updates. Marcelo
> > calculated that we should delay the global clock updates no more
> > than 0.1s as follows:
> > 
> > Assume an NTP correction c is applied to one vcpu, but not the other,
> > then in n seconds the delta of the vcpu system_timestamps will be
> > c * n. If we assume a correction of 500ppm (worst-case), then the two
> > vcpus will diverge 100us in 0.1s, which is a considerable amount.
> 
> 100us -> 50us.

doh, will fix.

> 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/x86.c              | 33 +++++++++++++++++++++++++++++----
> >  2 files changed, 30 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index e714f8c08ccf2..9aa09d330a4b5 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -598,6 +598,7 @@ struct kvm_arch {
> >  	bool use_master_clock;
> >  	u64 master_kernel_ns;
> >  	cycle_t master_cycle_now;
> > +	struct delayed_work kvmclock_update_work;
> >  
> >  	struct kvm_xen_hvm_config xen_hvm_config;
> >  
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 4cca45853dfeb..a2d30de597b7d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1628,20 +1628,43 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >   * the others.
> >   *
> >   * So in those cases, request a kvmclock update for all vcpus.
> > - * The worst case for a remote vcpu to update its kvmclock
> > - * is then bounded by maximum nohz sleep latency.
> > + * We need to rate-limit these requests though, as they can
> > + * considerably slow guests that have a large number of vcpus.
> > + * The time for a remote vcpu to update its kvmclock is bound
> > + * by the delay we use to rate-limit the updates.
> >   */
> >  
> > -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > +#define KVMCLOCK_UPDATE_DELAY msecs_to_jiffies(100)
> > +
> > +static void kvmclock_update_fn(struct work_struct *work)
> >  {
> >  	int i;
> > -	struct kvm *kvm = v->kvm;
> > +	struct delayed_work *dwork = to_delayed_work(work);
> > +	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> > +					   kvmclock_update_work);
> > +	struct kvm *kvm = container_of(ka, struct kvm, arch);
> >  	struct kvm_vcpu *vcpu;
> >  
> >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> >  		set_bit(KVM_REQ_CLOCK_UPDATE, &vcpu->requests);
> >  		kvm_vcpu_kick(vcpu);
> >  	}
> > +	kvm_put_kvm(kvm);
> > +}
> 
> Can cancel_work_sync on vm shutdown instead of get/put kvm ?
> 
> (somewhat annoying for the vm to not go down immediately).

OK

> 
> > +static void kvm_schedule_kvmclock_update(struct kvm *kvm)
> > +{
> > +	kvm_get_kvm(kvm);
> > +	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> > +					KVMCLOCK_UPDATE_DELAY);
> > +}
> > +
> > +static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > +{
> > +	struct kvm *kvm = v->kvm;
> > +
> > +	set_bit(KVM_REQ_CLOCK_UPDATE, &v->requests);
> > +	kvm_schedule_kvmclock_update(kvm);
> >  }
> >  
> >  static bool msr_mtrr_valid(unsigned msr)
> > @@ -7019,6 +7042,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >  
> >  	pvclock_update_vm_gtod_copy(kvm);
> >  
> > +	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
> > +
> >  	return 0;
> >  }
> >  
> > -- 
> > 1.8.1.4


* Re: [PATCH 2/2] x86: kvm: introduce periodic global clock updates
  2014-02-26 20:28   ` Marcelo Tosatti
@ 2014-02-27 14:12     ` Andrew Jones
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Jones @ 2014-02-27 14:12 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, linux-kernel, pbonzini

On Wed, Feb 26, 2014 at 05:28:57PM -0300, Marcelo Tosatti wrote:
> On Wed, Feb 26, 2014 at 07:15:12PM +0100, Andrew Jones wrote:
> > commit 0061d53daf26f introduced a mechanism to execute a global clock
> > update for a vm. We can apply this periodically in order to propagate
> > host NTP corrections. Also, if all vcpus of a vm are pinned, then
> > without an additional trigger, no guest NTP corrections can propagate
> > either, as the current trigger is only vcpu cpu migration.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/x86.c              | 65 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 63 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 9aa09d330a4b5..77c69aa4756f9 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -599,6 +599,7 @@ struct kvm_arch {
> >  	u64 master_kernel_ns;
> >  	cycle_t master_cycle_now;
> >  	struct delayed_work kvmclock_update_work;
> > +	bool clocks_synced;
> >  
> >  	struct kvm_xen_hvm_config xen_hvm_config;
> >  
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a2d30de597b7d..5cba20b446aac 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1620,6 +1620,60 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >  	return 0;
> >  }
> >  
> > +static void kvm_schedule_kvmclock_update(struct kvm *kvm, bool now);
> > +static void clock_sync_fn(struct work_struct *work);
> > +static DECLARE_DELAYED_WORK(clock_sync_work, clock_sync_fn);
> > +
> > +#define CLOCK_SYNC_PERIOD_SECS	300
> > +#define CLOCK_SYNC_BUMP_SECS	30
> > +#define CLOCK_SYNC_STEP_MSECS	100
> > +
> > +#define __steps(s) (((s) * MSEC_PER_SEC) / CLOCK_SYNC_STEP_MSECS)
> > +
> > +static void clock_sync_fn(struct work_struct *work)
> > +{
> > +	static unsigned reset_step = __steps(CLOCK_SYNC_PERIOD_SECS);
> > +	static unsigned step = 0;
> > +	struct kvm *kvm;
> > +	bool sync = false;
> > +
> > +	spin_lock(&kvm_lock);
> 
> Better have it per vm to avoid kvm_lock?

We could, but the way it is now guarantees we never update more than
one vm at a time. Updates can generate a lot of scheduling and IPIs
when the vms have lots of vcpus. That said, I guess it's pretty
unlikely that more than one vm's 5 minute period would ever get scheduled
at the same time. I can rework this to avoid the lock.

> 
> > +	if (step == 0)
> > +		list_for_each_entry(kvm, &vm_list, vm_list)
> > +			kvm->arch.clocks_synced = false;
> > +
> > +	list_for_each_entry(kvm, &vm_list, vm_list) {
> > +		if (!kvm->arch.clocks_synced) {
> > +			kvm_get_kvm(kvm);
> > +			sync = true;
> > +			break;
> > +		}
> > +	}
> > +
> > +	spin_unlock(&kvm_lock);
> > +
> > +	if (sync) {
> > +		kvm_schedule_kvmclock_update(kvm, true);
> > +		kvm_put_kvm(kvm);
> > +
> > +		if (++step == reset_step) {
> > +			reset_step += __steps(CLOCK_SYNC_BUMP_SECS);
> > +			pr_warn("kvmclock: reducing VM clock sync frequency "
> > +				"to every %ld seconds.\n", (reset_step
> > +					* CLOCK_SYNC_STEP_MSECS)/MSEC_PER_SEC);
> > +		}
> 
> Can you explain the variable sync frequency? To be based on NTP
> frequency correction?
> 

NTP correction polling time is variable, but commonly ends up being close
to the max polling time (1024s). We choose 300s for this periodic update
because even if the host is polling at a higher rate it's ok to leave the
guest clocks uncorrected a bit longer, and for the common case it's a few
times more frequent. The variability (bumping) I have in this function is
actually not related to ntp correction though. Here I'm just ensuring that
if we ever had more than 3000 vms running at the same time, we could still
get to each of their updates before clearing all clocks_synced flags.
I chose 30 seconds (300 vms) as the amount to bump by arbitrarily, but
it seems reasonable to me. Reworking this patch so each vm has its own
periodic work removes the need for this bumping too.

drew

