public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
@ 2016-10-11 12:17 Wanpeng Li
  2016-10-11 12:17 ` [PATCH RFC 1/2] KVM: lapic: Extract start_sw_period() to handle " Wanpeng Li
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-10-11 12:17 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Yunhong Jiang,
	Wanpeng Li

Most windows guests which I have on hand currently still utilize APIC Timer 
periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
- windows 2008 server r2
- windows 2012 server r2
- windows 7
- windows 10 

This patchset adds the support using the VMX preemption timer for APIC Timer 
periodic/oneshot mode. 

I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and observed
that w/ patch the latency is ~2% of w/o patch. I think maybe something is still 
not right in the patchset, in addition, tmcct in apic_get_tmcct() maybe is not 
calculated correctly. Your comments to improve the patchset is a great appreciated.

Wanpeng Li (2):
  KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
  KVM: x86: Support using the vmx preemption timer for APIC Timer periodic/one mode

 arch/x86/kvm/lapic.c | 162 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 95 insertions(+), 67 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH RFC 1/2] KVM: lapic: Extract start_sw_period() to handle periodic/oneshot mode
  2016-10-11 12:17 [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode Wanpeng Li
@ 2016-10-11 12:17 ` Wanpeng Li
  2016-10-11 12:17 ` [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer " Wanpeng Li
  2017-07-11  0:13 ` [PATCH RFC 0/2] " Andy Lutomirski
  2 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-10-11 12:17 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Yunhong Jiang,
	Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Extract start_sw_period() to handle periodic/oneshot mode, it will be 
used by later patch.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/lapic.c | 89 +++++++++++++++++++++++++++-------------------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 23b99f3..dad743e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1347,6 +1347,50 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 	local_irq_restore(flags);
 }
 
+static void start_sw_period(struct kvm_lapic *apic)
+{
+	ktime_t now;
+
+	/* lapic timer in oneshot or periodic mode */
+	now = apic->lapic_timer.timer.base->get_time();
+	apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+		    * APIC_BUS_CYCLE_NS * apic->divide_count;
+
+	if (!apic->lapic_timer.period)
+		return;
+	/*
+	 * Do not allow the guest to program periodic timers with small
+	 * interval, since the hrtimers are not throttled by the host
+	 * scheduler.
+	 */
+	if (apic_lvtt_period(apic)) {
+		s64 min_period = min_timer_period_us * 1000LL;
+
+		if (apic->lapic_timer.period < min_period) {
+			pr_info_ratelimited(
+			    "kvm: vcpu %i: requested %lld ns "
+			    "lapic timer period limited to %lld ns\n",
+			    apic->vcpu->vcpu_id,
+			    apic->lapic_timer.period, min_period);
+			apic->lapic_timer.period = min_period;
+		}
+	}
+
+	hrtimer_start(&apic->lapic_timer.timer,
+		      ktime_add_ns(now, apic->lapic_timer.period),
+		      HRTIMER_MODE_ABS_PINNED);
+
+	apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
+		   PRIx64 ", "
+		   "timer initial count 0x%x, period %lldns, "
+		   "expire @ 0x%016" PRIx64 ".\n", __func__,
+		   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
+		   kvm_lapic_get_reg(apic, APIC_TMICT),
+		   apic->lapic_timer.period,
+		   ktime_to_ns(ktime_add_ns(now,
+				apic->lapic_timer.period)));
+}
+
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
 {
 	if (!lapic_in_kernel(vcpu))
@@ -1424,50 +1468,11 @@ EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
-	ktime_t now;
-
 	atomic_set(&apic->lapic_timer.pending, 0);
 
-	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
-		/* lapic timer in oneshot or periodic mode */
-		now = apic->lapic_timer.timer.base->get_time();
-		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
-			    * APIC_BUS_CYCLE_NS * apic->divide_count;
-
-		if (!apic->lapic_timer.period)
-			return;
-		/*
-		 * Do not allow the guest to program periodic timers with small
-		 * interval, since the hrtimers are not throttled by the host
-		 * scheduler.
-		 */
-		if (apic_lvtt_period(apic)) {
-			s64 min_period = min_timer_period_us * 1000LL;
-
-			if (apic->lapic_timer.period < min_period) {
-				pr_info_ratelimited(
-				    "kvm: vcpu %i: requested %lld ns "
-				    "lapic timer period limited to %lld ns\n",
-				    apic->vcpu->vcpu_id,
-				    apic->lapic_timer.period, min_period);
-				apic->lapic_timer.period = min_period;
-			}
-		}
-
-		hrtimer_start(&apic->lapic_timer.timer,
-			      ktime_add_ns(now, apic->lapic_timer.period),
-			      HRTIMER_MODE_ABS_PINNED);
-
-		apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
-			   PRIx64 ", "
-			   "timer initial count 0x%x, period %lldns, "
-			   "expire @ 0x%016" PRIx64 ".\n", __func__,
-			   APIC_BUS_CYCLE_NS, ktime_to_ns(now),
-			   kvm_lapic_get_reg(apic, APIC_TMICT),
-			   apic->lapic_timer.period,
-			   ktime_to_ns(ktime_add_ns(now,
-					apic->lapic_timer.period)));
-	} else if (apic_lvtt_tscdeadline(apic)) {
+	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+		start_sw_period(apic);
+	else if (apic_lvtt_tscdeadline(apic)) {
 		if (!(kvm_x86_ops->set_hv_timer && start_hv_tscdeadline(apic)))
 			start_sw_tscdeadline(apic);
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2016-10-11 12:17 [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode Wanpeng Li
  2016-10-11 12:17 ` [PATCH RFC 1/2] KVM: lapic: Extract start_sw_period() to handle " Wanpeng Li
@ 2016-10-11 12:17 ` Wanpeng Li
  2016-10-11 15:06   ` Radim Krčmář
  2017-07-11  0:13 ` [PATCH RFC 0/2] " Andy Lutomirski
  2 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2016-10-11 12:17 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Yunhong Jiang,
	Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Most windows guests still utilize APIC Timer periodic/oneshot mode 
instead of tsc-deadline mode, and the APIC Timer periodic/oneshot 
mode are still emulated by high overhead hrtimer on host. This patch 
converts the expected expire time of the periodic/oneshot mode to
guest deadline tsc in order to leverage VMX preemption timer logic 
for APIC Timer tsc-deadline mode.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/lapic.c | 83 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dad743e..f9df9e3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1101,13 +1101,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 		apic->lapic_timer.period == 0)
 		return 0;
 
-	remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
-	if (ktime_to_ns(remaining) < 0)
-		remaining = ktime_set(0, 0);
+	if (kvm_lapic_hv_timer_in_use(apic->vcpu)) {
+		u64 tscl = rdtsc();
 
-	ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
-	tmcct = div64_u64(ns,
-			 (APIC_BUS_CYCLE_NS * apic->divide_count));
+		tmcct = apic->lapic_timer.tscdeadline -
+			kvm_read_l1_tsc(apic->vcpu, tscl);
+	} else {
+		remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
+		if (ktime_to_ns(remaining) < 0)
+			remaining = ktime_set(0, 0);
+
+		ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+		tmcct = div64_u64(ns,
+				 (APIC_BUS_CYCLE_NS * apic->divide_count));
+	}
 
 	return tmcct;
 }
@@ -1400,52 +1407,65 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
 
-static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
+static void cancel_hv_timer(struct kvm_lapic *apic)
 {
 	kvm_x86_ops->cancel_hv_timer(apic->vcpu);
 	apic->lapic_timer.hv_timer_in_use = false;
 }
 
-void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
+static bool start_hv_timer(struct kvm_lapic *apic)
 {
-	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 tscdeadline;
 
-	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
-	WARN_ON(swait_active(&vcpu->wq));
-	cancel_hv_tscdeadline(apic);
-	apic_timer_expired(apic);
-}
-EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
+	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
+		u64 tscl = rdtsc();
 
-static bool start_hv_tscdeadline(struct kvm_lapic *apic)
-{
-	u64 tscdeadline = apic->lapic_timer.tscdeadline;
+		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+			* APIC_BUS_CYCLE_NS * apic->divide_count;
+		apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
+			nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
+	}
+
+	tscdeadline = apic->lapic_timer.tscdeadline;
 
 	if (atomic_read(&apic->lapic_timer.pending) ||
 		kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
 		if (apic->lapic_timer.hv_timer_in_use)
-			cancel_hv_tscdeadline(apic);
+			cancel_hv_timer(apic);
 	} else {
 		apic->lapic_timer.hv_timer_in_use = true;
 		hrtimer_cancel(&apic->lapic_timer.timer);
 
 		/* In case the sw timer triggered in the window */
 		if (atomic_read(&apic->lapic_timer.pending))
-			cancel_hv_tscdeadline(apic);
+			cancel_hv_timer(apic);
 	}
 	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
 			apic->lapic_timer.hv_timer_in_use);
 	return apic->lapic_timer.hv_timer_in_use;
 }
 
+void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+
+	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
+	WARN_ON(swait_active(&vcpu->wq));
+	cancel_hv_timer(apic);
+	apic_timer_expired(apic);
+
+	if (apic_lvtt_period(apic))
+		start_hv_timer(apic);
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
+
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	WARN_ON(apic->lapic_timer.hv_timer_in_use);
 
-	if (apic_lvtt_tscdeadline(apic))
-		start_hv_tscdeadline(apic);
+	start_hv_timer(apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
 
@@ -1457,12 +1477,15 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
 	if (!apic->lapic_timer.hv_timer_in_use)
 		return;
 
-	cancel_hv_tscdeadline(apic);
+	cancel_hv_timer(apic);
 
 	if (atomic_read(&apic->lapic_timer.pending))
 		return;
 
-	start_sw_tscdeadline(apic);
+	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
+		start_sw_period(apic);
+	else if (apic_lvtt_tscdeadline(apic))
+		start_sw_tscdeadline(apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);
 
@@ -1470,10 +1493,11 @@ static void start_apic_timer(struct kvm_lapic *apic)
 {
 	atomic_set(&apic->lapic_timer.pending, 0);
 
-	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-		start_sw_period(apic);
-	else if (apic_lvtt_tscdeadline(apic)) {
-		if (!(kvm_x86_ops->set_hv_timer && start_hv_tscdeadline(apic)))
+	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
+		if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
+			start_sw_period(apic);
+	} else if (apic_lvtt_tscdeadline(apic)) {
+		if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
 			start_sw_tscdeadline(apic);
 	}
 }
@@ -1711,8 +1735,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	if (!lapic_in_kernel(vcpu) || apic_lvtt_oneshot(apic) ||
-			apic_lvtt_period(apic))
+	if (!lapic_in_kernel(vcpu))
 		return 0;
 
 	return apic->lapic_timer.tscdeadline;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2016-10-11 12:17 ` [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer " Wanpeng Li
@ 2016-10-11 15:06   ` Radim Krčmář
  2016-10-12  8:35     ` Wanpeng Li
  0 siblings, 1 reply; 9+ messages in thread
From: Radim Krčmář @ 2016-10-11 15:06 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Yunhong Jiang, Wanpeng Li

2016-10-11 20:17+0800, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Most windows guests still utilize APIC Timer periodic/oneshot mode 
> instead of tsc-deadline mode, and the APIC Timer periodic/oneshot 
> mode are still emulated by high overhead hrtimer on host. This patch 
> converts the expected expire time of the periodic/oneshot mode to
> guest deadline tsc in order to leverage VMX preemption timer logic 
> for APIC Timer tsc-deadline mode.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -1101,13 +1101,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
>  		apic->lapic_timer.period == 0)
>  		return 0;
>  
> +	if (kvm_lapic_hv_timer_in_use(apic->vcpu)) {
> +		u64 tscl = rdtsc();
> +
> +		tmcct = apic->lapic_timer.tscdeadline -
> +			kvm_read_l1_tsc(apic->vcpu, tscl);

Yes, this won't work.  The easiest way to return a less bogus TMCCT
would be remember the timeout when setting up the timer and replace
hrtimer_get_remaining() with it -- our deliver method shouldn't change
the expiration time.

> +	} else {
> +		remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
> +		if (ktime_to_ns(remaining) < 0)
> +			remaining = ktime_set(0, 0);
> +
> +		ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> +		tmcct = div64_u64(ns,
> +				 (APIC_BUS_CYCLE_NS * apic->divide_count));
> +	}
>  
>  	return tmcct;
>  }
> @@ -1400,52 +1407,65 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
>  
> -static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
> +static void cancel_hv_timer(struct kvm_lapic *apic)
>  {
>  	kvm_x86_ops->cancel_hv_timer(apic->vcpu);
>  	apic->lapic_timer.hv_timer_in_use = false;
>  }
>  
> +static bool start_hv_timer(struct kvm_lapic *apic)
>  {
> +	u64 tscdeadline;
>  
> +	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
> +		u64 tscl = rdtsc();
>  
> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> +			* APIC_BUS_CYCLE_NS * apic->divide_count;
> +		apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
> +			nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);

start_hv_timer() is called from kvm_lapic_switch_to_hv_timer(), which
can happen mid-period.  The worst case is that the timer will never
fire, because we always convert back and forth.  You have to compute the
equivalent deadline only once, and carry it around.

> +	}
> +
> +	tscdeadline = apic->lapic_timer.tscdeadline;
>  
>  	if (atomic_read(&apic->lapic_timer.pending) ||
>  		kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
>  		if (apic->lapic_timer.hv_timer_in_use)
> +			cancel_hv_timer(apic);
>  	} else {
>  		apic->lapic_timer.hv_timer_in_use = true;
>  		hrtimer_cancel(&apic->lapic_timer.timer);
>  
>  		/* In case the sw timer triggered in the window */
>  		if (atomic_read(&apic->lapic_timer.pending))
> +			cancel_hv_timer(apic);
>  	}
>  	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
>  			apic->lapic_timer.hv_timer_in_use);
>  	return apic->lapic_timer.hv_timer_in_use;
>  }
>  
> +void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +
> +	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
> +	WARN_ON(swait_active(&vcpu->wq));
> +	cancel_hv_timer(apic);
> +	apic_timer_expired(apic);
> +
> +	if (apic_lvtt_period(apic))
> +		start_hv_timer(apic);
> +}
> +EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
> +
>  void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
>  	WARN_ON(apic->lapic_timer.hv_timer_in_use);
>  
> -	if (apic_lvtt_tscdeadline(apic))
> -		start_hv_tscdeadline(apic);
> +	start_hv_timer(apic);
>  }
>  EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
>  

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2016-10-11 15:06   ` Radim Krčmář
@ 2016-10-12  8:35     ` Wanpeng Li
  0 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-10-12  8:35 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel@vger.kernel.org, kvm, Paolo Bonzini, Yunhong Jiang,
	Wanpeng Li

2016-10-11 23:06 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2016-10-11 20:17+0800, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Most windows guests still utilize APIC Timer periodic/oneshot mode
>> instead of tsc-deadline mode, and the APIC Timer periodic/oneshot
>> mode are still emulated by high overhead hrtimer on host. This patch
>> converts the expected expire time of the periodic/oneshot mode to
>> guest deadline tsc in order to leverage VMX preemption timer logic
>> for APIC Timer tsc-deadline mode.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> @@ -1101,13 +1101,20 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
>>               apic->lapic_timer.period == 0)
>>               return 0;
>>
>> +     if (kvm_lapic_hv_timer_in_use(apic->vcpu)) {
>> +             u64 tscl = rdtsc();
>> +
>> +             tmcct = apic->lapic_timer.tscdeadline -
>> +                     kvm_read_l1_tsc(apic->vcpu, tscl);
>
> Yes, this won't work.  The easiest way to return a less bogus TMCCT
> would be remember the timeout when setting up the timer and replace
> hrtimer_get_remaining() with it -- our deliver method shouldn't change
> the expiration time.

Agreed.

>
>> +     } else {
>> +             remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
>> +             if (ktime_to_ns(remaining) < 0)
>> +                     remaining = ktime_set(0, 0);
>> +
>> +             ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
>> +             tmcct = div64_u64(ns,
>> +                              (APIC_BUS_CYCLE_NS * apic->divide_count));
>> +     }
>>
>>       return tmcct;
>>  }
>> @@ -1400,52 +1407,65 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
>>
>> -static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
>> +static void cancel_hv_timer(struct kvm_lapic *apic)
>>  {
>>       kvm_x86_ops->cancel_hv_timer(apic->vcpu);
>>       apic->lapic_timer.hv_timer_in_use = false;
>>  }
>>
>> +static bool start_hv_timer(struct kvm_lapic *apic)
>>  {
>> +     u64 tscdeadline;
>>
>> +     if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
>> +             u64 tscl = rdtsc();
>>
>> +             apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>> +                     * APIC_BUS_CYCLE_NS * apic->divide_count;
>> +             apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
>> +                     nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
>
> start_hv_timer() is called from kvm_lapic_switch_to_hv_timer(), which
> can happen mid-period.  The worst case is that the timer will never
> fire, because we always convert back and forth.  You have to compute the
> equivalent deadline only once, and carry it around.

Agreed. Thanks for your review. :) Please see RFC V2.

Regards,
Wanpeng Li

>
>> +     }
>> +
>> +     tscdeadline = apic->lapic_timer.tscdeadline;
>>
>>       if (atomic_read(&apic->lapic_timer.pending) ||
>>               kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
>>               if (apic->lapic_timer.hv_timer_in_use)
>> +                     cancel_hv_timer(apic);
>>       } else {
>>               apic->lapic_timer.hv_timer_in_use = true;
>>               hrtimer_cancel(&apic->lapic_timer.timer);
>>
>>               /* In case the sw timer triggered in the window */
>>               if (atomic_read(&apic->lapic_timer.pending))
>> +                     cancel_hv_timer(apic);
>>       }
>>       trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
>>                       apic->lapic_timer.hv_timer_in_use);
>>       return apic->lapic_timer.hv_timer_in_use;
>>  }
>>
>> +void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
>> +{
>> +     struct kvm_lapic *apic = vcpu->arch.apic;
>> +
>> +     WARN_ON(!apic->lapic_timer.hv_timer_in_use);
>> +     WARN_ON(swait_active(&vcpu->wq));
>> +     cancel_hv_timer(apic);
>> +     apic_timer_expired(apic);
>> +
>> +     if (apic_lvtt_period(apic))
>> +             start_hv_timer(apic);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
>> +
>>  void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
>>  {
>>       struct kvm_lapic *apic = vcpu->arch.apic;
>>
>>       WARN_ON(apic->lapic_timer.hv_timer_in_use);
>>
>> -     if (apic_lvtt_tscdeadline(apic))
>> -             start_hv_tscdeadline(apic);
>> +     start_hv_timer(apic);
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2016-10-11 12:17 [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode Wanpeng Li
  2016-10-11 12:17 ` [PATCH RFC 1/2] KVM: lapic: Extract start_sw_period() to handle " Wanpeng Li
  2016-10-11 12:17 ` [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer " Wanpeng Li
@ 2017-07-11  0:13 ` Andy Lutomirski
  2017-07-11  2:43   ` Wanpeng Li
  2017-07-11  7:43   ` Paolo Bonzini
  2 siblings, 2 replies; 9+ messages in thread
From: Andy Lutomirski @ 2017-07-11  0:13 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm, Thomas Gleixner, X86 ML
  Cc: Paolo Bonzini, Radim Krčmář, Yunhong Jiang,
	Wanpeng Li

On 10/11/2016 05:17 AM, Wanpeng Li wrote:
> Most windows guests which I have on hand currently still utilize APIC Timer
> periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
> - windows 2008 server r2
> - windows 2012 server r2
> - windows 7
> - windows 10
> 
> This patchset adds the support using the VMX preemption timer for APIC Timer
> periodic/oneshot mode.
> 
> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and observed
> that w/ patch the latency is ~2% of w/o patch. I think maybe something is still
> not right in the patchset, in addition, tmcct in apic_get_tmcct() maybe is not
> calculated correctly. Your comments to improve the patchset is a great appreciated.
> 
> Wanpeng Li (2):
>    KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
>    KVM: x86: Support using the vmx preemption timer for APIC Timer periodic/one mode
> 
>   arch/x86/kvm/lapic.c | 162 ++++++++++++++++++++++++++++++---------------------
>   1 file changed, 95 insertions(+), 67 deletions(-)
> 

I think this is a step in the right direction, but I think there's a 
different approach that would be much, much faster: use the VMX 
preemption timer for *host* preemption.  Specifically, do this:

1. Refactor the host TSC deadline timer a bit to allow the TSC deadline 
timer to be "borrow".  It might look something like this:

u64 borrow_tsc_deadline(void (*timer_callback)());

The caller is now permitted to use the TSC deadline timer for its own 
nefarious purposes.  The caller promises to call return_tsc_deadline() 
in a timely manner if the TSC exceeds the return value while the 
deadline timer is borrowed.

If the TSC deadline fires while it's borrowed, timer_callback() will be 
called.

void return_tsc_deadline(bool timer_fired);

The caller is done borrowing the TSC deadline timer.  The caller need 
not reset the TSC deadline timer MSR to its previous value before 
calling this.  It must be called with IRQs on and preemption off.

Getting this to work cleanly without races may be a bit tricky.  So be it.

2. Teach KVM to use the VMX preemption timer as a substitute deadline 
timer while in guest mode.  Specifically, KVM will borrow_tsc_deadline() 
(if TSC deadline is enabled) when entering guest mode and 
return_tsc_deadline() when returning out of guest mode.

3. Now KVM can change its MSR bitmaps to allow the guest to program the 
TSC deadline MSR directly.  No exit at all needed to handle guest writes 
to the deadline timer.


Tglx, etc, would this be even remotely acceptable?  I'm not personally 
touching this code with a ten-foot pole, but there seem to be plenty of 
people who really care about performance here.

--Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2017-07-11  0:13 ` [PATCH RFC 0/2] " Andy Lutomirski
@ 2017-07-11  2:43   ` Wanpeng Li
  2017-07-11 14:40     ` Andy Lutomirski
  2017-07-11  7:43   ` Paolo Bonzini
  1 sibling, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2017-07-11  2:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel@vger.kernel.org, kvm, Thomas Gleixner, X86 ML,
	Paolo Bonzini, Radim Krčmář, Yunhong Jiang,
	Wanpeng Li

2017-07-11 8:13 GMT+08:00 Andy Lutomirski <luto@kernel.org>:
> On 10/11/2016 05:17 AM, Wanpeng Li wrote:
>>
>> Most windows guests which I have on hand currently still utilize APIC
>> Timer
>> periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
>> - windows 2008 server r2
>> - windows 2012 server r2
>> - windows 7
>> - windows 10
>>
>> This patchset adds the support using the VMX preemption timer for APIC
>> Timer
>> periodic/oneshot mode.
>>
>> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and
>> observed
>> that w/ patch the latency is ~2% of w/o patch. I think maybe something is
>> still
>> not right in the patchset, in addition, tmcct in apic_get_tmcct() maybe is
>> not
>> calculated correctly. Your comments to improve the patchset is a great
>> appreciated.
>>
>> Wanpeng Li (2):
>>    KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
>>    KVM: x86: Support using the vmx preemption timer for APIC Timer
>> periodic/one mode
>>
>>   arch/x86/kvm/lapic.c | 162
>> ++++++++++++++++++++++++++++++---------------------
>>   1 file changed, 95 insertions(+), 67 deletions(-)
>>
>
> I think this is a step in the right direction, but I think there's a
> different approach that would be much, much faster: use the VMX preemption
> timer for *host* preemption.  Specifically, do this:

Yeah, I agreed.

>
> 1. Refactor the host TSC deadline timer a bit to allow the TSC deadline
> timer to be "borrow".  It might look something like this:
>
> u64 borrow_tsc_deadline(void (*timer_callback)());
>
> The caller is now permitted to use the TSC deadline timer for its own
> nefarious purposes.  The caller promises to call return_tsc_deadline() in a
> timely manner if the TSC exceeds the return value while the deadline timer
> is borrowed.
>
> If the TSC deadline fires while it's borrowed, timer_callback() will be
> called.
>
> void return_tsc_deadline(bool timer_fired);
>
> The caller is done borrowing the TSC deadline timer.  The caller need not
> reset the TSC deadline timer MSR to its previous value before calling this.
> It must be called with IRQs on and preemption off.
>
> Getting this to work cleanly without races may be a bit tricky.  So be it.
>
> 2. Teach KVM to use the VMX preemption timer as a substitute deadline timer
> while in guest mode.  Specifically, KVM will borrow_tsc_deadline() (if TSC
> deadline is enabled) when entering guest mode and return_tsc_deadline() when
> returning out of guest mode.
>
> 3. Now KVM can change its MSR bitmaps to allow the guest to program the TSC
> deadline MSR directly.  No exit at all needed to handle guest writes to the
> deadline timer.
>
>
> Tglx, etc, would this be even remotely acceptable?  I'm not personally
> touching this code with a ten-foot pole, but there seem to be plenty of
> people who really care about performance here.

Sigh, I heard the same idea from a company called Huawei, they applied
the patent for the idea.

Regard,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2017-07-11  0:13 ` [PATCH RFC 0/2] " Andy Lutomirski
  2017-07-11  2:43   ` Wanpeng Li
@ 2017-07-11  7:43   ` Paolo Bonzini
  1 sibling, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2017-07-11  7:43 UTC (permalink / raw)
  To: Andy Lutomirski, Wanpeng Li, linux-kernel, kvm, Thomas Gleixner,
	X86 ML
  Cc: Radim Krčmář, Yunhong Jiang, Wanpeng Li

On 11/07/2017 02:13, Andy Lutomirski wrote:
> On 10/11/2016 05:17 AM, Wanpeng Li wrote:
>> Most windows guests which I have on hand currently still utilize APIC
>> Timer
>> periodic/oneshot mode instead of APIC Timer tsc-deadline mode:
>> - windows 2008 server r2
>> - windows 2012 server r2
>> - windows 7
>> - windows 10
>>
>> This patchset adds the support using the VMX preemption timer for APIC
>> Timer
>> periodic/oneshot mode.
>>
>> I add a print in oneshot mode testcase of kvm-unit-tests/apic.flat and
>> observed
>> that w/ patch the latency is ~2% of w/o patch. I think maybe something
>> is still
>> not right in the patchset, in addition, tmcct in apic_get_tmcct()
>> maybe is not
>> calculated correctly. Your comments to improve the patchset is a great
>> appreciated.
>>
>> Wanpeng Li (2):
>>    KVM: lapic: Extract start_sw_period() to handle oneshot/periodic mode
>>    KVM: x86: Support using the vmx preemption timer for APIC Timer
>> periodic/one mode
>>
>>   arch/x86/kvm/lapic.c | 162
>> ++++++++++++++++++++++++++++++---------------------
>>   1 file changed, 95 insertions(+), 67 deletions(-)
>>
> 
> I think this is a step in the right direction, but I think there's a
> different approach that would be much, much faster: use the VMX
> preemption timer for *host* preemption.  Specifically, do this:
> 
> 1. Refactor the host TSC deadline timer a bit to allow the TSC deadline
> timer to be "borrow".  It might look something like this:
> 
> u64 borrow_tsc_deadline(void (*timer_callback)());
> 
> The caller is now permitted to use the TSC deadline timer for its own
> nefarious purposes.  The caller promises to call return_tsc_deadline()
> in a timely manner if the TSC exceeds the return value while the
> deadline timer is borrowed.
> 
> If the TSC deadline fires while it's borrowed, timer_callback() will be
> called.
> 
> void return_tsc_deadline(bool timer_fired);
> 
> The caller is done borrowing the TSC deadline timer.  The caller need
> not reset the TSC deadline timer MSR to its previous value before
> calling this.  It must be called with IRQs on and preemption off.
> 
> Getting this to work cleanly without races may be a bit tricky.  So be it.
> 
> 2. Teach KVM to use the VMX preemption timer as a substitute deadline
> timer while in guest mode.  Specifically, KVM will borrow_tsc_deadline()
> (if TSC deadline is enabled) when entering guest mode and
> return_tsc_deadline() when returning out of guest mode.
> 
> 3. Now KVM can change its MSR bitmaps to allow the guest to program the
> TSC deadline MSR directly.  No exit at all needed to handle guest writes
> to the deadline timer.

This assumes that the TSC deadline MSR observes the guest TSC offset,
which I'm not at all sure of.  If you can't, you break live migration.

Also, while it would halve the cost of a guest's programming of the
timer tick, you would still incur the cost of a vmexit to call
timer_callback (it would be different if you could program the TSC
deadline timer to send a posted interrupt, of course).  Things would be
half as slow, but still a far cry from bare metal.

Really, we should just ask Intel to virtualize the TSC deadline MSR.

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode
  2017-07-11  2:43   ` Wanpeng Li
@ 2017-07-11 14:40     ` Andy Lutomirski
  0 siblings, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2017-07-11 14:40 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, kvm,
	Thomas Gleixner, X86 ML, Paolo Bonzini,
	Radim Krčmář, Yunhong Jiang, Wanpeng Li

On Mon, Jul 10, 2017 at 7:43 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Sigh, I heard the same idea from a company called Huawei, they applied
> the patent for the idea.

Seriously?  What arseholes.

Also, calling that an invention is a bit of a joke.  It's called the
VMX *preemption* timer.  Using it for preemption isn't exactly rocket
science.  Neither is passing an MSR straight through.

That being said, the lack of scaling and offset support is indeed problematic.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-07-11 14:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-11 12:17 [PATCH RFC 0/2] KVM: x86: Support using the VMX preemption timer for APIC Timer periodic/oneshot mode Wanpeng Li
2016-10-11 12:17 ` [PATCH RFC 1/2] KVM: lapic: Extract start_sw_period() to handle " Wanpeng Li
2016-10-11 12:17 ` [PATCH RFC 2/2] KVM: x86: Support using the VMX preemption timer for APIC Timer " Wanpeng Li
2016-10-11 15:06   ` Radim Krčmář
2016-10-12  8:35     ` Wanpeng Li
2017-07-11  0:13 ` [PATCH RFC 0/2] " Andy Lutomirski
2017-07-11  2:43   ` Wanpeng Li
2017-07-11 14:40     ` Andy Lutomirski
2017-07-11  7:43   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox