* [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE
@ 2013-10-08 17:38 Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Marc Zyngier @ 2013-10-08 17:38 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm; +Cc: Christoffer Dall, raghavendra.kt
This is a respin of a patch I posted a long while ago, this time with
numbers that I hope are convincing enough.
The basic idea is that spinning on WFE in a guest is a waste of
resources, and that we're better off running another vcpu instead. This
especially shows when the system is oversubscribed. The guest vcpus can
be seen spinning, waiting for a lock to be released while the lock
holder is nowhere near a physical CPU.
This patch series just enables WFE trapping on both ARM and arm64, and
calls kvm_vcpu_on_spin(). This is enough to boost other vcpus, and
dramatically reduce the overhead.
Branch available at:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/wfe-trap
Changes from v1:
- Added CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT, as it seems to give
slightly better results (Thanks to Raghavendra K T)
- Updated commit message with results of 8x configuration
Marc Zyngier (2):
ARM: KVM: Yield CPU when vcpu executes a WFE
arm64: KVM: Yield CPU when vcpu executes a WFE
arch/arm/include/asm/kvm_arm.h | 4 +++-
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/handle_exit.c | 6 +++++-
arch/arm64/include/asm/kvm_arm.h | 8 ++++++--
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/handle_exit.c | 18 +++++++++++++-----
6 files changed, 29 insertions(+), 9 deletions(-)
--
1.8.2.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE Marc Zyngier
@ 2013-10-08 17:38 ` Marc Zyngier
2013-10-16 1:14 ` Christoffer Dall
2013-10-08 17:38 ` [PATCH v2 2/2] arm64: " Marc Zyngier
2013-10-09 9:12 ` [PATCH v2 0/2] " Raghavendra K T
2 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2013-10-08 17:38 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm; +Cc: Christoffer Dall, raghavendra.kt
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.
This creates contention, and the observed slowdown is 40x for
hackbench. No, this isn't a typo.
The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vcpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommitted.
Quick test to estimate the performance: hackbench 1 process 1000
2xA15 host (baseline): 1.843s
2xA15 guest w/o patch: 2.083s
4xA15 guest w/o patch: 80.212s
8xA15 guest w/o patch: Could not be bothered to find out
2xA15 guest w/ patch: 2.102s
4xA15 guest w/ patch: 3.205s
8xA15 guest w/ patch: 6.887s
So we go from a 40x degradation to 1.5x in the 2x overcommit case,
which is vaguely more acceptable.
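[Editorial note: the ratios quoted above can be re-derived from the raw
hackbench timings in the commit message. A quick sanity check (the exact
comparison points for the "1.5x" figure are an assumption, since the
commit message does not say which pair of timings it divides):

```python
# Hackbench timings (seconds) taken from the commit message above.
baseline = 1.843           # 2xA15 host
guest_4x_no_patch = 80.212  # 4 vcpus on 2 CPUs, without the patch
guest_2x_no_patch = 2.083   # 2 vcpus on 2 CPUs, without the patch
guest_4x_patch = 3.205      # 4 vcpus on 2 CPUs, with the patch

# Slowdown of the 2x-overcommitted guest, before and after the patch.
slowdown_before = guest_4x_no_patch / baseline
slowdown_after = guest_4x_patch / guest_2x_no_patch

print(f"before: {slowdown_before:.1f}x, after: {slowdown_after:.1f}x")
# prints "before: 43.5x, after: 1.5x"
```

The "40x" in the commit message is thus a round figure for ~43.5x.]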
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
arch/arm/include/asm/kvm_arm.h | 4 +++-
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/handle_exit.c | 6 +++++-
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 64e9696..693d5b2 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -67,7 +67,7 @@
*/
#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
- HCR_SWIO | HCR_TIDCP)
+ HCR_TWE | HCR_SWIO | HCR_TIDCP)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
/* System Control Register (SCTLR) bits */
@@ -208,6 +208,8 @@
#define HSR_EC_DABT (0x24)
#define HSR_EC_DABT_HYP (0x25)
+#define HSR_WFI_IS_WFE (1U << 0)
+
#define HSR_HVC_IMM_MASK ((1UL << 16) - 1)
#define HSR_DABT_S1PTW (1U << 7)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index ebf5015..466bd29 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
bool "Kernel-based Virtual Machine (KVM) support"
select PREEMPT_NOTIFIERS
select ANON_INODES
+ select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_ARM_HOST
depends on ARM_VIRT_EXT && ARM_LPAE
diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index df4c82d..c4c496f 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -84,7 +84,11 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
trace_kvm_wfi(*vcpu_pc(vcpu));
- kvm_vcpu_block(vcpu);
+ if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
+ kvm_vcpu_on_spin(vcpu);
+ else
+ kvm_vcpu_block(vcpu);
+
return 1;
}
--
1.8.2.3
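[Editorial note: the dispatch in the handle_exit.c hunk above hinges on
ISS bit 0 of the HSR, which the architecture sets for a trapped WFE and
clears for a trapped WFI. A minimal sketch of that decode, written in
Python purely for illustration (the constant name mirrors the patch;
this is not kernel code):

```python
HSR_WFI_IS_WFE = 1 << 0  # ISS bit 0: set => WFE, clear => WFI

def handle_wfx(hsr, on_spin, block):
    """Mirrors kvm_handle_wfi() above: a trapped WFE yields the CPU
    (scheduling boost for other vcpus), a trapped WFI blocks the vcpu
    until an IRQ/FIQ arrives. Always tells KVM to resume the guest."""
    if hsr & HSR_WFI_IS_WFE:
        on_spin()   # kvm_vcpu_on_spin() in the real handler
    else:
        block()     # kvm_vcpu_block() in the real handler
    return 1

events = []
handle_wfx(0b1, lambda: events.append("spin"), lambda: events.append("block"))
handle_wfx(0b0, lambda: events.append("spin"), lambda: events.append("block"))
print(events)  # ['spin', 'block']
```
]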
* [PATCH v2 2/2] arm64: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
@ 2013-10-08 17:38 ` Marc Zyngier
2013-10-16 1:14 ` Christoffer Dall
2013-10-09 9:12 ` [PATCH v2 0/2] " Raghavendra K T
2 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2013-10-08 17:38 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm; +Cc: Christoffer Dall, raghavendra.kt
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.
The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vcpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommitted.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
arch/arm64/include/asm/kvm_arm.h | 8 ++++++--
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/handle_exit.c | 18 +++++++++++++-----
3 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a5f28e2..c98ef47 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -63,6 +63,7 @@
* TAC: Trap ACTLR
* TSC: Trap SMC
* TSW: Trap cache operations by set/way
+ * TWE: Trap WFE
* TWI: Trap WFI
* TIDCP: Trap L2CTLR/L2ECTLR
* BSU_IS: Upgrade barriers to the inner shareable domain
@@ -72,8 +73,9 @@
* FMO: Override CPSR.F and enable signaling with VF
* SWIO: Turn set/way invalidates into set/way clean+invalidate
*/
-#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
- HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
+ HCR_BSU_IS | HCR_FB | HCR_TAC | \
+ HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_SWIO | HCR_TIDCP | HCR_RW)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
@@ -242,4 +244,6 @@
#define ESR_EL2_EC_xABT_xFSR_EXTABT 0x10
+#define ESR_EL2_EC_WFI_ISS_WFE (1 << 0)
+
#endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 21e9082..4480ab3 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
+ select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_ARM_HOST
select KVM_ARM_VGIC
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 9beaca03..8da5606 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -47,21 +47,29 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
/**
- * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
+ * instruction executed by a guest
+ *
* @vcpu: the vcpu pointer
*
- * Simply call kvm_vcpu_block(), which will halt execution of
+ * WFE: Yield the CPU and come back to this vcpu when the scheduler
+ * decides to.
+ * WFI: Simply call kvm_vcpu_block(), which will halt execution of
* world-switches and schedule other host processes until there is an
* incoming IRQ or FIQ to the VM.
*/
-static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- kvm_vcpu_block(vcpu);
+ if (kvm_vcpu_get_hsr(vcpu) & ESR_EL2_EC_WFI_ISS_WFE)
+ kvm_vcpu_on_spin(vcpu);
+ else
+ kvm_vcpu_block(vcpu);
+
return 1;
}
static exit_handle_fn arm_exit_handlers[] = {
- [ESR_EL2_EC_WFI] = kvm_handle_wfi,
+ [ESR_EL2_EC_WFI] = kvm_handle_wfx,
[ESR_EL2_EC_CP15_32] = kvm_handle_cp15_32,
[ESR_EL2_EC_CP15_64] = kvm_handle_cp15_64,
[ESR_EL2_EC_CP14_MR] = kvm_handle_cp14_access,
--
1.8.2.3
* Re: [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 2/2] arm64: " Marc Zyngier
@ 2013-10-09 9:12 ` Raghavendra K T
2 siblings, 0 replies; 8+ messages in thread
From: Raghavendra K T @ 2013-10-09 9:12 UTC (permalink / raw)
To: Marc Zyngier; +Cc: linux-arm-kernel, kvmarm, kvm, Christoffer Dall
On 10/08/2013 11:08 PM, Marc Zyngier wrote:
> This is a respin of a patch I posted a long while ago, this time with
> numbers that I hope are convincing enough.
>
> The basic idea is that spinning on WFE in a guest is a waste of
> resources, and that we're better off running another vcpu instead. This
> especially shows when the system is oversubscribed. The guest vcpus can
> be seen spinning, waiting for a lock to be released while the lock
> holder is nowhere near a physical CPU.
>
> This patch series just enables WFE trapping on both ARM and arm64, and
> calls kvm_vcpu_on_spin(). This is enough to boost other vcpus, and
> dramatically reduce the overhead.
>
> Branch available at:
> git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/wfe-trap
>
> Changes from v1:
> - Added CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT, as it seems to give
> slightly better results (Thanks to Raghavendra K T)
> - Updated commit message with results of 8x configuration
>
> Marc Zyngier (2):
> ARM: KVM: Yield CPU when vcpu executes a WFE
> arm64: KVM: Yield CPU when vcpu executes a WFE
Using PLE handler and enabling CPU_RELAX_INTERCEPT part of the patches
looks fine.
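[Editorial note: for readers unfamiliar with the PLE machinery referenced
here: kvm_vcpu_on_spin() performs a directed yield, scanning the VM's
other vcpus round-robin starting after the last boosted one and yielding
to the first runnable candidate; with CPU_RELAX_INTERCEPT selected, the
core additionally tracks which vcpus recently yielded so it can prefer
likely lock holders. A rough, simplified sketch of that selection loop
(field names are illustrative, not the actual kernel implementation):

```python
def pick_yield_target(vcpus, me, last_boosted):
    """Round-robin scan starting after the last boosted vcpu, skipping
    ourselves, non-runnable vcpus, and vcpus the directed-yield
    heuristic marked ineligible (the dy_eligible bookkeeping that
    CPU_RELAX_INTERCEPT enables)."""
    n = len(vcpus)
    for i in range(n):
        idx = (last_boosted + 1 + i) % n
        v = vcpus[idx]
        if v is me or not v["runnable"]:
            continue
        if not v["dy_eligible"]:
            continue
        return idx
    return None  # no candidate: plain yield / keep running

vcpus = [
    {"runnable": False, "dy_eligible": True},   # vcpu0: blocked in WFI
    {"runnable": True,  "dy_eligible": True},   # vcpu1: the spinning caller
    {"runnable": True,  "dy_eligible": False},  # vcpu2: heuristic says skip
    {"runnable": True,  "dy_eligible": True},   # vcpu3: likely lock holder
]
print(pick_yield_target(vcpus, vcpus[1], last_boosted=1))  # 3
```
]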
* Re: [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
@ 2013-10-16 1:14 ` Christoffer Dall
2013-10-16 7:08 ` Marc Zyngier
0 siblings, 1 reply; 8+ messages in thread
From: Christoffer Dall @ 2013-10-16 1:14 UTC (permalink / raw)
To: Marc Zyngier; +Cc: linux-arm-kernel, kvmarm, kvm, raghavendra.kt
On Tue, Oct 08, 2013 at 06:38:13PM +0100, Marc Zyngier wrote:
> On an (even slightly) oversubscribed system, spinlocks are quickly
> becoming a bottleneck, as some vcpus are spinning, waiting for a
> lock to be released, while the vcpu holding the lock may not be
> running at all.
>
> This creates contention, and the observed slowdown is 40x for
> hackbench. No, this isn't a typo.
>
> The solution is to trap blocking WFEs and tell KVM that we're
> now spinning. This ensures that other vcpus will get a scheduling
> boost, allowing the lock to be released more quickly. Also, using
> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
> when the VM is severely overcommitted.
>
> Quick test to estimate the performance: hackbench 1 process 1000
>
> 2xA15 host (baseline): 1.843s
>
> 2xA15 guest w/o patch: 2.083s
> 4xA15 guest w/o patch: 80.212s
> 8xA15 guest w/o patch: Could not be bothered to find out
>
> 2xA15 guest w/ patch: 2.102s
> 4xA15 guest w/ patch: 3.205s
> 8xA15 guest w/ patch: 6.887s
>
> So we go from a 40x degradation to 1.5x in the 2x overcommit case,
> which is vaguely more acceptable.
>
Patch looks good, I can just apply it and add the other one I just sent
as a reply if there are no objections.
Sorry for the long turn-around on this one.
-Christoffer
* Re: [PATCH v2 2/2] arm64: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 ` [PATCH v2 2/2] arm64: " Marc Zyngier
@ 2013-10-16 1:14 ` Christoffer Dall
0 siblings, 0 replies; 8+ messages in thread
From: Christoffer Dall @ 2013-10-16 1:14 UTC (permalink / raw)
To: Marc Zyngier; +Cc: linux-arm-kernel, kvmarm, kvm, raghavendra.kt
On Tue, Oct 08, 2013 at 06:38:14PM +0100, Marc Zyngier wrote:
> On an (even slightly) oversubscribed system, spinlocks are quickly
> becoming a bottleneck, as some vcpus are spinning, waiting for a
> lock to be released, while the vcpu holding the lock may not be
> running at all.
>
> The solution is to trap blocking WFEs and tell KVM that we're
> now spinning. This ensures that other vcpus will get a scheduling
> boost, allowing the lock to be released more quickly. Also, using
> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
> when the VM is severely overcommitted.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> arch/arm64/include/asm/kvm_arm.h | 8 ++++++--
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/handle_exit.c | 18 +++++++++++++-----
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index a5f28e2..c98ef47 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -63,6 +63,7 @@
> * TAC: Trap ACTLR
> * TSC: Trap SMC
> * TSW: Trap cache operations by set/way
> + * TWE: Trap WFE
> * TWI: Trap WFI
> * TIDCP: Trap L2CTLR/L2ECTLR
> * BSU_IS: Upgrade barriers to the inner shareable domain
> @@ -72,8 +73,9 @@
> * FMO: Override CPSR.F and enable signaling with VF
> * SWIO: Turn set/way invalidates into set/way clean+invalidate
> */
> -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
> - HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
> +#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> + HCR_BSU_IS | HCR_FB | HCR_TAC | \
> + HCR_AMO | HCR_IMO | HCR_FMO | \
> HCR_SWIO | HCR_TIDCP | HCR_RW)
> #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>
> @@ -242,4 +244,6 @@
>
> #define ESR_EL2_EC_xABT_xFSR_EXTABT 0x10
>
> +#define ESR_EL2_EC_WFI_ISS_WFE (1 << 0)
> +
> #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 21e9082..4480ab3 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
> select MMU_NOTIFIER
> select PREEMPT_NOTIFIERS
> select ANON_INODES
> + select HAVE_KVM_CPU_RELAX_INTERCEPT
> select KVM_MMIO
> select KVM_ARM_HOST
> select KVM_ARM_VGIC
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 9beaca03..8da5606 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -47,21 +47,29 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> /**
> - * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
> + * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
> + * instruction executed by a guest
> + *
> * @vcpu: the vcpu pointer
> *
> - * Simply call kvm_vcpu_block(), which will halt execution of
> + * WFE: Yield the CPU and come back to this vcpu when the scheduler
> + * decides to.
> + * WFI: Simply call kvm_vcpu_block(), which will halt execution of
> * world-switches and schedule other host processes until there is an
> * incoming IRQ or FIQ to the VM.
> */
> -static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> - kvm_vcpu_block(vcpu);
> + if (kvm_vcpu_get_hsr(vcpu) & ESR_EL2_EC_WFI_ISS_WFE)
> + kvm_vcpu_on_spin(vcpu);
> + else
> + kvm_vcpu_block(vcpu);
> +
> return 1;
> }
>
> static exit_handle_fn arm_exit_handlers[] = {
> - [ESR_EL2_EC_WFI] = kvm_handle_wfi,
> + [ESR_EL2_EC_WFI] = kvm_handle_wfx,
> [ESR_EL2_EC_CP15_32] = kvm_handle_cp15_32,
> [ESR_EL2_EC_CP15_64] = kvm_handle_cp15_64,
> [ESR_EL2_EC_CP14_MR] = kvm_handle_cp14_access,
> --
> 1.8.2.3
>
ack
* Re: [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-16 1:14 ` Christoffer Dall
@ 2013-10-16 7:08 ` Marc Zyngier
2013-10-16 16:55 ` Christoffer Dall
0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2013-10-16 7:08 UTC (permalink / raw)
To: Christoffer Dall; +Cc: raghavendra.kt, kvmarm, linux-arm-kernel, kvm
On 2013-10-16 02:14, Christoffer Dall wrote:
> On Tue, Oct 08, 2013 at 06:38:13PM +0100, Marc Zyngier wrote:
>> On an (even slightly) oversubscribed system, spinlocks are quickly
>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>> lock to be released, while the vcpu holding the lock may not be
>> running at all.
>>
>> This creates contention, and the observed slowdown is 40x for
>> hackbench. No, this isn't a typo.
>>
>> The solution is to trap blocking WFEs and tell KVM that we're
>> now spinning. This ensures that other vcpus will get a scheduling
>> boost, allowing the lock to be released more quickly. Also, using
>> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the
>> performance
>> when the VM is severely overcommitted.
>>
>> Quick test to estimate the performance: hackbench 1 process 1000
>>
>> 2xA15 host (baseline): 1.843s
>>
>> 2xA15 guest w/o patch: 2.083s
>> 4xA15 guest w/o patch: 80.212s
>> 8xA15 guest w/o patch: Could not be bothered to find out
>>
>> 2xA15 guest w/ patch: 2.102s
>> 4xA15 guest w/ patch: 3.205s
>> 8xA15 guest w/ patch: 6.887s
>>
>> So we go from a 40x degradation to 1.5x in the 2x overcommit case,
>> which is vaguely more acceptable.
>>
> Patch looks good, I can just apply it and add the other one I just
> sent
> as a reply if there are no objections.
Yeah, I missed the updated comments on this one, thanks for taking care
of it.
> Sorry for the long turn-around on this one.
No worries. As long as it goes in, I'm happy. It makes such a
difference on my box; it is absolutely mind-boggling.
Thanks,
M.
--
Fast, cheap, reliable. Pick two.
* Re: [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-16 7:08 ` Marc Zyngier
@ 2013-10-16 16:55 ` Christoffer Dall
0 siblings, 0 replies; 8+ messages in thread
From: Christoffer Dall @ 2013-10-16 16:55 UTC (permalink / raw)
To: Marc Zyngier
Cc: Raghavendra KT, kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org
On 16 October 2013 00:08, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-10-16 02:14, Christoffer Dall wrote:
>>
>> On Tue, Oct 08, 2013 at 06:38:13PM +0100, Marc Zyngier wrote:
>>>
>>> On an (even slightly) oversubscribed system, spinlocks are quickly
>>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>>> lock to be released, while the vcpu holding the lock may not be
>>> running at all.
>>>
>>> This creates contention, and the observed slowdown is 40x for
>>> hackbench. No, this isn't a typo.
>>>
>>> The solution is to trap blocking WFEs and tell KVM that we're
>>> now spinning. This ensures that other vcpus will get a scheduling
>>> boost, allowing the lock to be released more quickly. Also, using
>>> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
>>> when the VM is severely overcommitted.
>>>
>>> Quick test to estimate the performance: hackbench 1 process 1000
>>>
>>> 2xA15 host (baseline): 1.843s
>>>
>>> 2xA15 guest w/o patch: 2.083s
>>> 4xA15 guest w/o patch: 80.212s
>>> 8xA15 guest w/o patch: Could not be bothered to find out
>>>
>>> 2xA15 guest w/ patch: 2.102s
>>> 4xA15 guest w/ patch: 3.205s
>>> 8xA15 guest w/ patch: 6.887s
>>>
>>> So we go from a 40x degradation to 1.5x in the 2x overcommit case,
>>> which is vaguely more acceptable.
>>>
>> Patch looks good, I can just apply it and add the other one I just sent
>> as a reply if there are no objections.
>
>
> Yeah, I missed the updated comments on this one, thanks for taking care of
> it.
>
np.
>
>> Sorry for the long turn-around on this one.
>
>
> No worries. As long as it goes in, I'm happy. It makes such a difference on
> my box; it is absolutely mind-boggling.
>
Applied to kvm-arm-next.
-Christoffer