* [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE
@ 2013-10-08 17:38 Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Marc Zyngier @ 2013-10-08 17:38 UTC (permalink / raw)
To: linux-arm-kernel
This is a respin of a patch I posted a long while ago, this time with
numbers that I hope are convincing enough.
The basic idea is that spinning on WFE in a guest is a waste of
resources, and that we're better off running another vcpu instead. This
shows especially when the system is oversubscribed. The guest vcpus can
be seen spinning, waiting for a lock to be released while the lock
holder is nowhere near a physical CPU.
This patch series just enables WFE trapping on both ARM and arm64, and
calls kvm_vcpu_on_spin(). This is enough to boost other vcpus, and
dramatically reduce the overhead.
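As an illustration only (a user-space sketch, not the kernel code from this series), the WFI/WFE decision can be modelled as a test on bit 0 of the trap syndrome. The names wfx_classify, WFX_BLOCK and WFX_YIELD are hypothetical; the bit position is the one the patches define as HSR_WFI_IS_WFE and ESR_EL2_EC_WFI_ISS_WFE.

```c
#include <stdint.h>

/*
 * Bit 0 of the syndrome distinguishes WFE (1) from WFI (0), mirroring
 * HSR_WFI_IS_WFE / ESR_EL2_EC_WFI_ISS_WFE in the patches.
 */
#define WFX_ISS_WFE (1U << 0)

/* Hypothetical labels for the two paths taken by the handler. */
enum wfx_action {
	WFX_BLOCK,	/* WFI: the handler would call kvm_vcpu_block() */
	WFX_YIELD	/* WFE: the handler would call kvm_vcpu_on_spin() */
};

/* Decide which path a trapped WFx instruction takes. */
enum wfx_action wfx_classify(uint32_t syndrome)
{
	return (syndrome & WFX_ISS_WFE) ? WFX_YIELD : WFX_BLOCK;
}
```

In this model, a syndrome with bit 0 set maps to the yield path and any other value maps to the blocking path, whatever the upper bits contain.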
Branch available at:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/wfe-trap
Changes from v1:
- Added CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT, as it seems to give
slightly better results (Thanks to Raghavendra K T)
- Updated commit message with results of 8x configuration
Marc Zyngier (2):
ARM: KVM: Yield CPU when vcpu executes a WFE
arm64: KVM: Yield CPU when vcpu executes a WFE
arch/arm/include/asm/kvm_arm.h | 4 +++-
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/handle_exit.c | 6 +++++-
arch/arm64/include/asm/kvm_arm.h | 8 ++++++--
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/handle_exit.c | 18 +++++++++++++-----
6 files changed, 29 insertions(+), 9 deletions(-)
--
1.8.2.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE Marc Zyngier
@ 2013-10-08 17:38 ` Marc Zyngier
2013-10-16 1:13 ` [PATCH] KVM: ARM: Update comments for kvm_handle_wfi Christoffer Dall
2013-10-16 1:14 ` [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE Christoffer Dall
2013-10-08 17:38 ` [PATCH v2 2/2] arm64: " Marc Zyngier
2013-10-09 9:12 ` [PATCH v2 0/2] " Raghavendra K T
2 siblings, 2 replies; 11+ messages in thread
From: Marc Zyngier @ 2013-10-08 17:38 UTC (permalink / raw)
To: linux-arm-kernel
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.
This creates contention, and the observed slowdown is 40x for
hackbench. No, this isn't a typo.
The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vcpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommitted.
Quick test to estimate the performance: hackbench 1 process 1000
2xA15 host (baseline): 1.843s
2xA15 guest w/o patch: 2.083s
4xA15 guest w/o patch: 80.212s
8xA15 guest w/o patch: Could not be bothered to find out
2xA15 guest w/ patch: 2.102s
4xA15 guest w/ patch: 3.205s
8xA15 guest w/ patch: 6.887s
So we go from a 40x degradation to 1.5x in the 2x overcommit case,
which is vaguely more acceptable.
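For reference, the ratios behind this summary can be recomputed from the timings above. This sketch assumes the 2xA15 guest without the patch (2.083s) is the reference point, which the text does not state explicitly; slowdown() is a hypothetical helper, not part of the series.

```c
#include <assert.h>

/*
 * Ratio of a measured run time to a reference run time, both in seconds.
 * Assumption: the reference is the 2xA15 guest without the patch.
 */
double slowdown(double measured_s, double baseline_s)
{
	assert(baseline_s > 0.0);
	return measured_s / baseline_s;
}
```

With these inputs, slowdown(80.212, 2.083) is roughly 38.5 and slowdown(3.205, 2.083) is roughly 1.54, consistent with the rounded 40x and 1.5x figures.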
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
arch/arm/include/asm/kvm_arm.h | 4 +++-
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/handle_exit.c | 6 +++++-
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 64e9696..693d5b2 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -67,7 +67,7 @@
*/
#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
- HCR_SWIO | HCR_TIDCP)
+ HCR_TWE | HCR_SWIO | HCR_TIDCP)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
/* System Control Register (SCTLR) bits */
@@ -208,6 +208,8 @@
#define HSR_EC_DABT (0x24)
#define HSR_EC_DABT_HYP (0x25)
+#define HSR_WFI_IS_WFE (1U << 0)
+
#define HSR_HVC_IMM_MASK ((1UL << 16) - 1)
#define HSR_DABT_S1PTW (1U << 7)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index ebf5015..466bd29 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
bool "Kernel-based Virtual Machine (KVM) support"
select PREEMPT_NOTIFIERS
select ANON_INODES
+ select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_ARM_HOST
depends on ARM_VIRT_EXT && ARM_LPAE
diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index df4c82d..c4c496f 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -84,7 +84,11 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
trace_kvm_wfi(*vcpu_pc(vcpu));
- kvm_vcpu_block(vcpu);
+ if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
+ kvm_vcpu_on_spin(vcpu);
+ else
+ kvm_vcpu_block(vcpu);
+
return 1;
}
--
1.8.2.3
* [PATCH v2 2/2] arm64: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
@ 2013-10-08 17:38 ` Marc Zyngier
2013-10-16 1:14 ` Christoffer Dall
2013-10-09 9:12 ` [PATCH v2 0/2] " Raghavendra K T
2 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2013-10-08 17:38 UTC (permalink / raw)
To: linux-arm-kernel
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.
The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vcpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommitted.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
arch/arm64/include/asm/kvm_arm.h | 8 ++++++--
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/handle_exit.c | 18 +++++++++++++-----
3 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a5f28e2..c98ef47 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -63,6 +63,7 @@
* TAC: Trap ACTLR
* TSC: Trap SMC
* TSW: Trap cache operations by set/way
+ * TWE: Trap WFE
* TWI: Trap WFI
* TIDCP: Trap L2CTLR/L2ECTLR
* BSU_IS: Upgrade barriers to the inner shareable domain
@@ -72,8 +73,9 @@
* FMO: Override CPSR.F and enable signaling with VF
* SWIO: Turn set/way invalidates into set/way clean+invalidate
*/
-#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
- HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
+ HCR_BSU_IS | HCR_FB | HCR_TAC | \
+ HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_SWIO | HCR_TIDCP | HCR_RW)
#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
@@ -242,4 +244,6 @@
#define ESR_EL2_EC_xABT_xFSR_EXTABT 0x10
+#define ESR_EL2_EC_WFI_ISS_WFE (1 << 0)
+
#endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 21e9082..4480ab3 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
+ select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_ARM_HOST
select KVM_ARM_VGIC
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 9beaca03..8da5606 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -47,21 +47,29 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
/**
- * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
+ * instruction executed by a guest
+ *
* @vcpu: the vcpu pointer
*
- * Simply call kvm_vcpu_block(), which will halt execution of
+ * WFE: Yield the CPU and come back to this vcpu when the scheduler
+ * decides to.
+ * WFI: Simply call kvm_vcpu_block(), which will halt execution of
* world-switches and schedule other host processes until there is an
* incoming IRQ or FIQ to the VM.
*/
-static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- kvm_vcpu_block(vcpu);
+ if (kvm_vcpu_get_hsr(vcpu) & ESR_EL2_EC_WFI_ISS_WFE)
+ kvm_vcpu_on_spin(vcpu);
+ else
+ kvm_vcpu_block(vcpu);
+
return 1;
}
static exit_handle_fn arm_exit_handlers[] = {
- [ESR_EL2_EC_WFI] = kvm_handle_wfi,
+ [ESR_EL2_EC_WFI] = kvm_handle_wfx,
[ESR_EL2_EC_CP15_32] = kvm_handle_cp15_32,
[ESR_EL2_EC_CP15_64] = kvm_handle_cp15_64,
[ESR_EL2_EC_CP14_MR] = kvm_handle_cp14_access,
--
1.8.2.3
* [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 [PATCH v2 0/2] KVM: Yield CPU when vcpu executes a WFE Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
2013-10-08 17:38 ` [PATCH v2 2/2] arm64: " Marc Zyngier
@ 2013-10-09 9:12 ` Raghavendra K T
2 siblings, 0 replies; 11+ messages in thread
From: Raghavendra K T @ 2013-10-09 9:12 UTC (permalink / raw)
To: linux-arm-kernel
On 10/08/2013 11:08 PM, Marc Zyngier wrote:
> This is a respin of a patch I posted a long while ago, this time with
> numbers that I hope are convincing enough.
>
> The basic idea is that spinning on WFE in a guest is a waste of
> resources, and that we're better off running another vcpu instead. This
> shows especially when the system is oversubscribed. The guest vcpus can
> be seen spinning, waiting for a lock to be released while the lock
> holder is nowhere near a physical CPU.
>
> This patch series just enables WFE trapping on both ARM and arm64, and
> calls kvm_vcpu_on_spin(). This is enough to boost other vcpus, and
> dramatically reduce the overhead.
>
> Branch available at:
> git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/wfe-trap
>
> Changes from v1:
> - Added CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT, as it seems to give
> slightly better results (Thanks to Raghavendra K T)
> - Updated commit message with results of 8x configuration
>
> Marc Zyngier (2):
> ARM: KVM: Yield CPU when vcpu executes a WFE
> arm64: KVM: Yield CPU when vcpu executes a WFE
Using PLE handler and enabling CPU_RELAX_INTERCEPT part of the patches
looks fine.
* [PATCH] KVM: ARM: Update comments for kvm_handle_wfi
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
@ 2013-10-16 1:13 ` Christoffer Dall
2013-10-16 4:19 ` Bhushan Bharat-R65777
2013-10-16 1:14 ` [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE Christoffer Dall
1 sibling, 1 reply; 11+ messages in thread
From: Christoffer Dall @ 2013-10-16 1:13 UTC (permalink / raw)
To: linux-arm-kernel
Update comments to reflect what is really going on and add the TWE bit
to the comments in kvm_arm.h.
Also rename the function to kvm_handle_wfx, as is done on arm64, for
consistency and uber-correctness.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
arch/arm/include/asm/kvm_arm.h | 1 +
arch/arm/kvm/handle_exit.c | 14 ++++++++------
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index fe395b7..1d3153c 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -57,6 +57,7 @@
* TSC: Trap SMC
* TSW: Trap cache operations by set/way
* TWI: Trap WFI
+ * TWE: Trap WFE
* TIDCP: Trap L2CTLR/L2ECTLR
* BSU_IS: Upgrade barriers to the inner shareable domain
* FB: Force broadcast of all maintainance operations
diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index c4c496f..a920790 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -73,15 +73,17 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
/**
- * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * kvm_handle_wfx - handle a WFI or WFE instructions trapped in guests
* @vcpu: the vcpu pointer
* @run: the kvm_run structure pointer
*
- * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
- * halt execution of world-switches and schedule other host processes until
- * there is an incoming IRQ or FIQ to the VM.
+ * WFE: Yield the CPU and come back to this vcpu when the scheduler
+ * decides to.
+ * WFI: Simply call kvm_vcpu_block(), which will halt execution of
+ * world-switches and schedule other host processes until there is an
+ * incoming IRQ or FIQ to the VM.
*/
-static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
trace_kvm_wfi(*vcpu_pc(vcpu));
if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
@@ -93,7 +95,7 @@ static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
static exit_handle_fn arm_exit_handlers[] = {
- [HSR_EC_WFI] = kvm_handle_wfi,
+ [HSR_EC_WFI] = kvm_handle_wfx,
[HSR_EC_CP15_32] = kvm_handle_cp15_32,
[HSR_EC_CP15_64] = kvm_handle_cp15_64,
[HSR_EC_CP14_MR] = kvm_handle_cp14_access,
--
1.7.10.4
* [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 ` [PATCH v2 1/2] ARM: " Marc Zyngier
2013-10-16 1:13 ` [PATCH] KVM: ARM: Update comments for kvm_handle_wfi Christoffer Dall
@ 2013-10-16 1:14 ` Christoffer Dall
2013-10-16 7:08 ` Marc Zyngier
1 sibling, 1 reply; 11+ messages in thread
From: Christoffer Dall @ 2013-10-16 1:14 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Oct 08, 2013 at 06:38:13PM +0100, Marc Zyngier wrote:
> On an (even slightly) oversubscribed system, spinlocks are quickly
> becoming a bottleneck, as some vcpus are spinning, waiting for a
> lock to be released, while the vcpu holding the lock may not be
> running at all.
>
> This creates contention, and the observed slowdown is 40x for
> hackbench. No, this isn't a typo.
>
> The solution is to trap blocking WFEs and tell KVM that we're
> now spinning. This ensures that other vcpus will get a scheduling
> boost, allowing the lock to be released more quickly. Also, using
> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
> when the VM is severely overcommitted.
>
> Quick test to estimate the performance: hackbench 1 process 1000
>
> 2xA15 host (baseline): 1.843s
>
> 2xA15 guest w/o patch: 2.083s
> 4xA15 guest w/o patch: 80.212s
> 8xA15 guest w/o patch: Could not be bothered to find out
>
> 2xA15 guest w/ patch: 2.102s
> 4xA15 guest w/ patch: 3.205s
> 8xA15 guest w/ patch: 6.887s
>
> So we go from a 40x degradation to 1.5x in the 2x overcommit case,
> which is vaguely more acceptable.
>
Patch looks good, I can just apply it and add the other one I just sent
as a reply if there are no objections.
Sorry for the long turn-around on this one.
-Christoffer
* [PATCH v2 2/2] arm64: KVM: Yield CPU when vcpu executes a WFE
2013-10-08 17:38 ` [PATCH v2 2/2] arm64: " Marc Zyngier
@ 2013-10-16 1:14 ` Christoffer Dall
0 siblings, 0 replies; 11+ messages in thread
From: Christoffer Dall @ 2013-10-16 1:14 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Oct 08, 2013 at 06:38:14PM +0100, Marc Zyngier wrote:
> On an (even slightly) oversubscribed system, spinlocks are quickly
> becoming a bottleneck, as some vcpus are spinning, waiting for a
> lock to be released, while the vcpu holding the lock may not be
> running at all.
>
> The solution is to trap blocking WFEs and tell KVM that we're
> now spinning. This ensures that other vcpus will get a scheduling
> boost, allowing the lock to be released more quickly. Also, using
> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
> when the VM is severely overcommitted.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> arch/arm64/include/asm/kvm_arm.h | 8 ++++++--
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/handle_exit.c | 18 +++++++++++++-----
> 3 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index a5f28e2..c98ef47 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -63,6 +63,7 @@
> * TAC: Trap ACTLR
> * TSC: Trap SMC
> * TSW: Trap cache operations by set/way
> + * TWE: Trap WFE
> * TWI: Trap WFI
> * TIDCP: Trap L2CTLR/L2ECTLR
> * BSU_IS: Upgrade barriers to the inner shareable domain
> @@ -72,8 +73,9 @@
> * FMO: Override CPSR.F and enable signaling with VF
> * SWIO: Turn set/way invalidates into set/way clean+invalidate
> */
> -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
> - HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
> +#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
> + HCR_BSU_IS | HCR_FB | HCR_TAC | \
> + HCR_AMO | HCR_IMO | HCR_FMO | \
> HCR_SWIO | HCR_TIDCP | HCR_RW)
> #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>
> @@ -242,4 +244,6 @@
>
> #define ESR_EL2_EC_xABT_xFSR_EXTABT 0x10
>
> +#define ESR_EL2_EC_WFI_ISS_WFE (1 << 0)
> +
> #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 21e9082..4480ab3 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
> select MMU_NOTIFIER
> select PREEMPT_NOTIFIERS
> select ANON_INODES
> + select HAVE_KVM_CPU_RELAX_INTERCEPT
> select KVM_MMIO
> select KVM_ARM_HOST
> select KVM_ARM_VGIC
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 9beaca03..8da5606 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -47,21 +47,29 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> /**
> - * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
> + * kvm_handle_wfx - handle a wait-for-interrupts or wait-for-event
> + * instruction executed by a guest
> + *
> * @vcpu: the vcpu pointer
> *
> - * Simply call kvm_vcpu_block(), which will halt execution of
> + * WFE: Yield the CPU and come back to this vcpu when the scheduler
> + * decides to.
> + * WFI: Simply call kvm_vcpu_block(), which will halt execution of
> * world-switches and schedule other host processes until there is an
> * incoming IRQ or FIQ to the VM.
> */
> -static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> - kvm_vcpu_block(vcpu);
> + if (kvm_vcpu_get_hsr(vcpu) & ESR_EL2_EC_WFI_ISS_WFE)
> + kvm_vcpu_on_spin(vcpu);
> + else
> + kvm_vcpu_block(vcpu);
> +
> return 1;
> }
>
> static exit_handle_fn arm_exit_handlers[] = {
> - [ESR_EL2_EC_WFI] = kvm_handle_wfi,
> + [ESR_EL2_EC_WFI] = kvm_handle_wfx,
> [ESR_EL2_EC_CP15_32] = kvm_handle_cp15_32,
> [ESR_EL2_EC_CP15_64] = kvm_handle_cp15_64,
> [ESR_EL2_EC_CP14_MR] = kvm_handle_cp14_access,
> --
> 1.8.2.3
>
ack
* [PATCH] KVM: ARM: Update comments for kvm_handle_wfi
2013-10-16 1:13 ` [PATCH] KVM: ARM: Update comments for kvm_handle_wfi Christoffer Dall
@ 2013-10-16 4:19 ` Bhushan Bharat-R65777
2013-10-16 4:37 ` Christoffer Dall
0 siblings, 1 reply; 11+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-10-16 4:19 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Christoffer Dall [mailto:christoffer.dall at linaro.org]
> Sent: Wednesday, October 16, 2013 6:43 AM
> To: Marc Zyngier
> Cc: kvmarm at lists.cs.columbia.edu; linux-arm-kernel at lists.infradead.org
> Subject: [PATCH] KVM: ARM: Update comments for kvm_handle_wfi
>
> Update comments to reflect what is really going on and add the TWE bit to the
> comments in kvm_arm.h.
>
> Also rename the function to kvm_handle_wfx, as is done on arm64, for
> consistency and uber-correctness.
s/uber/user
-Bharat
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> arch/arm/include/asm/kvm_arm.h | 1 +
> arch/arm/kvm/handle_exit.c | 14 ++++++++------
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index fe395b7..1d3153c 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -57,6 +57,7 @@
> * TSC: Trap SMC
> * TSW: Trap cache operations by set/way
> * TWI: Trap WFI
> + * TWE: Trap WFE
> * TIDCP: Trap L2CTLR/L2ECTLR
> * BSU_IS: Upgrade barriers to the inner shareable domain
> * FB: Force broadcast of all maintainance operations
> diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
> index c4c496f..a920790 100644
> --- a/arch/arm/kvm/handle_exit.c
> +++ b/arch/arm/kvm/handle_exit.c
> @@ -73,15 +73,17 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> /**
> - * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
> + * kvm_handle_wfx - handle a WFI or WFE instructions trapped in guests
> * @vcpu: the vcpu pointer
> * @run: the kvm_run structure pointer
> *
> - * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
> - * halt execution of world-switches and schedule other host processes until
> - * there is an incoming IRQ or FIQ to the VM.
> + * WFE: Yield the CPU and come back to this vcpu when the scheduler
> + * decides to.
> + * WFI: Simply call kvm_vcpu_block(), which will halt execution of
> + * world-switches and schedule other host processes until there is an
> + * incoming IRQ or FIQ to the VM.
> */
> -static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> trace_kvm_wfi(*vcpu_pc(vcpu));
> if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
> @@ -93,7 +95,7 @@ static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> static exit_handle_fn arm_exit_handlers[] = {
> - [HSR_EC_WFI] = kvm_handle_wfi,
> + [HSR_EC_WFI] = kvm_handle_wfx,
> [HSR_EC_CP15_32] = kvm_handle_cp15_32,
> [HSR_EC_CP15_64] = kvm_handle_cp15_64,
> [HSR_EC_CP14_MR] = kvm_handle_cp14_access,
> --
> 1.7.10.4
>
* [PATCH] KVM: ARM: Update comments for kvm_handle_wfi
2013-10-16 4:19 ` Bhushan Bharat-R65777
@ 2013-10-16 4:37 ` Christoffer Dall
0 siblings, 0 replies; 11+ messages in thread
From: Christoffer Dall @ 2013-10-16 4:37 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Oct 16, 2013 at 04:19:45AM +0000, Bhushan Bharat-R65777 wrote:
>
>
> > -----Original Message-----
> > From: Christoffer Dall [mailto:christoffer.dall at linaro.org]
> > Sent: Wednesday, October 16, 2013 6:43 AM
> > To: Marc Zyngier
> > Cc: kvmarm at lists.cs.columbia.edu; linux-arm-kernel at lists.infradead.org
> > Subject: [PATCH] KVM: ARM: Update comments for kvm_handle_wfi
> >
> > Update comments to reflect what is really going on and add the TWE bit to the
> > comments in kvm_arm.h.
> >
> > Also rename the function to kvm_handle_wfx, as is done on arm64, for
> > consistency and uber-correctness.
>
> s/uber/user
>
No, that was actually the German word "uber" (with the dots, but I
didn't want to put them), with the meaning very very correct. My
attempt at humor failed here miserably I see :)
-Christoffer
* [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-16 1:14 ` [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE Christoffer Dall
@ 2013-10-16 7:08 ` Marc Zyngier
2013-10-16 16:55 ` Christoffer Dall
0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2013-10-16 7:08 UTC (permalink / raw)
To: linux-arm-kernel
On 2013-10-16 02:14, Christoffer Dall wrote:
> On Tue, Oct 08, 2013 at 06:38:13PM +0100, Marc Zyngier wrote:
>> On an (even slightly) oversubscribed system, spinlocks are quickly
>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>> lock to be released, while the vcpu holding the lock may not be
>> running at all.
>>
>> This creates contention, and the observed slowdown is 40x for
>> hackbench. No, this isn't a typo.
>>
>> The solution is to trap blocking WFEs and tell KVM that we're
>> now spinning. This ensures that other vcpus will get a scheduling
>> boost, allowing the lock to be released more quickly. Also, using
>> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
>> when the VM is severely overcommitted.
>>
>> Quick test to estimate the performance: hackbench 1 process 1000
>>
>> 2xA15 host (baseline): 1.843s
>>
>> 2xA15 guest w/o patch: 2.083s
>> 4xA15 guest w/o patch: 80.212s
>> 8xA15 guest w/o patch: Could not be bothered to find out
>>
>> 2xA15 guest w/ patch: 2.102s
>> 4xA15 guest w/ patch: 3.205s
>> 8xA15 guest w/ patch: 6.887s
>>
>> So we go from a 40x degradation to 1.5x in the 2x overcommit case,
>> which is vaguely more acceptable.
>>
> Patch looks good, I can just apply it and add the other one I just sent
> as a reply if there are no objections.
Yeah, I missed the updated comments on this one, thanks for taking care
of it.
> Sorry for the long turn-around on this one.
No worries. As long as it goes in, I'm happy. It makes such a
difference on my box, it is absolutely mind boggling.
Thanks,
M.
--
Fast, cheap, reliable. Pick two.
* [PATCH v2 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
2013-10-16 7:08 ` Marc Zyngier
@ 2013-10-16 16:55 ` Christoffer Dall
0 siblings, 0 replies; 11+ messages in thread
From: Christoffer Dall @ 2013-10-16 16:55 UTC (permalink / raw)
To: linux-arm-kernel
On 16 October 2013 00:08, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-10-16 02:14, Christoffer Dall wrote:
>>
>> On Tue, Oct 08, 2013 at 06:38:13PM +0100, Marc Zyngier wrote:
>>>
>>> On an (even slightly) oversubscribed system, spinlocks are quickly
>>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>>> lock to be released, while the vcpu holding the lock may not be
>>> running at all.
>>>
>>> This creates contention, and the observed slowdown is 40x for
>>> hackbench. No, this isn't a typo.
>>>
>>> The solution is to trap blocking WFEs and tell KVM that we're
>>> now spinning. This ensures that other vcpus will get a scheduling
>>> boost, allowing the lock to be released more quickly. Also, using
>>> CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
>>> when the VM is severely overcommitted.
>>>
>>> Quick test to estimate the performance: hackbench 1 process 1000
>>>
>>> 2xA15 host (baseline): 1.843s
>>>
>>> 2xA15 guest w/o patch: 2.083s
>>> 4xA15 guest w/o patch: 80.212s
>>> 8xA15 guest w/o patch: Could not be bothered to find out
>>>
>>> 2xA15 guest w/ patch: 2.102s
>>> 4xA15 guest w/ patch: 3.205s
>>> 8xA15 guest w/ patch: 6.887s
>>>
>>> So we go from a 40x degradation to 1.5x in the 2x overcommit case,
>>> which is vaguely more acceptable.
>>>
>> Patch looks good, I can just apply it and add the other one I just sent
>> as a reply if there are no objections.
>
>
> Yeah, I missed the updated comments on this one, thanks for taking care of
> it.
>
np.
>
>> Sorry for the long turn-around on this one.
>
>
> No worries. As long as it goes in, I'm happy. It makes such a difference on
> my box, it is absolutely mind boggling.
>
Applied to kvm-arm-next.
-Christoffer