From: marc.zyngier@arm.com (Marc Zyngier)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
Date: Tue, 08 Oct 2013 13:43:09 +0100
Message-ID: <5253FDDD.6050008@arm.com>
In-Reply-To: <CAC4Lta14xZNnEUXxZiaF4=PQQTjq4efjsNDAK2oBH4-Uqh_d0A@mail.gmail.com>
On 08/10/13 12:26, Raghavendra KT wrote:
> On Mon, Oct 7, 2013 at 9:10 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On an (even slightly) oversubscribed system, spinlocks are quickly
>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>> lock to be released, while the vcpu holding the lock may not be
>> running at all.
>>
>> This creates contention, and the observed slowdown is 40x for
>> hackbench. No, this isn't a typo.
>>
>> The solution is to trap blocking WFEs and tell KVM that we're
>> now spinning. This ensures that other vcpus will get a scheduling
>> boost, allowing the lock to be released more quickly.
>>
>> From a performance point of view: hackbench 1 process 1000
>>
>> 2xA15 host (baseline): 1.843s
>>
>> 2xA15 guest w/o patch: 2.083s
>> 4xA15 guest w/o patch: 80.212s
>>
>> 2xA15 guest w/ patch: 2.072s
>> 4xA15 guest w/ patch: 3.202s
>>
>> So we go from a 40x degradation to 1.5x, which is vaguely more
>> acceptable.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>> arch/arm/include/asm/kvm_arm.h | 4 +++-
>> arch/arm/kvm/handle_exit.c | 6 +++++-
>> 2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index 64e9696..693d5b2 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -67,7 +67,7 @@
>> */
>> #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>> HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
>> - HCR_SWIO | HCR_TIDCP)
>> + HCR_TWE | HCR_SWIO | HCR_TIDCP)
>> #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>>
>> /* System Control Register (SCTLR) bits */
>> @@ -208,6 +208,8 @@
>> #define HSR_EC_DABT (0x24)
>> #define HSR_EC_DABT_HYP (0x25)
>>
>> +#define HSR_WFI_IS_WFE (1U << 0)
>> +
>> #define HSR_HVC_IMM_MASK ((1UL << 16) - 1)
>>
>> #define HSR_DABT_S1PTW (1U << 7)
>> diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
>> index df4c82d..c4c496f 100644
>> --- a/arch/arm/kvm/handle_exit.c
>> +++ b/arch/arm/kvm/handle_exit.c
>> @@ -84,7 +84,11 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> {
>> trace_kvm_wfi(*vcpu_pc(vcpu));
>> - kvm_vcpu_block(vcpu);
>> + if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
>> + kvm_vcpu_on_spin(vcpu);
>
> Could you also enable CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT for arm and
> check if the PLE handler logic helps further?
> We would ideally get one more optimization folded into the PLE handler
> if you enable that.

Just gave it a go, and the results are slightly (but consistently)
worse. Over 10 runs:

Without RELAX_INTERCEPT: Average run 3.3623s
With RELAX_INTERCEPT: Average run 3.4226s

Not massive, but still noticeable. Any clue?

	M.
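
To put the figures quoted in this thread side by side, here is a
minimal sketch that recomputes the ratios. The function names are made
up for illustration. The thread does not say which reference the "40x"
and "1.5x" figures are measured against, so the sketch prints the
slowdown against both the 2xA15 host and the unpatched 2xA15 guest.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a measured run time to a reference run time (seconds). */
static double slowdown(double measured_s, double reference_s)
{
	assert(reference_s > 0.0);
	return measured_s / reference_s;
}

/* Relative change of 'after' versus 'before', in percent. */
static double percent_change(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (after_s - before_s) / before_s * 100.0;
}

/* Print the ratios derived from the hackbench timings in the thread. */
static void report_thread_numbers(void)
{
	const double host_2cpu = 1.843;       /* 2xA15 host (baseline) */
	const double guest_2cpu = 2.083;      /* 2xA15 guest w/o patch */
	const double guest_4cpu_old = 80.212; /* 4xA15 guest w/o patch */
	const double guest_4cpu_new = 3.202;  /* 4xA15 guest w/ patch  */

	printf("vs host:        %.2fx -> %.2fx\n",
	       slowdown(guest_4cpu_old, host_2cpu),
	       slowdown(guest_4cpu_new, host_2cpu));
	printf("vs 2-cpu guest: %.2fx -> %.2fx\n",
	       slowdown(guest_4cpu_old, guest_2cpu),
	       slowdown(guest_4cpu_new, guest_2cpu));
	printf("RELAX_INTERCEPT cost: %.2f%%\n",
	       percent_change(3.3623, 3.4226));
}
```

Calling report_thread_numbers() from a main() gives roughly 38.5x and
1.54x against the unpatched 2-vcpu guest, roughly 43.5x and 1.74x
against the host, and a RELAX_INTERCEPT penalty of a little under 2%.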
--
Jazz is not dead. It just smells funny...
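
For readers following the quoted hunk, here is a userspace sketch of
the decision the patched kvm_handle_wfi() makes. It is not kernel code:
enum wfx_action and classify_wfx() are hypothetical names, and since
the quoted hunk is cut off after the kvm_vcpu_on_spin() call, the
non-WFE branch is an assumption (that the vcpu still blocks, as it did
before the patch). Only HSR_WFI_IS_WFE is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: bit 0 of the syndrome is set when the trap is a WFE. */
#define HSR_WFI_IS_WFE (1U << 0)

/* Hypothetical stand-ins for the two actions KVM can take. */
enum wfx_action {
	WFX_BLOCK,         /* models kvm_vcpu_block()   */
	WFX_YIELD_ON_SPIN, /* models kvm_vcpu_on_spin() */
};

/* Decide what to do with a trapped WFI/WFE given its syndrome value. */
static enum wfx_action classify_wfx(uint32_t hsr)
{
	if (hsr & HSR_WFI_IS_WFE)
		return WFX_YIELD_ON_SPIN; /* guest is likely spinning on a lock */

	/* Assumed: a plain WFI keeps the pre-patch blocking behaviour. */
	return WFX_BLOCK;
}

/* Small self-check covering both branches. */
static void classify_wfx_selfcheck(void)
{
	assert(classify_wfx(0x1u) == WFX_YIELD_ON_SPIN);
	assert(classify_wfx(0x0u) == WFX_BLOCK);
}
```

The point of the split is that a WFE inside a spinlock loop says the
vcpu is waiting for another vcpu, so yielding helps the lock holder
run, whereas a WFI means the vcpu is idle and can sleep until an
interrupt arrives.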