From: Christoffer Dall <christoffer.dall@linaro.org>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
Date: Tue, 9 Feb 2016 21:59:19 +0100
Message-ID: <20160209205919.GF5171@cbox>
In-Reply-To: <1454931622-14902-1-git-send-email-marc.zyngier@arm.com>
On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
> I've recently been looking at our entry/exit costs, and profiling
> figures showed some very low-hanging fruit.
>
> The most obvious cost is that accessing the GIC HW is slow. As in
> "deadly slow", especially when GICv2 is involved. So not hammering the
> HW when there is nothing to write is immediately beneficial, as this
> is the most common case (whatever people seem to think, interrupts
> are a *rare* event).
>
> Another easy thing to fix is the way we handle trapped system
> registers. We do insist on (mostly) sorting them, but we do perform a
> linear search on trap. We can switch to a binary search for free, and
> get immediate benefits (the PMU code, being extremely trap-happy,
> benefits immediately from this).
>
> With these in place, I see an improvement of 20 to 30% (depending on
> the platform) on our world-switch cycle count when running a set of
> hand-crafted guests that are designed to only perform traps.
I'm curious about the relative weight of these two. My guess, based on
the measurement work I did, is that the GIC is by far the worst sinner,
but that was exacerbated on X-Gene compared to Seattle.
>
> Methodology:
>
> * NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
> and then a power-off:
>
> __start:
>         mov     x19, #(1 << 16)
> 1:      mov     x0, #0x84000000
>         hvc     #0
>         sub     x19, x19, #1
>         cbnz    x19, 1b
>         mov     x0, #0x84000000
>         add     x0, x0, #9
>         hvc     #0
>         b       .
>
> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
>
> __start:
>         mov     x19, #(1 << 20)
> 1:      mrs     x0, PMSELR_EL0
>         sub     x19, x19, #1
>         cbnz    x19, 1b
>         mov     x0, #0x84000000
>         add     x0, x0, #9
>         hvc     #0
>         b       .
>
> * These guests are profiled using perf and kvmtool:
>
> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null| grep cycles
These would be good to add to kvm-unit-tests so we can keep an eye on
this sort of thing...
>
> The result is then divided by the number of iterations (2^16 or 2^20).
>
> These tests have been run on Seattle, Mustang, and LS2085, and have
> shown significant improvements in all cases. I've only touched the
> arm64 GIC code, but obviously the 32-bit code should use it as well
> once we've migrated it to C.
>
> I've pushed out a branch (kvm-arm64/suck-less) to the usual location.
>
Looks promising!
-Christoffer
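
For readers following the sys_reg part of the series: the change from a
linear to a binary search on trap can be sketched roughly as below. This
is a minimal userspace illustration, not the code from the patches. The
struct layout, the helper names (reg_key, cmp_reg, check_sorted,
find_reg) and the three-entry table are assumptions made for the
example, and it uses the C library bsearch() rather than any kernel
helper. The lookup is only valid if the table is sorted by encoding,
which is presumably why the series also enforces sorting of the tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor: one entry per trapped system register. */
struct sys_reg_desc {
	unsigned int op0, op1, crn, crm, op2;	/* encoding, used as sort key */
	const char *name;
};

/* Illustrative table, sorted by (Op0, Op1, CRn, CRm, Op2). */
static const struct sys_reg_desc pmu_regs[] = {
	{ 3, 3, 9, 12, 0, "PMCR_EL0" },
	{ 3, 3, 9, 12, 5, "PMSELR_EL0" },
	{ 3, 3, 9, 13, 0, "PMCCNTR_EL0" },
};

/* Pack the encoding into one integer so ordering is a single compare. */
static unsigned long reg_key(const struct sys_reg_desc *r)
{
	return ((unsigned long)r->op0 << 16) | (r->op1 << 12) |
	       (r->crn << 8) | (r->crm << 4) | r->op2;
}

/* Three-way comparator in the form bsearch() expects. */
static int cmp_reg(const void *a, const void *b)
{
	unsigned long ka = reg_key(a), kb = reg_key(b);

	return (ka > kb) - (ka < kb);
}

/* One-off check at init time: binary search needs a strictly sorted table. */
static void check_sorted(const struct sys_reg_desc *table, size_t n)
{
	for (size_t i = 1; i < n; i++)
		assert(cmp_reg(&table[i - 1], &table[i]) < 0);
}

/* O(log n) lookup on trap; returns NULL if the encoding is not listed. */
static const struct sys_reg_desc *
find_reg(const struct sys_reg_desc *table, size_t n,
	 const struct sys_reg_desc *params)
{
	return bsearch(params, table, n, sizeof(*table), cmp_reg);
}
```

A caller would run check_sorted() once when the table is registered, then
call find_reg() with the encoding decoded from the trap. The one-off
sorting check is paid at init time, while every trap afterwards costs
O(log n) comparisons instead of O(n).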