From: Christoffer Dall <cdall@linaro.org>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvm@vger.kernel.org, Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts
Date: Mon, 16 Oct 2017 22:59:16 +0200 [thread overview]
Message-ID: <20171016205916.GP1845@lvm> (raw)
In-Reply-To: <20171009152032.27804-1-marc.zyngier@arm.com>
On Mon, Oct 09, 2017 at 04:20:22PM +0100, Marc Zyngier wrote:
> It was recently reported that on a VM restore, we seem to spend a
> disproportionate amount of time invalidation the icache. This is
> partially due to some HW behaviour, but also because we're being a bit
> dumb and are invalidating the icache for every page we map at S2, even
> if that on a data access.
>
> The slightly better way of doing this is to mark the pages XN at S2,
> and wait for the the guest to execute something in that page, at which
> point we perform the invalidation. As it is likely that there is a lot
> less instruction than data, we win (or so we hope).
>
> We also take this opportunity to drop the extra dcache clean to the
> PoU which is pretty useless, as we already clean all the way to the
> PoC...
>
> Running a bare metal test that touches 1GB of memory (using a 4kB
> stride) leads to the following results on Seattle:
>
> 4.13:
> do_fault_read.bin: 0.565885992 seconds time elapsed
> do_fault_write.bin: 0.738296337 seconds time elapsed
> do_fault_read_write.bin: 1.241812231 seconds time elapsed
>
> 4.14-rc3+patches:
> do_fault_read.bin: 0.244961803 seconds time elapsed
> do_fault_write.bin: 0.422740092 seconds time elapsed
> do_fault_read_write.bin: 0.643402470 seconds time elapsed
>
> We're almost halving the time of something that more or less looks
> like a restore operation. Some larger systems will show much bigger
> benefits as they become less impacted by the icache invalidation
> (which is broadcast in the inner shareable domain).
>
> I've also given it a test run on both Cubietruck and Jetson-TK1.
>
> Tests are archived here:
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/
>
> I'd value some additional test results on HW I don't have access to.
>
What would also be interesting is some insight into how big the hit then
is on first execution, but that should in no way gate merging these
patches.
Thanks,
-Christoffer
WARNING: multiple messages have this Message-ID (diff)
From: cdall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts
Date: Mon, 16 Oct 2017 22:59:16 +0200 [thread overview]
Message-ID: <20171016205916.GP1845@lvm> (raw)
In-Reply-To: <20171009152032.27804-1-marc.zyngier@arm.com>
On Mon, Oct 09, 2017 at 04:20:22PM +0100, Marc Zyngier wrote:
> It was recently reported that on a VM restore, we seem to spend a
> disproportionate amount of time invalidation the icache. This is
> partially due to some HW behaviour, but also because we're being a bit
> dumb and are invalidating the icache for every page we map at S2, even
> if that on a data access.
>
> The slightly better way of doing this is to mark the pages XN at S2,
> and wait for the the guest to execute something in that page, at which
> point we perform the invalidation. As it is likely that there is a lot
> less instruction than data, we win (or so we hope).
>
> We also take this opportunity to drop the extra dcache clean to the
> PoU which is pretty useless, as we already clean all the way to the
> PoC...
>
> Running a bare metal test that touches 1GB of memory (using a 4kB
> stride) leads to the following results on Seattle:
>
> 4.13:
> do_fault_read.bin: 0.565885992 seconds time elapsed
> do_fault_write.bin: 0.738296337 seconds time elapsed
> do_fault_read_write.bin: 1.241812231 seconds time elapsed
>
> 4.14-rc3+patches:
> do_fault_read.bin: 0.244961803 seconds time elapsed
> do_fault_write.bin: 0.422740092 seconds time elapsed
> do_fault_read_write.bin: 0.643402470 seconds time elapsed
>
> We're almost halving the time of something that more or less looks
> like a restore operation. Some larger systems will show much bigger
> benefits as they become less impacted by the icache invalidation
> (which is broadcast in the inner shareable domain).
>
> I've also given it a test run on both Cubietruck and Jetson-TK1.
>
> Tests are archived here:
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/
>
> I'd value some additional test results on HW I don't have access to.
>
What would also be interesting is some insight into how big the hit then
is on first execution, but that should in no way gate merging these
patches.
Thanks,
-Christoffer
next prev parent reply other threads:[~2017-10-16 20:58 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-09 15:20 [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:07 ` Christoffer Dall
2017-10-16 20:07 ` Christoffer Dall
2017-10-17 8:57 ` Marc Zyngier
2017-10-17 8:57 ` Marc Zyngier
2017-10-17 14:28 ` Christoffer Dall
2017-10-17 14:28 ` Christoffer Dall
2017-10-17 14:41 ` Marc Zyngier
2017-10-17 14:41 ` Marc Zyngier
2017-10-16 21:35 ` Roy Franz (Cavium)
2017-10-16 21:35 ` Roy Franz (Cavium)
2017-10-17 6:44 ` Christoffer Dall
2017-10-17 6:44 ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:08 ` Christoffer Dall
2017-10-16 20:08 ` Christoffer Dall
2017-10-19 16:47 ` Will Deacon
2017-10-19 16:47 ` Will Deacon
2017-10-20 13:41 ` Marc Zyngier
2017-10-20 13:41 ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 03/10] arm: KVM: Add optimized PIPT icache flushing Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:07 ` Christoffer Dall
2017-10-16 20:07 ` Christoffer Dall
2017-10-17 9:26 ` Marc Zyngier
2017-10-17 9:26 ` Marc Zyngier
2017-10-17 14:34 ` Christoffer Dall
2017-10-17 14:34 ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 04/10] arm64: KVM: PTE/PMD S2 XN bit definition Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:07 ` Christoffer Dall
2017-10-16 20:07 ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 05/10] KVM: arm/arm64: Limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:08 ` Christoffer Dall
2017-10-16 20:08 ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:08 ` Christoffer Dall
2017-10-16 20:08 ` Christoffer Dall
2017-10-17 9:34 ` Marc Zyngier
2017-10-17 9:34 ` Marc Zyngier
2017-10-17 14:36 ` Christoffer Dall
2017-10-17 14:36 ` Christoffer Dall
2017-10-17 14:52 ` Marc Zyngier
2017-10-17 14:52 ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:08 ` Christoffer Dall
2017-10-16 20:08 ` Christoffer Dall
2017-10-17 11:22 ` Marc Zyngier
2017-10-17 11:22 ` Marc Zyngier
2017-10-17 14:46 ` Christoffer Dall
2017-10-17 14:46 ` Christoffer Dall
2017-10-17 15:04 ` Marc Zyngier
2017-10-17 15:04 ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page Marc Zyngier
2017-10-09 15:20 ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d, i}cache_guest_page Marc Zyngier
2017-10-16 20:08 ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page Christoffer Dall
2017-10-16 20:08 ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 09/10] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:08 ` Christoffer Dall
2017-10-16 20:08 ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-16 20:06 ` Christoffer Dall
2017-10-16 20:06 ` Christoffer Dall
2017-10-17 12:40 ` Marc Zyngier
2017-10-17 12:40 ` Marc Zyngier
2017-10-17 14:48 ` Christoffer Dall
2017-10-17 14:48 ` Christoffer Dall
2017-10-16 20:59 ` Christoffer Dall [this message]
2017-10-16 20:59 ` [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts Christoffer Dall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171016205916.GP1845@lvm \
--to=cdall@linaro.org \
--cc=catalin.marinas@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=marc.zyngier@arm.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.