From: Marc Zyngier <marc.zyngier@arm.com>
To: Andrew Jones <drjones@redhat.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu, andre.przywara@arm.com
Subject: Re: [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations
Date: Wed, 10 Feb 2016 12:24:06 +0000
Message-ID: <56BB2BE6.8020600@arm.com>
In-Reply-To: <20160210120211.fjp2sxzivdhxug6p@hawk.localdomain>

On 10/02/16 12:02, Andrew Jones wrote:
> On Wed, Feb 10, 2016 at 08:34:21AM +0000, Marc Zyngier wrote:
>> On 09/02/16 20:59, Christoffer Dall wrote:
>>> On Mon, Feb 08, 2016 at 11:40:14AM +0000, Marc Zyngier wrote:
>>>> I've recently been looking at our entry/exit costs, and profiling
>>>> figures did show some very low-hanging fruit.
>>>>
>>>> The most obvious cost is that accessing the GIC HW is slow. As in
>>>> "deadly slow", especially when GICv2 is involved. So not hammering the
>>>> HW when there is nothing to write is immediately beneficial, as this
>>>> is the most common case (whatever people seem to think, interrupts
>>>> are a *rare* event).
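A toy C model of the "don't touch the GIC unless needed" idea: on exit, list registers that were never populated are known to be empty, so only the live ones need to be read back from the hardware. The struct, the `live_lrs` bitmap and the helper names are hypothetical, and the slow MMIO access is simulated by a counter; this is a sketch under those assumptions, not the code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: names and layout are illustrative, not KVM's. */
#define NR_LR 4

struct vgic_model {
	uint32_t hw_lr[NR_LR];   /* stands in for the GICH_LRn registers */
	uint32_t live_lrs;       /* bit n set => LR n was populated on entry */
	unsigned long hw_reads;  /* number of (slow) hardware reads issued */
};

/* Stand-in for a slow MMIO read of GICH_LRn. */
static uint32_t read_lr_slow(struct vgic_model *v, int n)
{
	assert(n >= 0 && n < NR_LR);
	v->hw_reads++;
	return v->hw_lr[n];
}

/*
 * On guest exit, save only the list registers that were populated on
 * entry; the others are known to be empty, so the hardware is not touched.
 */
static void save_lrs(struct vgic_model *v, uint32_t saved[NR_LR])
{
	for (int n = 0; n < NR_LR; n++)
		saved[n] = (v->live_lrs & (1u << n)) ? read_lr_slow(v, n) : 0;
}
```

With nothing in flight (`live_lrs == 0`, the common case described above), `save_lrs()` issues no hardware reads at all.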
>>>>
>>>> Another easy thing to fix is the way we handle trapped system
>>>> registers. We do insist on (mostly) sorting them, but we do perform a
>>>> linear search on trap. We can switch to a binary search for free, and
>>>> get immediate benefits (the PMU code, being extremely trap-happy,
>>>> benefits immediately from this).
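A sketch of the linear-to-binary search switch, using the standard C `bsearch()` over a table sorted by (Op0, Op1, CRn, CRm, Op2). The descriptor struct, the key packing and the helper names are assumptions for illustration and do not mirror KVM's `sys_reg_desc` handling; the five encodings listed are the architectural ones for those registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down descriptor (a real one carries handlers too). */
struct sysreg_desc {
	const char *name;
	uint8_t op0, op1, crn, crm, op2;
};

/* Pack the encoding so that integer order == (Op0, Op1, CRn, CRm, Op2) order. */
static uint32_t sysreg_key(uint8_t op0, uint8_t op1, uint8_t crn,
			   uint8_t crm, uint8_t op2)
{
	return (uint32_t)op0 << 14 | (uint32_t)op1 << 11 | (uint32_t)crn << 7 |
	       (uint32_t)crm << 3 | op2;
}

static uint32_t desc_key(const struct sysreg_desc *d)
{
	return sysreg_key(d->op0, d->op1, d->crn, d->crm, d->op2);
}

/* Must be sorted by encoding for bsearch() to find every entry. */
static const struct sysreg_desc sysregs[] = {
	{ "MDSCR_EL1",      2, 0, 0,  2, 2 },
	{ "SCTLR_EL1",      3, 0, 1,  0, 0 },
	{ "PMCR_EL0",       3, 3, 9, 12, 0 },
	{ "PMCNTENSET_EL0", 3, 3, 9, 12, 1 },
	{ "PMSELR_EL0",     3, 3, 9, 12, 5 },
};
#define NR_SYSREGS (sizeof(sysregs) / sizeof(sysregs[0]))

/* bsearch() comparator: key is a packed encoding, elem a table entry. */
static int match_sysreg(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	uint32_t e = desc_key(elem);
	return (k > e) - (k < e);
}

/* O(log n) lookup on trap, instead of walking the whole table. */
static const struct sysreg_desc *find_sysreg(uint32_t key)
{
	return bsearch(&key, sysregs, NR_SYSREGS, sizeof(sysregs[0]),
		       match_sysreg);
}

/* Init-time sanity check: entries must be strictly increasing. */
static void check_sysreg_table(void)
{
	for (size_t i = 1; i < NR_SYSREGS; i++)
		assert(desc_key(&sysregs[i - 1]) < desc_key(&sysregs[i]));
}
```

Packing the fields most-significant first makes plain integer comparison match the table's sort order, so a single comparator can drive `bsearch()`. The sortedness check matters: a binary search over an unsorted table silently misses entries.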
>>>>
>>>> With these in place, I see an improvement of 20 to 30% (depending on
>>>> the platform) on our world-switch cycle count when running a set of
>>>> hand-crafted guests that are designed to only perform traps.
>>>
>>> I'm curious about the relative weight of these two. My guess based on the
>>> measurement work I did is that the GIC is by far the worst sinner, but
>>> that was exacerbated on X-Gene compared to Seattle.
>>
>> Indeed, the GIC is the real pig. 80% of the benefit is provided by not
>> accessing it when not absolutely required. The sysreg access is only
>> visible for workloads that are extremely trap-happy, but that's what
>> happens as soon as you start exercising the PMU code.
>>
>>>>
>>>> Methodology:
>>>>
>>>> * NULL-hypercall guest: Perform 65536 PSCI_0_2_FN_PSCI_VERSION calls,
>>>> and then a power-off:
>>>>
>>>> __start:
>>>> 	mov	x19, #(1 << 16)
>>>> 1:	mov	x0, #0x84000000
>>>> 	hvc	#0
>>>> 	sub	x19, x19, #1
>>>> 	cbnz	x19, 1b
>>>> 	mov	x0, #0x84000000
>>>> 	add	x0, x0, #8		// PSCI SYSTEM_OFF
>>>> 	hvc	#0
>>>> 	b	.
>>>>
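The hypercall guests build PSCI function IDs by hand. Under the SMC Calling Convention, 0x84000000 is a fast call (bit 31), uses the 32-bit convention (bit 30 clear) and is owned by the standard secure service (entity 4, bits 29:24); the low bits carry the function number. The helper below (hypothetical name) composes such IDs. Note that in PSCI 0.2 function 8 is SYSTEM_OFF and function 9 is SYSTEM_RESET.

```c
#include <assert.h>
#include <stdint.h>

/*
 * SMC Calling Convention function ID layout (the subset used here):
 *   bit 31     : 1 = fast call
 *   bit 30     : 0 = 32-bit calling convention, 1 = 64-bit
 *   bits 29:24 : owning entity (4 = standard secure service, e.g. PSCI)
 *   bits 15:0  : function number
 */
#define SMCCC_FAST_CALL  (1u << 31)
#define SMCCC_SMC64      (1u << 30)
#define SMCCC_OWNER_STD  4u

/* Hypothetical helper: compose a fast-call function ID. */
static uint32_t smccc_fid(int smc64, uint32_t owner, uint32_t fn)
{
	assert(owner < 64 && fn <= 0xffff);
	return SMCCC_FAST_CALL | (smc64 ? SMCCC_SMC64 : 0) |
	       (owner << 24) | fn;
}

/* PSCI 0.2 function numbers relevant to the test guests. */
enum { PSCI_VERSION = 0, PSCI_SYSTEM_OFF = 8, PSCI_SYSTEM_RESET = 9 };
```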
>>>> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
>>>>
>>>> __start:
>>>> 	mov	x19, #(1 << 20)
>>>> 1:	mrs	x0, PMSELR_EL0
>>>> 	sub	x19, x19, #1
>>>> 	cbnz	x19, 1b
>>>> 	mov	x0, #0x84000000
>>>> 	add	x0, x0, #8		// PSCI SYSTEM_OFF
>>>> 	hvc	#0
>>>> 	b	.
>>>>
>>>> * These guests are profiled using perf and kvmtool:
>>>>
>>>> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null | grep cycles
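`perf stat` reports a single cycle total for the whole run, so comparing kernels comes down to a little arithmetic: divide by the number of traps (2^16 or 2^20 in these guests) and compare before/after totals. A minimal sketch with hypothetical helper names; the total also covers VM setup and teardown, so the per-trap figure is an upper bound rather than an exact cost.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average cycles per trap, given the total reported by `perf stat` for a
 * run of `iters` traps. Fixed VM startup/teardown costs are included in
 * the total, so this overestimates the true per-trap cost.
 */
static double cycles_per_iter(uint64_t total_cycles, uint64_t iters)
{
	assert(iters > 0);
	return (double)total_cycles / (double)iters;
}

/* Relative reduction, in percent, going from `before` to `after` cycles. */
static double improvement_pct(uint64_t before, uint64_t after)
{
	assert(before > 0);
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

For the sysreg guest, for instance, the `perf stat` total would be divided by `1 << 20`; the relative improvement between two kernels can be computed on either the totals or the per-trap figures.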
>>>
>>> These would be good to add to kvm-unit-tests so we can keep an eye on
>>> this sort of thing...
> 
> I can work on that. (Actually, I had already put this on my TODO list when I
> saw this series. Your interest in it just bumped it up in priority :-)

Ah! You're in charge, then! ;-)

>>
>> Yeah, I was thinking of that too. In the meantime, I've also created a
>> GICv2 self-IPI test case, which has led to further improvement (a 10%
>> reduction in the number of cycles on Seattle). The ugly thing about that
>> test is that it knows where kvmtool places the GIC (I didn't fancy
>> parsing the DT in assembly code). Hopefully there is a way to abstract this.
> 
> I have a simple IPI test written for kvm-unit-tests already[*], but it's
> been lying around for a while. I can dust it off and make a self-IPI
> test out of it later today, though. I've been hesitating to post any
> GIC-related stuff to kvm-unit-tests, because I know Andre has been looking
> into it (and he has the GIC expertise to do it more cleanly than I). I'll
> go ahead and post my little thing now though, as he can always review it
> and/or clean it up later :-)
> 
> [*] https://github.com/rhdrjones/kvm-unit-tests/commit/05af9b0361ac5eab58f46e5451e585c9625c3b75

For the record, the test case I've been running is this:

__start:
	mov	x19, #(1 << 20)

	mov	x0, #0x3fff0000		// GIC distributor base (kvmtool)
	mov	x1, #0x3ffd0000		// GIC CPU interface base (kvmtool)
	mov	w2, #1
	str	w2, [x0]		// GICD_CTLR: enable Group0
	ldr	w2, =0xa0a0a0a0
	str	w2, [x0, #0x400]	// GICD_IPRIORITYR0: priority 0xa0 for SGI0-3
	mov	w2, #0x0f
	str	w2, [x0, #0x100]	// GICD_ISENABLER0: enable SGI0-3
	mov	w2, #0xf0
	str	w2, [x1, #4]		// GICC_PMR: unmask priorities below 0xf0
	mov	w2, #1
	str	w2, [x1]		// GICC_CTLR: enable CPU interface

1:
	mov	w2, #(2 << 24)		// GICD_SGIR: SGI0 to self (filter 0b10)
	str	w2, [x0, #0xf00]

2:	ldr	w2, [x1, #0x0c]		// GICC_IAR: acknowledge
	cmp	w2, #0x3ff		// 1023 = spurious, nothing pending yet
	b.ne	3f

	wfi
	b	2b

3:	str	w2, [x1, #0x10]		// GICC_EOIR: end of interrupt

	sub	x19, x19, #1
	cbnz	x19, 1b

// Die: PSCI SYSTEM_OFF
	mov	x0, #0x84000000
	add	x0, x0, #8
	hvc	#0
	b	.

Feel free to adapt it so it fits in your framework if you find it useful
(but I guess you'll be inclined to rewrite it in C).
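For a C port, the magic numbers in the self-IPI guest map onto GICv2 register offsets and bitfields. A sketch of encode/decode helpers, with hypothetical names (they are not taken from kvm-unit-tests or the kernel); the kvmtool base addresses hard-coded in the assembly are deliberately left out, since a portable test would ideally get them from the DT.

```c
#include <assert.h>
#include <stdint.h>

/* GICv2 register offsets used by the self-IPI guest. */
enum {
	GICD_CTLR       = 0x000,
	GICD_ISENABLER0 = 0x100,
	GICD_IPRIORITYR = 0x400,
	GICD_SGIR       = 0xf00,
	GICC_CTLR       = 0x000,
	GICC_PMR        = 0x004,
	GICC_IAR        = 0x00c,
	GICC_EOIR       = 0x010,
};

#define GIC_SPURIOUS_INTID 1023u

/* GICD_SGIR: TargetListFilter [25:24], CPUTargetList [23:16], SGIINTID [3:0]. */
enum sgi_filter { SGI_TO_LIST = 0, SGI_TO_OTHERS = 1, SGI_TO_SELF = 2 };

static uint32_t gicd_sgir(enum sgi_filter filter, uint8_t targets, unsigned sgi)
{
	assert(sgi < 16);
	return (uint32_t)filter << 24 | (uint32_t)targets << 16 | sgi;
}

/* GICC_IAR: interrupt ID in [9:0]; for SGIs, the source CPU in [12:10]. */
static unsigned iar_intid(uint32_t iar) { return iar & 0x3ff; }
static unsigned iar_cpuid(uint32_t iar) { return (iar >> 10) & 0x7; }

/* Signalled only if the priority value is lower (= higher priority) than PMR. */
static int gicc_would_signal(uint8_t prio, uint8_t pmr) { return prio < pmr; }
```

With these, the `0x02000000` written to GICD_SGIR is `gicd_sgir(SGI_TO_SELF, 0, 0)`, and the 0xa0 priority programmed for SGI0-3 passes the 0xf0 PMR mask.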

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

Thread overview:
2016-02-08 11:40 [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations Marc Zyngier
2016-02-08 11:40 ` [PATCH 1/8] arm64: KVM: Switch the sys_reg search to be a binary search Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-10 13:49   ` Alex Bennée
2016-02-10 14:00     ` Marc Zyngier
2016-02-08 11:40 ` [PATCH 2/8] ARM: KVM: Properly sort the invariant table Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 3/8] ARM: KVM: Enforce sorting of all CP tables Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 4/8] ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 5/8] ARM: KVM: Switch the CP reg search to be a binary search Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 6/8] KVM: arm/arm64: timer: Add active state caching Marc Zyngier
2016-02-10 12:44   ` Christoffer Dall
2016-02-08 11:40 ` [PATCH 7/8] KVM: arm/arm64: Avoid accessing GICH registers Marc Zyngier
2016-02-10 12:45   ` Christoffer Dall
2016-02-10 13:34     ` Marc Zyngier
2016-02-10 17:30       ` Christoffer Dall
2016-02-10 17:43         ` Marc Zyngier
2016-02-08 11:40 ` [PATCH 8/8] KVM: arm64: Avoid accessing ICH registers Marc Zyngier
2016-02-10 12:45   ` Christoffer Dall
2016-02-10 16:47     ` Marc Zyngier
2016-02-09 20:59 ` [PATCH 0/8] KVM/ARM: Guest Entry/Exit optimizations Christoffer Dall
2016-02-10  8:34   ` Marc Zyngier
2016-02-10 12:02     ` Andrew Jones
2016-02-10 12:24       ` Marc Zyngier [this message]
2016-02-10 20:40 ` Christoffer Dall
2016-02-16 20:05   ` Marc Zyngier
2016-02-17  9:15     ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56BB2BE6.8020600@arm.com \
    --to=marc.zyngier@arm.com \
    --cc=andre.przywara@arm.com \
    --cc=christoffer.dall@linaro.org \
    --cc=drjones@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.