From: "Michael S. Tsirkin" <mst@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: marcel.a@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com
Subject: Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
Date: Thu, 7 Nov 2013 20:54:13 +0200
Message-ID: <20131107185413.GA4974@redhat.com>
In-Reply-To: <527BCE04.9020107@redhat.com>
On Thu, Nov 07, 2013 at 06:29:40PM +0100, Paolo Bonzini wrote:
> On 07/11/2013 17:47, Michael S. Tsirkin wrote:
> > That's on kvm with 52 bit address.
> > But where I would be concerned is systems with e.g. 36 bit address
> > space where we are doubling the cost of the lookup.
> > E.g. try i386 and not x86_64.
>
> Tried now...
>
> P_L2_LEVELS    pre-patch    post-patch
> i386           3            6
> x86_64         4            6
>
> I timed the inl_from_qemu test of vmexit.flat with both KVM and TCG. With
> TCG there's indeed a visible penalty of 20 cycles for i386 and 10 for x86_64
> (you can extrapolate to 30 cycles for TARGET_PHYS_ADDR_SPACE_BITS=32 targets).
> These can be more or less entirely ascribed to phys_page_find:
>
>                                      TCG             |          KVM
>                               pre-patch  post-patch  |  pre-patch  post-patch
> phys_page_find(i386)          13%        25%         |  0.6%       1%
> inl_from_qemu cycles(i386)    153        173         |  ~12000     ~12000
I'm a bit confused by the numbers above. The share of time spent in
phys_page_find has grown from 13% to 25% (almost double, which is kind
of expected given that we have twice the number of levels). But the cost
per access only went from 153 to 173 cycles? Maybe the test is a bit
wrong for TCG. How about unrolling the loop in the kvm unit test?
diff --git a/x86/vmexit.c b/x86/vmexit.c
index 957d0cc..405d545 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -40,6 +40,15 @@ static unsigned int inl(unsigned short port)
 {
     unsigned int val;
     asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+    asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
     return val;
 }
Then you have to divide the reported result by 10.
> phys_page_find(x86_64)        18%        25%         |  0.8%       1%
> inl_from_qemu cycles(x86_64)  163        173         |  ~12000     ~12000
>
> Thus this patch costs 0.4% in the worst case for KVM, 12% in the worst case
> for TCG. The cycle breakdown is:
>
> 60 phys_page_find
> 28 access_with_adjusted_size
> 24 address_space_translate_internal
> 20 address_space_rw
> 13 io_mem_read
> 11 address_space_translate
> 9 memory_region_read_accessor
> 6 memory_region_access_valid
> 4 helper_inl
> 4 memory_access_size
> 3 cpu_inl
>
> (This run reported 177 cycles per access; the total is 182 due to rounding).
> It is probably possible to shave at least 10 cycles from the functions below,
> or to make the depth of the tree dynamic so that you would save even more
> compared to 1.6.0.
>
> Also, compiling with "-fstack-protector" instead of "-fstack-protector-all",
> as suggested a while ago by rth, is already giving a savings of 20 cycles.
>
Is it true that with TCG this affects more than just MMIO, since
phys_page_find will also sometimes run on CPU accesses to memory?
> And of course, if this were a realistic test, KVM's 60x penalty would
> be a severe problem---but it isn't, because this is not a realistic setting.
>
> Paolo
Well, for this argument to carry the day we'd need to design
a realistic test, which isn't easy :)
--
MST