public inbox for kvm@vger.kernel.org
From: Paolo Bonzini <pbonzini@redhat.com>
To: Gleb Natapov <gleb@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>, kvm list <kvm@vger.kernel.org>
Subject: Re: Seeking a KVM benchmark
Date: Mon, 10 Nov 2014 13:15:56 +0100	[thread overview]
Message-ID: <5460AC7C.8040409@redhat.com> (raw)
In-Reply-To: <20141110104531.GB26187@minantech.com>



On 10/11/2014 11:45, Gleb Natapov wrote:
> > I tried making also the other shared MSRs the same between guest and
> > host (STAR, LSTAR, CSTAR, SYSCALL_MASK), so that the user return notifier
> > has nothing to do.  That saves about 400-500 cycles on inl_from_qemu.  I
> > do want to dig out my old Core 2 and see how the new test fares, but it
> > really looks like your patch will be in 3.19.
>
> Please test on a wide variety of HW before the final decision.

Yes, definitely.

> Also it would
> be nice to ask Intel what the expected overhead is. It is awesome if they
> manage to add EFER switching with non-measurable overhead, but also hard
> to believe :)

So let's see what happens.  Sneak preview: the result is definitely worth
asking Intel about.

I ran these benchmarks with a stock 3.16.6 KVM; instead of patching KVM,
I patched kvm-unit-tests to set EFER.SCE in enable_nx.  This makes it much
simpler for others to reproduce the results.  I only ran the inl_from_qemu
test.

Perf stat reports that the processor goes from 0.46 to 0.66 
instructions per cycle, which is consistent with the improvement from 
19k to 12k cycles per iteration.

Unpatched KVM-unit-tests:

     3,385,586,563 cycles                    #    3.189 GHz                     [83.25%]
     2,475,979,685 stalled-cycles-frontend   #   73.13% frontend cycles idle    [83.37%]
     2,083,556,270 stalled-cycles-backend    #   61.54% backend  cycles idle    [66.71%]
     1,573,854,041 instructions              #    0.46  insns per cycle        
                                             #    1.57  stalled cycles per insn [83.20%]
       1.108486526 seconds time elapsed


Patched KVM-unit-tests:

     3,252,297,378 cycles                    #    3.147 GHz                     [83.32%]
     2,010,266,184 stalled-cycles-frontend   #   61.81% frontend cycles idle    [83.36%]
     1,560,371,769 stalled-cycles-backend    #   47.98% backend  cycles idle    [66.51%]
     2,133,698,018 instructions              #    0.66  insns per cycle        
                                             #    0.94  stalled cycles per insn [83.45%]
       1.072395697 seconds time elapsed

Playing with other events shows that the unpatched benchmark has an
awful lot of TLB misses:

Unpatched:

            30,311 iTLB-loads
       464,641,844 dTLB-loads
        10,813,839 dTLB-load-misses          #    2.33% of all dTLB cache hits
        20,436,027 iTLB-load-misses          #  67421.16% of all iTLB cache hits

Patched:

         1,440,033 iTLB-loads                                                  
       640,970,836 dTLB-loads                                                  
         2,345,112 dTLB-load-misses          #    0.37% of all dTLB cache hits 
           270,884 iTLB-load-misses          #   18.81% of all iTLB cache hits 

This is 100% reproducible.  The meaning of the numbers is clearer if you
look up the raw event numbers in the Intel manuals:

- iTLB-loads is 85h/10h aka "perf -e r1085": "Number of cache load STLB [second-level
TLB] hits. No page walk."

- iTLB-load-misses is 85h/01h aka r185: "Misses in all ITLB levels that
cause page walks."

So for example event 85h/04h aka r485 ("Cycle PMH is busy with a walk.") and
friends show that the unpatched KVM wastes about 0.1 seconds more than
the patched KVM on page walks:

Unpatched:

        24,430,676 r449             (cycles on dTLB store miss page walks)
       196,017,693 r408             (cycles on dTLB load miss page walks)
       213,266,243 r485             (cycles on iTLB miss page walks)
------------------------
       433,714,612 total

Patched:

        22,583,440 r449             (cycles on dTLB store miss page walks)
        40,452,018 r408             (cycles on dTLB load miss page walks)
         2,115,981 r485             (cycles on iTLB miss page walks)
------------------------
        65,151,439 total

These 0.1 seconds are probably all spent on instructions that would
otherwise have been fast, since the slow instructions responsible for the
low IPC are the microcoded ones, including VMX and other privileged stuff.

Similarly, BDh/20h counts STLB flushes, which number about 260k in
unpatched KVM and 3k in patched KVM.  Let's see where they come from:

Unpatched:

+  98.97%  qemu-kvm  [kernel.kallsyms]  [k] native_write_msr_safe
+   0.70%  qemu-kvm  [kernel.kallsyms]  [k] page_fault

It's expected that most TLB misses happen just before a page fault (there
are also events to count how many TLB misses do result in a page fault,
if you care about that), and thus are accounted to the first instruction of the
exception handler.

We do not know exactly what causes second-level TLB _flushes_, but it is
quite expected that a TLB miss, and possibly a page fault, follows one.
In any case, 98.97% of them coming from native_write_msr_safe is totally
anomalous.

The patched benchmark shows no second-level TLB flushes coming from a
WRMSR at all:

+  72.41%  qemu-kvm  [kernel.kallsyms]  [k] page_fault
+   9.07%  qemu-kvm  [kvm_intel]        [k] vmx_flush_tlb
+   6.60%  qemu-kvm  [kernel.kallsyms]  [k] set_pte_vaddr_pud
+   5.68%  qemu-kvm  [kernel.kallsyms]  [k] flush_tlb_mm_range
+   4.87%  qemu-kvm  [kernel.kallsyms]  [k] native_flush_tlb
+   1.36%  qemu-kvm  [kernel.kallsyms]  [k] flush_tlb_page


So basically VMX EFER writes are optimized, while non-VMX EFER writes
cause a TLB flush, at least on a Sandy Bridge.  Ouch!

I'll try to reproduce on the Core 2 Duo soon, and ask Intel about it.

Paolo

Thread overview: 30+ messages
2014-11-07  6:27 Seeking a KVM benchmark Andy Lutomirski
2014-11-07  7:17 ` Paolo Bonzini
2014-11-07 17:59   ` Andy Lutomirski
2014-11-07 18:11     ` Andy Lutomirski
2014-11-08 12:01     ` Gleb Natapov
2014-11-08 16:00       ` Andy Lutomirski
2014-11-08 16:44         ` Andy Lutomirski
2014-11-09  8:52           ` Gleb Natapov
2014-11-09 16:36             ` Andy Lutomirski
2014-11-10 10:03               ` Paolo Bonzini
2014-11-10 10:45                 ` Gleb Natapov
2014-11-10 12:15                   ` Paolo Bonzini [this message]
2014-11-10 14:23                     ` Avi Kivity
2014-11-10 17:28                       ` Paolo Bonzini
2014-11-10 17:38                         ` Gleb Natapov
2014-11-12 11:33                           ` Paolo Bonzini
2014-11-12 15:22                             ` Gleb Natapov
2014-11-12 15:26                               ` Paolo Bonzini
2014-11-12 15:32                                 ` Gleb Natapov
2014-11-12 15:51                                   ` Paolo Bonzini
2014-11-12 16:07                                     ` Andy Lutomirski
2014-11-12 17:56                                       ` Paolo Bonzini
2014-11-17 11:17                         ` Wanpeng Li
2014-11-17 11:18                           ` Paolo Bonzini
2014-11-17 12:00                             ` Wanpeng Li
2014-11-17 12:04                               ` Paolo Bonzini
2014-11-17 12:14                                 ` Wanpeng Li
2014-11-17 12:22                                   ` Paolo Bonzini
2014-11-11 11:07                     ` Paolo Bonzini
2014-11-10 19:17                   ` Andy Lutomirski
