Re: [Qemu-devel] Benchmarking linux-user performance

qemu-devel.nongnu.org archive mirror

From: "Emilio G. Cota" <cota@braap.org>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Richard Henderson" <rth@twiddle.net>,
	"Laurent Vivier" <laurent@vivier.eu>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Benchmarking linux-user performance
Date: Thu, 16 Mar 2017 13:13:05 -0400
Message-ID: <20170316171305.GA26528@flamenco>
In-Reply-To: <20170314170656.GO2445@work-vm>

On Tue, Mar 14, 2017 at 17:06:57 +0000, Dr. David Alan Gilbert wrote:
> * Emilio G. Cota (cota@braap.org) wrote:
> > It seems that a good benchmark to take translation overhead into account
> > would be gcc/perlbench from SPEC (see [1]; ~20% of exec time is spent
> > on translation). Unfortunately, none of them can be redistributed.
> > 
> > I'll consider other options. For instance, I looked today at using golang's
> > compilation tests, but they crash under qemu-user. I'll keep looking
> > at other options -- the requirement is to have something that is easy
> > to build (i.e. gcc is not an option) and that runs fast.
> 
> Yes, needs to be self contained but large enough to be interesting.
> Isn't SPEC's perlbench just a variant of a standard free benchmark
> that can be used?
> (Select alternative preferred language).

SPEC takes an old Perl distribution and a few standard Perl benchmarks.
These sources (with SPEC's modifications) are of course redistributable.
However, SPEC also adds scripts that are proprietary.

What I've ended up doing is selecting a small subset of the tests in the
Perl distribution whose profile under QEMU is similar to that of
SPEC's perlbench (see patch below). This requires building (and testing)
Perl, which takes a few minutes on a modern machine (ouch), but fortunately
it only needs to be done once. After that, the tests themselves take only a
few seconds.

The downside is that cross-compiling Perl is not officially supported.
Still, at least we now have an easy-to-run "compiler-like" benchmark,
even if only for the host's ISA.
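The workflow described above (build Perl once, then time a handful of its tests under QEMU) can be sketched as a small timing harness. This is a hypothetical Python sketch, not dbt-bench's actual harness; `time_command` and `median_time` are illustrative names. It runs a command several times and reports the median wall-clock time, which is less sensitive to a single noisy run than the mean. Note that wall-clock time here includes process start-up, which is assumed to be negligible next to a test that runs for seconds.

```python
import statistics
import subprocess
import time
from typing import List, Sequence


def time_command(argv: Sequence[str], runs: int = 3) -> List[float]:
    """Run `argv` `runs` times; return each run's wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit status, so a
        # test that crashes under QEMU is reported instead of timed as "fast".
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def median_time(argv: Sequence[str], runs: int = 3) -> float:
    """Median wall-clock time over `runs` runs of `argv`."""
    return statistics.median(time_command(argv, runs))
```

A hypothetical call would look like `median_time(["qemu-x86_64", "./perl", "path/to/test.t"], runs=5)`, where the Perl binary and test path are placeholders for whatever the local Perl build provides.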

I updated the README with profile data -- I'm pasting that update below.
Grab the changes from https://github.com/cota/dbt-bench
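The README update below argues that NBench time is dominated by emulated floating point while Perl time is dominated by code translation. One way to make that split explicit is to bucket the `perf report` lines by symbol, as in the following hypothetical sketch. The bucket rules are assumptions of this sketch, not a classification used by QEMU or dbt-bench: `tcg_*`, `liveness_pass_*` and `disas_insn*` count as translation, `[k]` (kernel) symbols as kernel, `float64_*`/`*Float64*` names as softfloat, and entries resolved through a `perf-<pid>.map` file (perf's convention for JIT-generated code, here presumably QEMU's translated guest code) as generated code. The function expects the profile text as it appears in the README, i.e. without the diff's leading `+`.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches a perf report data line such as
#   "    5.69%  qemu-x86_64   qemu-x86_64   [.] tcg_gen_code"
# capturing the overhead, the shared object, the [.]/[k] marker and the symbol.
LINE_RE = re.compile(r"^\s*([\d.]+)%\s+\S+\s+(\S+)\s+\[([.k])\]\s+(\S+)")

# Hypothetical bucket rules for this sketch.
TRANSLATION_RE = re.compile(r"^(tcg_|liveness_pass_|disas_insn)")
SOFTFLOAT_RE = re.compile(r"^float64_|Float64")
PERF_MAP_RE = re.compile(r"^perf-\d+\.map$")


def classify(dso: str, kind: str, symbol: str) -> str:
    """Assign one profile entry to a bucket; the first matching rule wins."""
    if kind == "k":
        return "kernel"
    if PERF_MAP_RE.match(dso):
        return "generated code"
    if TRANSLATION_RE.match(symbol):
        return "translation"
    if SOFTFLOAT_RE.search(symbol):
        return "softfloat"
    return "other"


def bucket_overheads(report: str) -> Dict[str, float]:
    """Sum the Overhead column of a perf report per bucket."""
    totals: Dict[str, float] = defaultdict(float)
    for line in report.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # "#" header lines and the "[...]" marker are skipped
        percent, dso, kind, symbol = m.groups()
        totals[classify(dso, kind, symbol)] += float(percent)
    return dict(totals)
```

Applied to the listed lines only (the `[...]` tails are not available), the translation bucket of the Perl profile adds up to about 19.7% of the sampled cycles, while the NBench profile has no translation symbols among its listed lines and about 21.8% in the softfloat bucket.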

Here are the numbers for the Perl benchmark, from QEMU v1.7.0 to v2.8.0.
The Y axis is execution time in seconds, so lower is better:

                       x86_64 Perl Compilation Performance                     
                 Host: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz                
                                                                               
   10 +-+---+------+-----+-----+-----+------+-----+----***----+------+---+-+   
      |     +      +     +     +     +      +     +     *     +      +     |   
  9.8 +-+                                              #A                +-+   
      |                                          *** ## *#                 |   
  9.6 +-+                                         *##  ***#              +-+   
  9.4 +-+                                         A        #             +-+   
      |                                          #*         #***           |   
  9.2 +-+                                       #***         #*          +-+   
      |                                        #              A##          |   
    9 +-+  ***          ***         ***        #              *  #       +-+   
      |     A#####***    *    ***    *     ***#              ***  #        |   
  8.8 +-+   *     #*  ###A#####A#####*      *#                     #***  +-+   
  8.6 +-+  ***     A##   *     *     A######A                        *   +-+   
      |           ***   ***   ***    *     ***                       A     |   
  8.4 +-+                            *                               *   +-+   
      |     +      +     +     +    ***     +     +     +     +     ***    |   
  8.2 +-+---+------+-----+-----+-----+------+-----+-----+-----+------+---+-+   
         v1.7.0 v2.0.0v2.1.0v2.2.0v2.3.0 v2.4.0v2.5.0v2.6.0v2.7.0 v2.8.0       
                                  QEMU version 
PNGs of the Perl and NBench plots are here: http://imgur.com/a/LlpxE
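The `***` marks above and below each `A` point appear to be error bars. Assuming they show the mean plus or minus one sample standard deviation over repeated runs (the email does not say how they were computed), the per-version summary behind such a plot, and the change relative to a baseline version, could be computed as in this hypothetical sketch. The function names are illustrative, and any timings passed in would be your own measurements.

```python
import statistics
from typing import Dict, Mapping, Sequence, Tuple


def summarize(runs_by_version: Mapping[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each version label to (mean, sample standard deviation) of its run times."""
    summary = {}
    for version, times in runs_by_version.items():
        if len(times) < 2:
            # statistics.stdev needs at least two data points.
            raise ValueError(f"{version}: need at least two runs")
        summary[version] = (statistics.mean(times), statistics.stdev(times))
    return summary


def percent_change(baseline: float, current: float) -> float:
    """Relative change in execution time; negative means `current` is faster."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0
```

Since the plot's Y axis is execution time, a negative `percent_change(old_mean, new_mean)` means the newer QEMU version ran the benchmark faster.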

Thanks,

		Emilio

commit f4ca2537bffe544779aa3f1814cec9d66dd9a17e
Author: Emilio G. Cota <cota@braap.org>
Date:   Thu Mar 16 12:48:44 2017 -0400

    README: document and quantify the difference between NBench and Perl
    
    While at it, also show how Perl's perf is very similar to SPEC06's perlbench.
    
    Signed-off-by: Emilio G. Cota <cota@braap.org>

diff --git a/README.md b/README.md
index b6d4037..b4578d6 100644
--- a/README.md
+++ b/README.md
@@ -61,3 +61,111 @@ Other output formats are possible, see `Makefile`.
   valuable files that were never meant to be committed (e.g. scripts). For
   this reason it is best to just clone a fresh QEMU repo to be used with
   DBT-bench rather than using your development tree.
+
+## What is the difference between the benchmarks?
+
+NBench programs are small, with execution time dominated by small code loops. Thus,
+when run under a DBT engine, the resulting performance depends almost entirely
+on the quality of the output code.
+
+The Perl benchmarks compile Perl code. As is common for compilation workloads,
+they execute large amounts of code and show no particular code execution
+hotspots. Thus, the resulting DBT performance depends largely on code
+translation speed.
+
+Quantitatively, the differences can be clearly seen under a profiler. For QEMU
+v2.8.0, we get:
+
+* NBench:
+
+```
+# Samples: 1M of event 'cycles:pp'
+# Event count (approx.): 1111661663176
+#
+# Overhead  Command       Shared Object        Symbol
+# ........  ............  ...................  .........................................
+#
+     6.26%  qemu-x86_64   qemu-x86_64          [.] float64_mul
+     6.24%  qemu-x86_64   qemu-x86_64          [.] roundAndPackFloat64
+     4.18%  qemu-x86_64   qemu-x86_64          [.] subFloat64Sigs
+     2.72%  qemu-x86_64   qemu-x86_64          [.] addFloat64Sigs
+     2.29%  qemu-x86_64   qemu-x86_64          [.] cpu_exec
+     1.29%  qemu-x86_64   qemu-x86_64          [.] float64_add
+     1.12%  qemu-x86_64   qemu-x86_64          [.] float64_sub
+     0.79%  qemu-x86_64   qemu-x86_64          [.] object_class_dynamic_cast_assert
+     0.71%  qemu-x86_64   qemu-x86_64          [.] helper_mulsd
+     0.66%  qemu-x86_64   perf-23090.map       [.] 0x000055afd37d0b8a
+     0.64%  qemu-x86_64   perf-23090.map       [.] 0x000055afd377cd8f
+     0.59%  qemu-x86_64   perf-23090.map       [.] 0x000055afd37d019a
+     [...]
+```
+
+* Perl:
+
+```
+# Samples: 90K of event 'cycles:pp'
+# Event count (approx.): 97757063053
+#
+# Overhead  Command       Shared Object            Symbol
+# ........  ............  .......................  ...........................................
+#
+   22.93%  qemu-x86_64   [kernel.kallsyms]        [k] isolate_freepages_block
+    9.38%  qemu-x86_64   qemu-x86_64              [.] cpu_exec
+    5.69%  qemu-x86_64   qemu-x86_64              [.] tcg_gen_code
+    5.30%  qemu-x86_64   qemu-x86_64              [.] tcg_optimize
+    3.45%  qemu-x86_64   qemu-x86_64              [.] liveness_pass_1
+    3.24%  qemu-x86_64   [kernel.kallsyms]        [k] isolate_migratepages_block
+    2.39%  qemu-x86_64   qemu-x86_64              [.] object_class_dynamic_cast_assert
+    1.48%  qemu-x86_64   [kernel.kallsyms]        [k] unlock_page
+    1.29%  qemu-x86_64   [kernel.kallsyms]        [k] pageblock_pfn_to_page
+    1.29%  qemu-x86_64   qemu-x86_64              [.] tcg_out_opc.isra.13
+    1.11%  qemu-x86_64   qemu-x86_64              [.] tcg_gen_op2
+    0.98%  qemu-x86_64   [kernel.kallsyms]        [k] migrate_pages
+    0.87%  qemu-x86_64   qemu-x86_64              [.] qht_lookup
+    0.83%  qemu-x86_64   qemu-x86_64              [.] tcg_temp_new_internal
+    0.77%  qemu-x86_64   qemu-x86_64              [.] tcg_out_modrm_sib_offset.constprop.37
+    0.76%  qemu-x86_64   qemu-x86_64              [.] disas_insn.isra.49
+    0.70%  qemu-x86_64   [kernel.kallsyms]        [k] __wake_up_bit
+    0.55%  qemu-x86_64   [kernel.kallsyms]        [k] __reset_isolation_suitable
+    0.47%  qemu-x86_64   qemu-x86_64              [.] tcg_opt_gen_mov
+    [...]
+```
+
+### Why don't you just run SPEC06?
+
+SPEC's source code cannot be redistributed. Some of its benchmarks are based
+on free software, but on top of that software the SPEC authors added
+non-free code (usually scripts) that cannot be redistributed.
+
+For this reason we use benchmarks that are freely redistributable yet still
+capture different performance profiles: NBench represents "hotspot
+code" and Perl represents a typical "compiler" workload. In fact, Perl's
+performance profile under QEMU is very similar to that of SPEC06's perlbench;
+compare Perl's profile above with SPEC06 perlbench's below:
+
+```
+# Samples: 14K of event 'cycles:pp'
+# Event count (approx.): 15657871399
+#
+# Overhead  Command      Shared Object            Symbol
+# ........  ...........  .......................  ...........................................
+#
+   16.93%  qemu-x86_64  qemu-x86_64              [.] cpu_exec
+    9.16%  qemu-x86_64  [kernel.kallsyms]        [k] isolate_freepages_block
+    5.47%  qemu-x86_64  qemu-x86_64              [.] tcg_gen_code
+    4.82%  qemu-x86_64  qemu-x86_64              [.] tcg_optimize
+    4.15%  qemu-x86_64  qemu-x86_64              [.] object_class_dynamic_cast_assert
+    3.25%  qemu-x86_64  qemu-x86_64              [.] liveness_pass_1
+    1.55%  qemu-x86_64  qemu-x86_64              [.] qht_lookup
+    1.23%  qemu-x86_64  qemu-x86_64              [.] tcg_gen_op2
+    1.04%  qemu-x86_64  [kernel.kallsyms]        [k] copy_page
+    1.00%  qemu-x86_64  qemu-x86_64              [.] tcg_out_opc.isra.13
+    0.82%  qemu-x86_64  qemu-x86_64              [.] tcg_temp_new_internal
+    0.78%  qemu-x86_64  qemu-x86_64              [.] tcg_out_modrm_sib_offset.constprop.37
+    0.72%  qemu-x86_64  qemu-x86_64              [.] tb_cmp
+    0.69%  qemu-x86_64  [kernel.kallsyms]        [k] isolate_migratepages_block
+    0.67%  qemu-x86_64  qemu-x86_64              [.] disas_insn.isra.49
+    0.53%  qemu-x86_64  qemu-x86_64              [.] object_get_class
+    0.52%  qemu-x86_64  [kernel.kallsyms]        [k] __wake_up_bit
+    [...]
+```
--
2.7.4
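The README text above calls Perl's profile "very similar" to SPEC06 perlbench's by eye. If a single number is wanted, one option (a choice made for this sketch, not something dbt-bench does) is histogram intersection: sum, over symbols present in both profiles, the smaller of the two overhead percentages. Identical complete profiles would score 100 and profiles with no shared symbols score 0. The symbol/percentage pairs below are typed in from the README listings; `profile_overlap` is a hypothetical name.

```python
from typing import Mapping


def profile_overlap(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Histogram intersection of two {symbol: overhead %} profiles.

    Returns the overhead (in percentage points) that both profiles attribute
    to the same symbols; symbols missing from either side contribute 0.
    """
    return sum((min(a[sym], b[sym]) for sym in a.keys() & b.keys()), 0.0)


# Top entries of the Perl and SPEC06 perlbench profiles from the README.
perl = {"cpu_exec": 9.38, "isolate_freepages_block": 22.93,
        "tcg_gen_code": 5.69, "tcg_optimize": 5.30}
perlbench = {"cpu_exec": 16.93, "isolate_freepages_block": 9.16,
             "tcg_gen_code": 5.47, "tcg_optimize": 4.82}
print(round(profile_overlap(perl, perlbench), 2))  # 28.83
```

Because the listings are truncated at `[...]`, the score only covers the listed symbols; a fair comparison would feed in the complete reports.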

Thread overview: 8+ messages
2017-03-10  1:23 [Qemu-devel] Benchmarking linux-user performance Emilio G. Cota
2017-03-10 11:45 ` Dr. David Alan Gilbert
2017-03-10 11:48   ` Peter Maydell
2017-03-11  2:25     ` Emilio G. Cota
2017-03-11 15:02       ` Peter Maydell
2017-03-11  2:18   ` Emilio G. Cota
2017-03-14 17:06     ` Dr. David Alan Gilbert
2017-03-16 17:13       ` Emilio G. Cota [this message]
