Re: [Qemu-devel] Benchmarking linux-user performance

From: "Emilio G. Cota" <cota@braap.org>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Richard Henderson" <rth@twiddle.net>,
	"Laurent Vivier" <laurent@vivier.eu>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Benchmarking linux-user performance
Date: Thu, 16 Mar 2017 13:13:05 -0400
Message-ID: <20170316171305.GA26528@flamenco>
In-Reply-To: <20170314170656.GO2445@work-vm>

On Tue, Mar 14, 2017 at 17:06:57 +0000, Dr. David Alan Gilbert wrote:
> * Emilio G. Cota (cota@braap.org) wrote:
> > It seems that a good benchmark to take translation overhead into account
> > would be gcc/perlbench from SPEC (see [1]; ~20% of exec time is spent
> > on translation). Unfortunately, none of them can be redistributed.
> > 
> > I'll consider other options. For instance, I looked today at using golang's
> > compilation tests, but they crash under qemu-user. I'll keep looking
> > at other options -- the requirement is to have something that is easy
> > to build (i.e. gcc is not an option) and that it runs fast.
> 
> Yes, needs to be self contained but large enough to be interesting.
> Isn't SPECs perlbench just a variant of a standard free benchmark
> that can be used?
> (Select alternative preferred language).

SPEC takes an old Perl distribution and a few standard Perl benchmarks.
These sources (with SPEC's modifications) are of course redistributable.
However, SPEC also adds scripts that are proprietary.

What I've ended up doing is selecting a small subset of the tests in the
Perl distribution with a profile under QEMU similar to that of
SPEC's perlbench (see patch below). This requires building (and testing)
Perl, which takes a few minutes on a modern machine (ouch) but fortunately
it is only done once. After that, the tests themselves take only a
few seconds.

The bummer is that cross-compiling the Perl distro is not officially
supported. But at least we now have an easy-to-run "compiler-like"
benchmark, if only for the host's ISA.

I updated the README with profile data -- I'm pasting that update below.
Grab the changes from https://github.com/cota/dbt-bench

Here are the numbers for the Perl benchmark, from QEMU v1.7 -> v2.8.
The Y axis is Execution Time in seconds, so lower is better:

                       x86_64 Perl Compilation Performance                     
                 Host: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz                
                                                                               
   10 +-+---+------+-----+-----+-----+------+-----+----***----+------+---+-+   
      |     +      +     +     +     +      +     +     *     +      +     |   
  9.8 +-+                                              #A                +-+   
      |                                          *** ## *#                 |   
  9.6 +-+                                         *##  ***#              +-+   
  9.4 +-+                                         A        #             +-+   
      |                                          #*         #***           |   
  9.2 +-+                                       #***         #*          +-+   
      |                                        #              A##          |   
    9 +-+  ***          ***         ***        #              *  #       +-+   
      |     A#####***    *    ***    *     ***#              ***  #        |   
  8.8 +-+   *     #*  ###A#####A#####*      *#                     #***  +-+   
  8.6 +-+  ***     A##   *     *     A######A                        *   +-+   
      |           ***   ***   ***    *     ***                       A     |   
  8.4 +-+                            *                               *   +-+   
      |     +      +     +     +    ***     +     +     +     +     ***    |   
  8.2 +-+---+------+-----+-----+-----+------+-----+-----+-----+------+---+-+   
         v1.7.0 v2.0.0v2.1.0v2.2.0v2.3.0 v2.4.0v2.5.0v2.6.0v2.7.0 v2.8.0       
                                  QEMU version

PNGs for Perl + NBench here: http://imgur.com/a/LlpxE

Thanks,

		Emilio

commit f4ca2537bffe544779aa3f1814cec9d66dd9a17e
Author: Emilio G. Cota <cota@braap.org>
Date:   Thu Mar 16 12:48:44 2017 -0400

    README: document and quantify the difference between NBench and Perl
    
    While at it, also show how Perl's perf is very similar to SPEC06's perlbench.
    
    Signed-off-by: Emilio G. Cota <cota@braap.org>

diff --git a/README.md b/README.md
index b6d4037..b4578d6 100644
--- a/README.md
+++ b/README.md
@@ -61,3 +61,111 @@ Other output formats are possible, see `Makefile`.
   valuable files that were never meant to be committed (e.g. scripts). For
   this reason it is best to just clone a fresh QEMU repo to be used with
   DBT-bench rather than using your development tree.
+
+## What is the difference between the benchmarks?
+
+NBench programs are small, with execution time dominated by small code loops. Thus,
+when run under a DBT engine, the resulting performance depends almost entirely
+on the quality of the output code.
+
+The Perl benchmarks compile Perl code. As is common for compilation workloads,
+they execute large amounts of code and show no particular code execution
+hotspots. Thus, the resulting DBT performance depends largely on code
+translation speed.
+
+Quantitatively, the differences can be clearly seen under a profiler. For QEMU
+v2.8.0, we get:
+
+* NBench:
+
+```
+# Samples: 1M of event 'cycles:pp'
+# Event count (approx.): 1111661663176
+#
+# Overhead  Command       Shared Object        Symbol
+# ........  ............  ...................  .........................................
+#
+     6.26%  qemu-x86_64   qemu-x86_64          [.] float64_mul
+     6.24%  qemu-x86_64   qemu-x86_64          [.] roundAndPackFloat64
+     4.18%  qemu-x86_64   qemu-x86_64          [.] subFloat64Sigs
+     2.72%  qemu-x86_64   qemu-x86_64          [.] addFloat64Sigs
+     2.29%  qemu-x86_64   qemu-x86_64          [.] cpu_exec
+     1.29%  qemu-x86_64   qemu-x86_64          [.] float64_add
+     1.12%  qemu-x86_64   qemu-x86_64          [.] float64_sub
+     0.79%  qemu-x86_64   qemu-x86_64          [.] object_class_dynamic_cast_assert
+     0.71%  qemu-x86_64   qemu-x86_64          [.] helper_mulsd
+     0.66%  qemu-x86_64   perf-23090.map       [.] 0x000055afd37d0b8a
+     0.64%  qemu-x86_64   perf-23090.map       [.] 0x000055afd377cd8f
+     0.59%  qemu-x86_64   perf-23090.map       [.] 0x000055afd37d019a
+     [...]
+```
+
+* Perl:
+
+```
+# Samples: 90K of event 'cycles:pp'
+# Event count (approx.): 97757063053
+#
+# Overhead  Command       Shared Object            Symbol
+# ........  ............  .......................  ...........................................
+#
+   22.93%  qemu-x86_64   [kernel.kallsyms]        [k] isolate_freepages_block
+    9.38%  qemu-x86_64   qemu-x86_64              [.] cpu_exec
+    5.69%  qemu-x86_64   qemu-x86_64              [.] tcg_gen_code
+    5.30%  qemu-x86_64   qemu-x86_64              [.] tcg_optimize
+    3.45%  qemu-x86_64   qemu-x86_64              [.] liveness_pass_1
+    3.24%  qemu-x86_64   [kernel.kallsyms]        [k] isolate_migratepages_block
+    2.39%  qemu-x86_64   qemu-x86_64              [.] object_class_dynamic_cast_assert
+    1.48%  qemu-x86_64   [kernel.kallsyms]        [k] unlock_page
+    1.29%  qemu-x86_64   [kernel.kallsyms]        [k] pageblock_pfn_to_page
+    1.29%  qemu-x86_64   qemu-x86_64              [.] tcg_out_opc.isra.13
+    1.11%  qemu-x86_64   qemu-x86_64              [.] tcg_gen_op2
+    0.98%  qemu-x86_64   [kernel.kallsyms]        [k] migrate_pages
+    0.87%  qemu-x86_64   qemu-x86_64              [.] qht_lookup
+    0.83%  qemu-x86_64   qemu-x86_64              [.] tcg_temp_new_internal
+    0.77%  qemu-x86_64   qemu-x86_64              [.] tcg_out_modrm_sib_offset.constprop.37
+    0.76%  qemu-x86_64   qemu-x86_64              [.] disas_insn.isra.49
+    0.70%  qemu-x86_64   [kernel.kallsyms]        [k] __wake_up_bit
+    0.55%  qemu-x86_64   [kernel.kallsyms]        [k] __reset_isolation_suitable
+    0.47%  qemu-x86_64   qemu-x86_64              [.] tcg_opt_gen_mov
+    [...]
+```
+
+### Why don't you just run SPEC06?
+
+SPEC's source code cannot be redistributed. Some of its benchmarks are based
+on free software, but the SPEC authors added on top of it non-free code
+(usually scripts) that cannot be redistributed.
+
+For this reason we use here benchmarks that are freely redistributable,
+while capturing different performance profiles: NBench represents "hotspot
+code" and Perl represents a typical "compiler" workload. In fact, Perl's
+performance profile under QEMU is very similar to that of SPEC06's perlbench;
+compare Perl's profile above with SPEC06 perlbench's below:
+
+```
+# Samples: 14K of event 'cycles:pp'
+# Event count (approx.): 15657871399
+#
+# Overhead  Command      Shared Object            Symbol
+# ........  ...........  .......................  ...........................................
+#
+   16.93%  qemu-x86_64  qemu-x86_64              [.] cpu_exec
+    9.16%  qemu-x86_64  [kernel.kallsyms]        [k] isolate_freepages_block
+    5.47%  qemu-x86_64  qemu-x86_64              [.] tcg_gen_code
+    4.82%  qemu-x86_64  qemu-x86_64              [.] tcg_optimize
+    4.15%  qemu-x86_64  qemu-x86_64              [.] object_class_dynamic_cast_assert
+    3.25%  qemu-x86_64  qemu-x86_64              [.] liveness_pass_1
+    1.55%  qemu-x86_64  qemu-x86_64              [.] qht_lookup
+    1.23%  qemu-x86_64  qemu-x86_64              [.] tcg_gen_op2
+    1.04%  qemu-x86_64  [kernel.kallsyms]        [k] copy_page
+    1.00%  qemu-x86_64  qemu-x86_64              [.] tcg_out_opc.isra.13
+    0.82%  qemu-x86_64  qemu-x86_64              [.] tcg_temp_new_internal
+    0.78%  qemu-x86_64  qemu-x86_64              [.] tcg_out_modrm_sib_offset.constprop.37
+    0.72%  qemu-x86_64  qemu-x86_64              [.] tb_cmp
+    0.69%  qemu-x86_64  [kernel.kallsyms]        [k] isolate_migratepages_block
+    0.67%  qemu-x86_64  qemu-x86_64              [.] disas_insn.isra.49
+    0.53%  qemu-x86_64  qemu-x86_64              [.] object_get_class
+    0.52%  qemu-x86_64  [kernel.kallsyms]        [k] __wake_up_bit
+    [...]
+```
--
2.7.4
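
One rough way to compare the profiles in the README patch above is to sum
the overhead per category instead of eyeballing individual symbols. Below is
a minimal sketch, assuming the input is `perf report` text laid out like the
excerpts above. The category rules are my own assumption and not something
the README defines: symbols starting with `tcg_`, `liveness_pass` or
`disas_insn` are counted as translation, and `perf-*.map` entries as
translated guest code. The `summarize` and `classify` names are hypothetical.

```python
import re
from collections import defaultdict

# Matches perf report lines such as:
#   "5.69%  qemu-x86_64   qemu-x86_64   [.] tcg_gen_code"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)")

# Assumption: these symbol prefixes identify translation-time code in QEMU.
TRANSLATION_PREFIXES = ("tcg_", "liveness_pass", "disas_insn")


def classify(shared_object, symbol):
    """Map one profile entry to a coarse category (assumed rules)."""
    if shared_object == "[kernel.kallsyms]":
        return "kernel"
    if shared_object.startswith("perf-") and shared_object.endswith(".map"):
        return "translated code"
    if symbol.startswith(TRANSLATION_PREFIXES):
        return "translation"
    return "other qemu"


def summarize(report_text):
    """Sum the overhead percentages of a perf report excerpt per category."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        line = line.lstrip("+")  # tolerate lines copied out of a diff
        match = LINE_RE.match(line)
        if match is None:
            continue  # header, comment, or "[...]" line
        overhead, _command, shared_object, _kind, symbol = match.groups()
        totals[classify(shared_object, symbol)] += float(overhead)
    return dict(totals)
```

To use it, save the text output of `perf report --stdio` to a file and pass
its contents to `summarize()`. Note that the excerpts above are truncated
(`[...]`), so the sums only cover the listed symbols and are lower bounds.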
