From: "Emilio G. Cota" <cota@braap.org>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Richard Henderson" <rth@twiddle.net>,
	"Laurent Vivier" <laurent@vivier.eu>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Benn�e" <alex.bennee@linaro.org>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Benchmarking linux-user performance
Date: Fri, 10 Mar 2017 21:18:51 -0500	[thread overview]
Message-ID: <20170311021851.GA26530@flamenco> (raw)
In-Reply-To: <20170310114531.GB2480@work-vm>

On Fri, Mar 10, 2017 at 11:45:33 +0000, Dr. David Alan Gilbert wrote:
> * Emilio G. Cota (cota@braap.org) wrote:
> >   https://github.com/cota/dbt-bench
> > I'm using NBench because (1) it's just a few files and they take
> > very little time to run (~5min per QEMU version, if performance
> > on the host machine is stable), (2) AFAICT its sources are in the
> > public domain (whereas SPEC's sources cannot be redistributed),
> > and (3) with NBench I get results similar to SPEC's.
> 
> Does NBench include anything with lots of small processes, or a large
> chunk of code?  Using benchmarks with small code tends to skew DBT optimisations
> towards very heavy block optimisation that doesn't work in real applications, where
> the cost of translation can hurt if it's too high.

Yes, this is a valid point.

I haven't looked at the NBench code in detail, but I'd expect all programs
in the suite to be small and to have hotspots (this is consistent with
the fact that performance doesn't change even if the TB hash table
isn't used, i.e. the loops are small enough to stay in tb_jmp_cache).
IOW, we'd mostly be measuring the quality of the translated code,
not the translation overhead.
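
For context, the lookup path looks roughly like this (a schematic sketch
with approximate, ~2.8-era names, not the exact QEMU code):

    /* The per-vCPU direct-mapped cache is tried first; only on a miss do
     * we consult the global TB hash table, and only if that also misses
     * do we pay for translation. */
    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
    if (tb == NULL || tb->pc != pc) {
        tb = tb_htable_lookup(cpu, pc, cs_base, flags);   /* global table */
        if (tb == NULL) {
            tb = tb_gen_code(cpu, pc, cs_base, flags, 0); /* translate */
        }
        cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
    }

A tight loop whose working set fits in tb_jmp_cache therefore never sees
the cost of the global table, which is why disabling it makes no
difference for NBench.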

It seems that a good benchmark for taking translation overhead into account
would be gcc or perlbench from SPEC (see [1]; ~20% of exec time is spent
on translation). Unfortunately, neither of them can be redistributed.

I'll keep looking at other options. For instance, I looked today at using
golang's compilation tests, but they crash under qemu-user. The requirement
is something that is easy to build (i.e. gcc is not an option) and that
runs fast.

A hack one can use to measure code translation, as opposed to execution,
is to disable caching with a 2-liner that avoids insertions into the TB
hash table and tb_jmp_cache. The problem is that we then basically measure
only code translation performance, which isn't really realistic either.
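
Concretely, the kind of change I mean is along these lines (a sketch
against ~2.8-era code; the exact call sites vary across versions):

    /* translate-all.c, tb_link_page(): don't publish the new TB in the
     * global hash table, so each execution re-translates the block. */
    /* qht_insert(&tcg_ctx.tb_ctx.htable, tb, h); */

    /* cpu-exec.c, tb_find(): don't cache the TB pointer per-vCPU either. */
    /* cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb; */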

In any case, note that most efforts I've seen to generate very good code
(with QEMU or other cross-ISA DBTs) do some sort of profiling so that
only hot blocks are optimized -- see for example [1] and [2].

[1] "Characterization of Dynamic Binary Translation Overhead".
    Edson Borin and Youfeng Wu. IISWC 2009.
    http://amas-bt.cs.virginia.edu/2008proceedings/AmasBT2008.pdf#page=4

[2] "HQEMU: a multi-threaded and retargetable dynamic binary translator
    on multicores".
    Ding-Yong Hong, Chun-Chen Hsu, Pen-Chung Yew, Jan-Jan Wu, Wei-Chung Hsu,
    Pangfeng Liu, Chien-Min Wang and Yeh-Ching Chung. CGO 2012.
    http://www.iis.sinica.edu.tw/papers/dyhong/18239-F.pdf
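
The usual scheme is a cheap per-block (or per-trace) execution counter with
a hotness threshold, roughly like this (purely illustrative; the counter
field, threshold and optimize_block() helper are made up, not QEMU or
HQEMU code):

    #define HOT_THRESHOLD 50   /* tunable; the value here is arbitrary */

    /* Called each time a block is about to execute: translate cheaply at
     * first, and only spend optimization effort on blocks that prove hot. */
    static void maybe_optimize(TranslationBlock *tb)
    {
        if (++tb->exec_count == HOT_THRESHOLD) {
            /* e.g. hand the block/trace to a slower, better code generator;
             * HQEMU does this on separate threads using LLVM. */
            optimize_block(tb);
        }
    }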


> > Here are linux-user performance numbers from v1.0 to v2.8 (higher
> > is better):
> > 
> >                         x86_64 NBench Integer Performance
> >                  Host: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz                
> >                                                                                
> >   36 +-+-+---+---+---+--+---+---+---+---+---+---+---+---+--+---+---+---+-+-+   
> >      |   +   +   +   +  +   +   +   +   +   +   +   +   +  +   +   +  ***  |   
> >   34 +-+                                                             #*A*+-+   
> >      |                                                            *A*      |   
> >   32 +-+                                                          #      +-+   
> >   30 +-+                                                          #      +-+   
> >      |                                                           #         |   
> >   28 +-+                                                        #        +-+   
> >      |                                 *A*#*A*#*A*#*A*#*A*#     #          |   
> >   26 +-+                   *A*#*A*#***#    ***         ******#*A*        +-+   
> >      |                     #       *A*                    *A* ***          |   
> >   24 +-+                  #                                              +-+   
> >   22 +-+                 #                                               +-+   
> >      |             #*A**A*                                                 |   
> >   20 +-+       #*A*                                                      +-+   
> >      |  *A*#*A*  +   +  +   +   +   +   +   +   +   +   +  +   +   +   +   |   
> >   18 +-+-+---+---+---+--+---+---+---+---+---+---+---+---+--+---+---+---+-+-+   
> >        v1.v1.1v1.2v1.v1.4v1.5v1.6v1.7v2.0v2.1v2.2v2.3v2.v2.5v2.6v2.7v2.8.0     
> >                                   QEMU version                                 
> 
> Nice, there was someone on list complaining about 2.6 being slower for them.
> 
> >                      x86_64 NBench Floating Point Performance                  
> >                   Host: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz               
> >                                                                                
> >   1.88 +-+-+---+--+---+---+---+--+---+---+---+---+--+---+---+---+--+---+-+-+   
> >        |   +   +  +  *A*#*A*  +  +   +   +   +   +  +   +   +   +  +   +   |   
> >   1.86 +-+           *** ***                                             +-+   
> >        |            #       #   *A*#***                                    |   
> >        |      *A*# #         # ##   *A*                                    |   
> >   1.84 +-+    #  *A*         *A*      #                                  +-+   
> >        |      #                        #                              *A*  |   
> >   1.82 +-+   #                          #                            ##  +-+   
> >        |     #                          *A*#                        #      |   
> >    1.8 +-+  #                               #  #*A*               *A*    +-+   
> >        |    #                               *A*   #                #       |   
> >   1.78 +-+*A*                                      #       *A*    #      +-+   
> >        |                                           #   ***#  #    #        |   
> >        |                                           *A*#*A*    #  #         |   
> >   1.76 +-+                                         ***         # #       +-+   
> >        |   +   +  +   +   +   +  +   +   +   +   +  +   +   +  *A* +   +   |   
> >   1.74 +-+-+---+--+---+---+---+--+---+---+---+---+--+---+---+---+--+---+-+-+   
> >          v1.v1.v1.2v1.3v1.4v1.v1.6v1.7v2.0v2.1v2.v2.3v2.4v2.5v2.v2.7v2.8.0     
> >                                    QEMU version                                
> 
> I'm assuming the dips are where QEMU fixed something and cared about corner
> cases/accuracy?

It'd be hard to say why the numbers vary across versions without running
a profiler and git bisect. I only know the reason for v2.7, where most,
if not all, of the improvement is due to the removal of tb_lock() when
executing code in qemu-user, thanks to the QHT work.
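
Roughly, the difference on the hot lookup path is the following
(approximate names; a before/after sketch only):

    /* <= v2.6: the global TB lookup was serialized by tb_lock, which all
     * guest threads in qemu-user contend on. */
    tb_lock();
    tb = tb_find_physical(cpu, pc, cs_base, flags);
    tb_unlock();

    /* v2.7+: QHT supports lock-free concurrent reads, so the common
     * lookup no longer takes tb_lock at all. */
    tb = tb_htable_lookup(cpu, pc, cs_base, flags);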

		E.
