From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
To: Meng Xu <xumengpanda@gmail.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Hyon-Young Choi <commani@gmail.com>
Subject: Re: [Question] PARSEC benchmark has smaller execution time in VM than in native?
Date: Tue, 1 Mar 2016 13:20:17 -0500
Message-ID: <20160301182017.GA9344@elena.ufimtseva>
In-Reply-To: <CAENZ-+nrFpm3rzsmV-OOsSDga9QEHmE5XkuSeUHM0s2vLPH1cQ@mail.gmail.com>

On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >> > Hey!
> >> >
> >> > CC-ing Elena.
> >>
> >> I think you forgot to cc her..
> >> Anyway, let's cc her now... :-)
> >>
> >> >
> >> >> We are measuring the execution time between the native machine environment
> >> >> and the Xen virtualization environment using the PARSEC benchmark [1].
> >> >>
> >> >> In the virtualization environment, we run a domU with three VCPUs, each of
> >> >> them pinned to a core; we pin dom0 to another core that is not
> >> >> used by the domU.
> >> >>
> >> >> Inside the Linux guest in domU and in the native environment, we used
> >> >> cpusets to isolate a core (or VCPU) for the system processes and to
> >> >> isolate a core for the benchmark processes. We also configured the
> >> >> Linux boot command line with the isolcpus= option to shield the
> >> >> benchmark core from other unnecessary processes.
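
For anyone trying to reproduce that kind of setup, a rough sketch of the
pinning/isolation commands might look like the following; the domain name,
core numbers and the run script below are placeholders, not necessarily
what was used in these experiments:

  # Xen side: pin the three domU VCPUs to dedicated physical cores,
  # and keep dom0 on a core the guest does not use
  xl vcpu-pin domU 0 1
  xl vcpu-pin domU 1 2
  xl vcpu-pin domU 2 3
  xl vcpu-pin Domain-0 all 0

  # Linux side (native or inside the guest): reserve one core for the
  # benchmark at boot with isolcpus=<core>, then shield it with cpusets
  cset shield --cpu 3 --kthread=on        # move movable tasks off core 3
  cset shield --exec ./run_benchmark.sh   # run the benchmark inside the shield
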
> >> >
> >> > You may want to just offline them and also boot the machine with NUMA
> >> > disabled.
> >>
> >> Right, the machine is booted up with NUMA disabled.
> >> We will offline the unnecessary cores then.
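
For completeness, offlining the unused cores and turning NUMA off can be
done roughly like this (the core numbers are placeholders):

  # take unused cores offline at runtime
  echo 0 > /sys/devices/system/cpu/cpu4/online
  echo 0 > /sys/devices/system/cpu/cpu5/online

  # and/or disable NUMA handling via the kernel boot command line:
  #   numa=off
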
> >>
> >> >
> >> >>
> >> >> We expect the execution time of the benchmarks in the Xen virtualization
> >> >> environment to be larger than the execution time in the native machine
> >> >> environment. However, the evaluation gave us the opposite result.
> >> >>
> >> >> Below is the evaluation data for the canneal and streamcluster benchmarks:
> >> >>
> >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> >> >> Native: 6.387s
> >> >> Virtualization: 5.890s
> >> >>
> >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> >> >> Native: 5.276s
> >> >> Virtualization: 5.240s
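
As a side note on methodology, a minimal way to collect such numbers,
assuming the standard parsecmgmt wrapper that ships with PARSEC, would be
something like:

  # run the serial canneal/streamcluster builds on the simlarge input;
  # the run scripts print their own real/user/sys timing for each run
  parsecmgmt -a run -p canneal       -i simlarge -c gcc-serial
  parsecmgmt -a run -p streamcluster -i simlarge -c gcc-serial
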
> >> >>
> >> >> Is there anything wrong with our evaluation that led to these abnormal
> >> >> performance results?
> >> >
> >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> >> >
> >> > :-)
> >> >
> >> > No clue sadly.
> >>
> >> Ah-ha. This is really surprising to me.... Why would adding one more
> >> layer speed up the system? Unless the virtualization layer disables
> >> some services that run natively and interfere with the benchmark.
> >>
> >> If virtualization is faster than baremetal by nature, why do some
> >> experiments show that virtualization introduces overhead?
> >
> > Elena told me that there was some weird regression in Linux 4.1 - where
> > CPU-burning workloads were _slower_ on baremetal than as guests.
> 
> Hi Elena,
> Would you mind sharing with us some of your experience of how you
> found the real reason? Did you use some tool or methodology to
> pin down the cause (i.e., why CPU-burning workloads are _slower_ on
> baremetal than as guests)?
>

Hi Meng

Yes, sure!

While working on performance tests for the smt-exposing patches from Joao,
I ran a CPU-bound workload in an HVM guest and then ran the same test on
baremetal using the same kernel.
While testing the CPU-bound workload on baremetal Linux (4.1.0-rc2),
I found that it takes a few times longer to complete the same test than
it does under the HVM guest.
I tried tests both with the kernel threads pinned to cores and without pinning.
The baremetal execution times are usually about twice as long, and sometimes
4 times longer, than in the HVM case.

What is interesting is not only that baremetal sometimes takes 3-4 times
longer than the HVM guest, but also that the test with threads bound to
cores takes almost 3 times longer to execute than the same CPU-bound test
under HVM (in all configurations).
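
The actual test used kernel threads, but just to illustrate the bound vs.
unbound variants, a userspace approximation (cpu_burn is a placeholder for
any CPU-bound worker) could be:

  # unbound: let the scheduler place the 8 workers
  for i in $(seq 0 7); do ./cpu_burn & done; time wait

  # bound: pin worker $i to core $i
  for i in $(seq 0 7); do taskset -c $i ./cpu_burn & done; time wait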

I ran each test 5 times; here are the execution times (seconds):

-------------------------------------------------
          baremetal           |
 thread bind  | thread unbind | HVM pinned to cores
--------------|---------------|--------------------
      74      |      83       |         28
      74      |      88       |         28
      74      |      38       |         28
      74      |      73       |         28
      74      |      87       |         28

Sometimes the unbound tests produced better times, but not often enough
to present here. Some results are much worse and reach up to 120
seconds.

Each test has 8 kernel threads. In the baremetal case I tried the following
(a sample boot command line combining these options is sketched below):
- numa off, on;
- all cpus on;
- isolating the cpus of the first node;
- setting intel_idle.max_cstate=1;
- disabling intel_pstate;
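
Most of these map to kernel command-line options; a sample boot line
combining them would look roughly like this (the isolcpus range is a
placeholder for the cpus of the first node):

  numa=off isolcpus=0-3 intel_idle.max_cstate=1 intel_pstate=disable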

I don't think I have exhausted all the options here, but it looked like
the last two changes did improve performance, though still not to a level
comparable with the HVM case.
I am trying to find where the regression happened. Performance on a newer
kernel (I tried 4.5.0-rc4+) was close to or better than HVM.

I am also trying to find out if there were relevant known regressions, to
understand the reason for this.
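
If it turns out to be bisectable between those two kernels, one possible
way to narrow it down is a plain git bisect of the mainline tree for the
commit that made baremetal fast again (the tags below are just the two
endpoints mentioned above, and custom bisect terms assume a recent git):

  git bisect start --term-old=slow --term-new=fast
  git bisect slow v4.1-rc2     # baremetal a few times slower than HVM
  git bisect fast v4.5-rc4     # baremetal close to or better than HVM
  # at each step: build, boot, rerun the cpu-bound test, then mark it
  # with "git bisect slow" or "git bisect fast"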


What kernel do you guys use?

Elena

A more detailed description of the tests is here:
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg02874.html
Joao's patches are here:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg03115.html




> 
> 
> >
> > Updating to a later kernel fixed that - where one could see that
> > baremetal was faster than (or on par with) the guest.
> 
> Thank you very much, Konrad! We are giving it a shot. :-D
> 
> Best Regards,
> 
> Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
