From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57906)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.bennee@linaro.org>) id 1bXtgQ-0007Mv-Am
	for qemu-devel@nongnu.org; Thu, 11 Aug 2016 13:22:19 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <alex.bennee@linaro.org>) id 1bXtgN-0001MN-Rr
	for qemu-devel@nongnu.org; Thu, 11 Aug 2016 13:22:17 -0400
Received: from mail-wm0-x229.google.com ([2a00:1450:400c:c09::229]:37561)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.bennee@linaro.org>) id 1bXtgN-0001MJ-HS
	for qemu-devel@nongnu.org; Thu, 11 Aug 2016 13:22:15 -0400
Received: by mail-wm0-x229.google.com with SMTP id i5so4941164wmg.0
	for <qemu-devel@nongnu.org>; Thu, 11 Aug 2016 10:22:15 -0700 (PDT)
References: <1470929064-4092-1-git-send-email-alex.bennee@linaro.org>
From: Alex =?utf-8?Q?Benn=C3=A9e?= <alex.bennee@linaro.org>
In-reply-to: <1470929064-4092-1-git-send-email-alex.bennee@linaro.org>
Date: Thu, 11 Aug 2016 18:22:23 +0100
Message-ID: <87twerb4q8.fsf@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: mttcg@listserver.greensocs.com, qemu-devel@nongnu.org, fred.konrad@greensocs.com, a.rigo@virtualopensystems.com, cota@braap.org, bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com
Cc: mark.burton@greensocs.com, pbonzini@redhat.com, jan.kiszka@siemens.com, serge.fdrv@gmail.com, rth@twiddle.net, peter.maydell@linaro.org, claudio.fontana@huawei.com


Alex Bennée <alex.bennee@linaro.org> writes:

> This is the fourth iteration of the RFC patch set which aims to
> provide the basic framework for MTTCG. I hope this will provide a good
> base for discussion at KVM Forum later this month.
>
<snip>
>
> In practice the memory barrier problems don't show up with an x86
> host. In fact I have created a tree which merges in Emilio's
> cmpxchg atomics which happily boots ARMv7 Debian systems without any
> additional changes. You can find that at:
>
>   https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>
<snip>
> Performance
> ===========
>
> You can't do full work-load testing on this tree due to the lack of
> atomic support (but I will run some numbers on
> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).

So here is a more real-world workload run:

  retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
  run 1: ret=0 (PASS), time=261.794911 (1/1)
  run 2: ret=0 (PASS), time=257.290045 (2/2)
  run 3: ret=0 (PASS), time=256.536991 (3/3)
  run 4: ret=0 (PASS), time=254.036260 (4/4)
  run 5: ret=0 (PASS), time=256.539165 (5/5)
  Results summary:
  0: 5 times (100.00%), avg time 257.239 (8.00 varience/2.83 deviation)
  Ran command 5 times, 5 passes

  retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
  run 1: ret=0 (PASS), time=86.597459 (1/1)
  run 2: ret=0 (PASS), time=82.843904 (2/2)
  run 3: ret=0 (PASS), time=84.095910 (3/3)
  run 4: ret=0 (PASS), time=83.844595 (4/4)
  run 5: ret=0 (PASS), time=83.594768 (5/5)
  Results summary:
  0: 5 times (100.00%), avg time 84.195 (2.02 varience/1.42 deviation)
  Ran command 5 times, 5 passes

This shows roughly a 30% overhead over the ideal for running
multi-threaded with 4 vCPUs (84.2 measured against 257.2 / 4 = 64.3),
but still a decent improvement in wall time.
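
For anyone who wants to check the arithmetic, here is a minimal Python
sketch that recomputes the summary lines and the overhead figure from the
per-run times above. It is not part of retry.py; the variable and function
names are my own for illustration. It assumes the reported
variance/deviation are the sample (n-1) versions, which is what the
printed numbers appear to match.

```python
from statistics import mean, stdev, variance

# Wall-clock times copied from the two retry.py runs above.
single = [261.794911, 257.290045, 256.536991, 254.036260, 256.539165]
multi = [86.597459, 82.843904, 84.095910, 83.844595, 83.594768]
VCPUS = 4  # both command lines end with -smp 4


def summarise(times):
    """Return (mean, sample variance, sample standard deviation)."""
    return mean(times), variance(times), stdev(times)


single_avg, single_var, single_dev = summarise(single)
multi_avg, multi_var, multi_dev = summarise(multi)

# Ideal multi-threaded time assumes perfect linear scaling over VCPUS.
ideal = single_avg / VCPUS
speedup = single_avg / multi_avg
overhead = multi_avg / ideal - 1.0

print(f"single: avg {single_avg:.3f} ({single_var:.2f} var/{single_dev:.2f} dev)")
print(f"multi:  avg {multi_avg:.3f} ({multi_var:.2f} var/{multi_dev:.2f} dev)")
print(f"speed-up {speedup:.2f}x, ideal time {ideal:.3f}, overhead {overhead:.1%}")
```

The "ideal" here is simply the thread=single average divided by the number
of vCPUs, so it ignores any part of the boot that is inherently serial.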

The test itself boots the system and runs benchmark-build.service:

  # A benchmark target
  #
  # This shuts down once the boot has completed

  [Unit]
  Description=Default
  Requires=basic.target
  After=basic.target
  AllowIsolate=yes

  [Service]
  Type=oneshot
  ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
  ExecStartPost=/sbin/poweroff

  [Install]
  WantedBy=multi-user.target

And the build-dir script is simply:

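    #!/bin/sh
    #
    # Clean and rebuild the source tree given as $1, one make job per CPU.
    NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
    set -e
    # Quote the argument so paths with spaces still work
    cd "$1"
    make clean
    make -j"${NR_CPUS}"
    cd -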

Measuring this over increasing -smp values:

| -smp |    time | time as bar  | theoretical | % of -smp 1 |
|------+---------+--------------+-------------+-------------|
|    1 | 238.184 | WWWWWWWWWWWW |     238.184 |             |
|    2 | 133.402 | WWWWWWh      |     119.092 |             |
|    3 |  99.531 | WWWWH        |   79.394667 |             |
|    4 |  82.760 | WWWW:        |      59.546 |             |
#+TBLFM: $3='(orgtbl-ascii-draw $2 0 238.184 12)::$4=@2$2/$1
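
The "% of -smp 1" column is left blank in the table. As a rough sketch
(names are mine, not from any existing script), the following Python
derives it from the measured times, along with the theoretical time and
the parallel efficiency. It assumes "theoretical" means the -smp 1 time
divided by the vCPU count, which is what the TBLFM formula computes.

```python
# Measured times from the table above, keyed by -smp value.
measured = {1: 238.184, 2: 133.402, 3: 99.531, 4: 82.760}

base = measured[1]
rows = {}
for smp, time in sorted(measured.items()):
    theoretical = base / smp          # perfect linear scaling
    rows[smp] = {
        "time": time,
        "theoretical": theoretical,
        "pct_of_smp1": 100.0 * time / base,
        "efficiency": theoretical / time,  # 1.0 would be ideal scaling
    }

print("| -smp |    time | theoretical | % of -smp 1 | efficiency |")
print("|------+---------+-------------+-------------+------------|")
for smp, r in rows.items():
    print(
        f"| {smp:4d} | {r['time']:7.3f} | {r['theoretical']:11.3f} "
        f"| {r['pct_of_smp1']:11.1f} | {r['efficiency']:10.2f} |"
    )
```

The efficiency column is an extra I added for illustration; it is just
the theoretical time divided by the measured time.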

--
Alex Bennée