All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Alex Bennée" <alex.bennee@linaro.org>
To: mttcg@listserver.greensocs.com, qemu-devel@nongnu.org,
	fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
	cota@braap.org, bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com
Cc: mark.burton@greensocs.com, pbonzini@redhat.com,
	jan.kiszka@siemens.com, serge.fdrv@gmail.com, rth@twiddle.net,
	peter.maydell@linaro.org, claudio.fontana@huawei.com
Subject: Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
Date: Fri, 12 Aug 2016 09:02:07 +0100	[thread overview]
Message-ID: <87shuabekg.fsf@linaro.org> (raw)
In-Reply-To: <87twerb4q8.fsf@linaro.org>


Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> This is the fourth iteration of the RFC patch set which aims to
>> provide the basic framework for MTTCG. I hope this will provide a good
>> base for discussion at KVM Forum later this month.
>>
> <snip>
>>
>> In practice the memory barrier problems don't show up with an x86
>> host. In fact I have created a tree which merges in the Emilio's
>> cmpxchg atomics which happily boots ARMv7 Debian systems without any
>> additional changes. You can find that at:
>>
>>   https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>>
> <snip>
>> Performance
>> ===========
>>
>> You can't do full work-load testing on this tree due to the lack of
>> atomic support (but I will run some numbers on
>> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).
>
> So here is a more real world work load run:
>
>   retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
>   run 1: ret=0 (PASS), time=261.794911 (1/1)
>   run 2: ret=0 (PASS), time=257.290045 (2/2)
>   run 3: ret=0 (PASS), time=256.536991 (3/3)
>   run 4: ret=0 (PASS), time=254.036260 (4/4)
>   run 5: ret=0 (PASS), time=256.539165 (5/5)
>   Results summary:
>   0: 5 times (100.00%), avg time 257.239 (8.00 varience/2.83 deviation)
>   Ran command 5 times, 5 passes
>
>   retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
>   run 1: ret=0 (PASS), time=86.597459 (1/1)
>   run 2: ret=0 (PASS), time=82.843904 (2/2)
>   run 3: ret=0 (PASS), time=84.095910 (3/3)
>   run 4: ret=0 (PASS), time=83.844595 (4/4)
>   run 5: ret=0 (PASS), time=83.594768 (5/5)
>   Results summary:
>   0: 5 times (100.00%), avg time 84.195 (2.02 varience/1.42 deviation)
>   Ran command 5 times, 5 passes
>
> This shows a 30% overhead over the ideal for running multi-threaded but
> still seeing a decent improvement in wall time.
>
> So the test itself is booting the system, running the
> benchmark-build.service:
>
>   # A benchmark target
>   #
>   # This shutsdown once the boot has completed
>
>   [Unit]
>   Description=Default
>   Requires=basic.target
>   After=basic.target
>   AllowIsolate=yes
>
>   [Service]
>   Type=oneshot
>   ExecStart=/root/mysrc/testcases.git/build-dir.sh
>   /root/src/stress-ng.git/
>   ExecStartPost=/sbin/poweroff
>
>   [Install]
>   WantedBy=multi-user.target
>
> And the build-dir script is a simple:
>
>     #!/bin/sh
>     #
>     NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
>     set -e
>     cd $1
>     make clean
>     make -j${NR_CPUS}
>     cd -
>
> Measuring this over increasing -smp


Measuring this over increasing -smp

 -smp     time  -smp 1 / smp  time as bar   x faster
-----------------------------------------------------
    1  238.184       238.184  WWWWWWWWWWWW     1.000
    2  133.402       119.092  WWWWWWh          1.785
    3   99.531        79.395  WWWWW            2.393
    4   82.760        59.546  WWWW.            2.878
    5   82.513        47.637  WWWW.            2.887
    6   78.922        39.697  WWWH             3.018
    7   87.181        34.026  WWWW;            2.732
    8   87.098        29.773  WWWW;            2.735

So a more complete analysis shows the benefits start to tail off as we
push past 4 vCPUs. However on my machine which is 4+4 hyperthreads that
could be just as much a feature of the host system. Indeed the results
start getting noisy at 7/8 vCPUs.

Interestingly a perf run against -smp 6 shows gic_update topping the
graph (3.14% of total execution time). That function does have a big
TODO for optimisation on it ;-)

--
Alex Bennée

  reply	other threads:[~2016-08-12 15:15 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
2016-08-11 15:23 ` [Qemu-devel] [RFC v4 01/28] cpus: make all_vcpus_paused() return bool Alex Bennée
2016-08-11 15:23 ` [Qemu-devel] [RFC v4 02/28] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
2016-08-11 15:23 ` [Qemu-devel] [RFC v4 03/28] translate-all: add DEBUG_LOCKING asserts Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages Alex Bennée
2016-09-07  2:21   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 05/28] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
2016-09-07  2:30   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up Alex Bennée
2016-09-07  2:34   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations Alex Bennée
2016-09-07  2:41   ` Richard Henderson
2016-09-07  7:08     ` Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock Alex Bennée
2016-09-07  2:48   ` Richard Henderson
2016-08-11 15:24 ` [RFC v4 10/28] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
2016-08-11 15:24   ` [Qemu-devel] " Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn Alex Bennée
2016-09-07  2:53   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 12/28] tcg: cpus rm tcg_exec_all() Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG Alex Bennée
2016-09-07  3:06   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
2016-09-07  3:25   ` Richard Henderson
2016-09-07  5:40     ` Paolo Bonzini
2016-09-07 10:15       ` Alex Bennée
2016-09-07 10:19     ` Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
2016-09-07  3:34   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution Alex Bennée
2016-09-07  4:03   ` Richard Henderson
2016-09-07  5:43     ` Paolo Bonzini
2016-09-07  6:43       ` Richard Henderson
2016-09-07 15:15         ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline Alex Bennée
2016-09-07  4:06   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request Alex Bennée
2016-09-07  4:11   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
2016-09-27 15:56   ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG Alex Bennée
2016-09-07  4:22   ` Richard Henderson
2016-09-07 10:05   ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU Alex Bennée
2016-09-07  4:26   ` Richard Henderson
2016-09-27 16:16   ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 22/28] tcg: enable thread-per-vCPU Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool Alex Bennée
2016-09-08  0:12   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks Alex Bennée
2016-09-08 17:19   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work Alex Bennée
2016-09-07 10:08   ` Paolo Bonzini
2016-09-08 17:23   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
2016-09-08 17:24   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG Alex Bennée
2016-09-08 17:34   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx " Alex Bennée
2016-09-07 10:09   ` Paolo Bonzini
2016-09-08 17:54   ` Richard Henderson
2016-08-11 17:22 ` [Qemu-devel] [RFC v4 00/28] Base enabling patches " Alex Bennée
2016-08-12  8:02   ` Alex Bennée [this message]
2016-09-06  9:24 ` Alex Bennée
     [not found] <mailman.11856.1470929072.26858.qemu-devel@nongnu.org>
2016-08-11 16:43 ` G 3
2016-08-12 13:19   ` Alex Bennée
2016-08-12 13:31     ` G 3
2016-08-12 15:01       ` Alex Bennée

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87shuabekg.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=a.rigo@virtualopensystems.com \
    --cc=bobby.prani@gmail.com \
    --cc=claudio.fontana@huawei.com \
    --cc=cota@braap.org \
    --cc=fred.konrad@greensocs.com \
    --cc=jan.kiszka@siemens.com \
    --cc=mark.burton@greensocs.com \
    --cc=mttcg@listserver.greensocs.com \
    --cc=nikunj@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.