From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57906)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1bXtgQ-0007Mv-Am
    for qemu-devel@nongnu.org; Thu, 11 Aug 2016 13:22:19 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1bXtgN-0001MN-Rr
    for qemu-devel@nongnu.org; Thu, 11 Aug 2016 13:22:17 -0400
Received: from mail-wm0-x229.google.com ([2a00:1450:400c:c09::229]:37561)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1bXtgN-0001MJ-HS
    for qemu-devel@nongnu.org; Thu, 11 Aug 2016 13:22:15 -0400
Received: by mail-wm0-x229.google.com with SMTP id i5so4941164wmg.0
    for ; Thu, 11 Aug 2016 10:22:15 -0700 (PDT)
References: <1470929064-4092-1-git-send-email-alex.bennee@linaro.org>
From: Alex Bennée
In-reply-to: <1470929064-4092-1-git-send-email-alex.bennee@linaro.org>
Date: Thu, 11 Aug 2016 18:22:23 +0100
Message-ID: <87twerb4q8.fsf@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
To: mttcg@listserver.greensocs.com, qemu-devel@nongnu.org,
    fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
    cota@braap.org, bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com
Cc: mark.burton@greensocs.com, pbonzini@redhat.com,
    jan.kiszka@siemens.com, serge.fdrv@gmail.com, rth@twiddle.net,
    peter.maydell@linaro.org, claudio.fontana@huawei.com

Alex Bennée writes:

> This is the fourth iteration of the RFC patch set which aims to
> provide the basic framework for MTTCG. I hope this will provide a good
> base for discussion at KVM Forum later this month.
>
> In practice the memory barrier problems don't show up with an x86
> host.
> In fact I have created a tree which merges in the Emilio's
> cmpxchg atomics which happily boots ARMv7 Debian systems without any
> additional changes. You can find that at:
>
>   https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>
> Performance
> ===========
>
> You can't do full work-load testing on this tree due to the lack of
> atomic support (but I will run some numbers on
> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).

So here is a more real-world workload run:

retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm',
    '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096',
    '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444',
    '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22',
    '-device', 'virtio-net-device,netdev=unet', '-drive',
    'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
    '-device', 'virtio-blk-device,drive=myblock', '-append',
    'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1',
    '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img',
    '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
run 1: ret=0 (PASS), time=261.794911 (1/1)
run 2: ret=0 (PASS), time=257.290045 (2/2)
run 3: ret=0 (PASS), time=256.536991 (3/3)
run 4: ret=0 (PASS), time=254.036260 (4/4)
run 5: ret=0 (PASS), time=256.539165 (5/5)
Results summary:
0: 5 times (100.00%), avg time 257.239 (8.00 varience/2.83 deviation)
Ran command 5 times, 5 passes

retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm',
    '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096',
    '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444',
    '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22',
    '-device', 'virtio-net-device,netdev=unet', '-drive',
    'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
    '-device',
    'virtio-blk-device,drive=myblock', '-append',
    'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1',
    '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img',
    '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
run 1: ret=0 (PASS), time=86.597459 (1/1)
run 2: ret=0 (PASS), time=82.843904 (2/2)
run 3: ret=0 (PASS), time=84.095910 (3/3)
run 4: ret=0 (PASS), time=83.844595 (4/4)
run 5: ret=0 (PASS), time=83.594768 (5/5)
Results summary:
0: 5 times (100.00%), avg time 84.195 (2.02 varience/1.42 deviation)
Ran command 5 times, 5 passes

This shows roughly a 30% overhead over the ideal for running
multi-threaded (257.239/4 is about 64.3s, against the 84.195s measured),
but we still see a decent improvement in wall time.

The test itself boots the system and runs benchmark-build.service:

# A benchmark target
#
# This shuts down once the boot has completed

[Unit]
Description=Default
Requires=basic.target
After=basic.target
AllowIsolate=yes

[Service]
Type=oneshot
ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
ExecStartPost=/sbin/poweroff

[Install]
WantedBy=multi-user.target

And the build-dir script is a simple:

#!/bin/sh
#
NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
set -e
cd $1
make clean
make -j${NR_CPUS}
cd -

Measuring this over increasing -smp values:

| -smp |    time | time as bar  | theoretical | % of -smp 1 |
|------+---------+--------------+-------------+-------------|
|    1 | 238.184 | WWWWWWWWWWWW |     238.184 |             |
|    2 | 133.402 | WWWWWWh      |     119.092 |             |
|    3 |  99.531 | WWWWH        |   79.394667 |             |
|    4 |  82.760 | WWWW:        |      59.546 |             |
#+TBLFM: $3='(orgtbl-ascii-draw $2 0 238.184 12)::$4=@2$2/$1

--
Alex Bennée