Date: Tue, 19 Jan 2016 11:32:25 +0800
From: Fam Zheng
Subject: Re: [Qemu-devel] virtio-blk-dataplane performance numbers
To: Diana Madalina Craciun
Cc: "qemu-devel@nongnu.org"
Message-ID: <20160119033225.GC13438@ad.usersys.redhat.com>

On Fri, 01/15 12:35, Diana Madalina Craciun wrote:
> Hi,
>
> I made some measurements comparing guest vs bare metal (using
> virtio-blk dataplane) and got some results I cannot fully explain.
>
> First, some details about the setup:
>
> - I have ARM v8 hardware with one SSD connected via SATA.
>
> I have run FIO using multiple block sizes and IO depths:
>
> for i in 1 2 4 8 16 32
> do
>     for j in 4 8 16 32 64 128 256 512
>     do
>         echo "Test ${i}_${j}"
>         fio -filename=/dev/sda1 -direct=1 -iodepth $i -rw=write \
>             -ioengine=libaio -bs=${j}k -size=8G -numjobs=4 \
>             -group_reporting -name=mytest_write_${i}_${j} \
>             > /dev/out_write_${i}_${j}
>     done
> done
>
> I ran the same script on both bare metal and in the guest.
>
> The QEMU (QEMU 2.4, kernel 4.1) command line is the following:
>
> qemu-system-aarch64 -enable-kvm -nographic -machine type=virt -cpu host \
>     -kernel /boot/Image \
>     -append "root=/dev/ram rw console=ttyAMA0,115200 ramdisk_size=1000000" \
>     -serial tcp::4444,server,telnet -initrd /boot/rootfs.ext2.gz \
>     -m 1024 -mem-path /var/lib/hugetlbfs/pagesize-1GB \
>     -object iothread,id=iothread0 \
>     -drive if=none,id=drive0,cache=none,format=raw,file=/dev/sda,aio=native \
>     -device virtio-blk-pci,drive=drive0,scsi=off,iothread=iothread0
>
> I have pinned the I/O thread to physical CPU 0 and the VCPU thread to
> physical CPU 1.
>
> When comparing bare metal vs guest I have noticed that in some
> situations I get better results in the guest. The table below contains
> the results for sequential read (rows: IO depth, columns: block size;
> the numbers are the guest vs bare metal degradation in percent). For
> random read and write the results do not show large variations, but for
> sequential read and write I see significant variations.
>
>          4K      8K     16K     32K     64K    128K    256K    512K
>  1    50.28   37.19   36.08   -0.4     4.09    5.18    3.22    1.71
>  2    46.22   22.63   24.41   -0.45    1.72    2.17    2.37   -4.64
>  4   -10.82   15.60   11.64    5.21    0.09    2.86   -3.52    6.71
>  8   -18.05    5.96    8.82    0.26    0.95    4.30  -13.53   17.9
> 16    12.78   11.76    6.29    3.42    7.00   18.14   -0.4     5.59
> 32    16.99    7.98    4.70    7.67   -9.78    3.66    9.48   -3.55
>
> The negative numbers may come from benchmark variation; I would
> probably have to run the benchmark multiple times to see the variation
> even on the host.
>
> However, if somebody has an explanation of why I might get better
> results in the guest than on bare metal, or at least a suggestion of
> which direction to investigate, I would appreciate it.

It's probably due to request merges in virtio-blk-pci. You can collect a
blktrace on the host side to see this - the I/O sizes sent to the host
device should be larger because of the merges.

Fam
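
As a rough sketch of the host-side blktrace suggested above (the
30-second window, the "trace" output prefix and the summary step are
illustrative choices, not from the original mail; /dev/sda is the host
device from the setup):

    # on the host, while fio is running inside the guest: record
    # block-layer events for the SSD for ~30 seconds
    # (needs root and debugfs mounted at /sys/kernel/debug)
    blktrace -d /dev/sda -o trace -w 30

    # decode the trace; the sizes in the completion (C) events and the
    # merge (M) events show whether requests get merged before they
    # reach the device
    blkparse -i trace | less

    # optionally dump a combined binary trace and let btt compute
    # summary statistics such as the average request size
    blkparse -i trace -d trace.bin > /dev/null
    btt -i trace.bin

Comparing the request sizes seen here against the block size fio submits
inside the guest should show whether merging accounts for the difference.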