Date: Tue, 19 Jan 2016 11:32:25 +0800
From: Fam Zheng
Subject: Re: [Qemu-devel] virtio-blk-dataplane performance numbers
To: Diana Madalina Craciun
Cc: "qemu-devel@nongnu.org"
Message-ID: <20160119033225.GC13438@ad.usersys.redhat.com>

On Fri, 01/15 12:35, Diana Madalina Craciun wrote:
> Hi,
>
> I made some measurements comparing guest vs bare metal (using
> virtio-blk dataplane) and got some results I cannot fully explain.
>
> First, some details about the setup:
>
> - I have ARM v8 hardware with one SSD connected via SATA.
>
> I have run FIO using multiple block sizes and IO depths:
>
> for i in 1 2 4 8 16 32
> do
>     for j in 4 8 16 32 64 128 256 512
>     do
>         echo "Test ${i}_${j}"
>         fio -filename=/dev/sda1 -direct=1 -iodepth $i -rw=write \
>             -ioengine=libaio -bs=${j}k -size=8G -numjobs=4 \
>             -group_reporting -name=mytest_write_${i}_${j} \
>             > /dev/out_write_${i}_${j}
>     done
> done
>
> I ran the same script on both bare metal and in the guest.
>
> The QEMU (QEMU 2.4, kernel 4.1) command line is the following:
>
> qemu-system-aarch64 -enable-kvm -nographic -machine type=virt -cpu host \
>     -kernel /boot/Image \
>     -append "root=/dev/ram rw console=ttyAMA0,115200 ramdisk_size=1000000" \
>     -serial tcp::4444,server,telnet -initrd /boot/rootfs.ext2.gz \
>     -m 1024 -mem-path /var/lib/hugetlbfs/pagesize-1GB \
>     -object iothread,id=iothread0 \
>     -drive if=none,id=drive0,cache=none,format=raw,file=/dev/sda,aio=native \
>     -device virtio-blk-pci,drive=drive0,scsi=off,iothread=iothread0
>
> I have pinned the I/O thread to physical CPU 0 and the VCPU thread to
> physical CPU 1.
>
> When comparing bare metal vs guest I have noticed that in some
> situations I get better results in the guest. The table below contains
> the results for sequential read (rows: IO depth, columns: block size;
> the numbers are the guest vs bare metal degradation in percent). For
> random read and write the results do not show large variations, but for
> sequential read and write I see significant variations.
>
>          4K      8K     16K     32K     64K    128K    256K    512K
>  1    50.28   37.19   36.08   -0.4     4.09    5.18    3.22    1.71
>  2    46.22   22.63   24.41   -0.45    1.72    2.17    2.37   -4.64
>  4   -10.82   15.60   11.64    5.21    0.09    2.86   -3.52    6.71
>  8   -18.05    5.96    8.82    0.26    0.95    4.30  -13.53   17.9
> 16    12.78   11.76    6.29    3.42    7.00   18.14   -0.4     5.59
> 32    16.99    7.98    4.70    7.67   -9.78    3.66    9.48   -3.55
>
> The negative numbers may come from benchmark variation; I would
> probably have to run the benchmark multiple times to see the variation
> even on the host.
>
> However, if somebody has an explanation of why I might get better
> results in the guest than on bare metal, or at least a suggestion of
> which direction to investigate, I would appreciate it.

It's probably due to request merges in virtio-blk-pci. You can collect a
blktrace on the host side to see this - the I/O sizes sent to the host
device should be larger because of the merges.

Fam
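
As a rough sketch of the host-side blktrace suggested above (the
30-second window, the "trace" output prefix and the summary step are
illustrative choices, not from the original mail; /dev/sda is the host
device from the setup):

    # on the host, while fio is running inside the guest: record
    # block-layer events for the SSD for ~30 seconds
    # (needs root and debugfs mounted at /sys/kernel/debug)
    blktrace -d /dev/sda -o trace -w 30

    # decode the trace; the sizes in the completion (C) events and the
    # merge (M) events show whether requests get merged before they
    # reach the device
    blkparse -i trace | less

    # optionally dump a combined binary trace and let btt compute
    # summary statistics such as the average request size
    blkparse -i trace -d trace.bin > /dev/null
    btt -i trace.bin

Comparing the request sizes seen here against the block size fio submits
inside the guest should show whether merging accounts for the difference.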