[Qemu-devel] virtio-blk-dataplane performance numbers
From: Diana Madalina Craciun @ 2016-01-15 12:35 UTC
To: qemu-devel@nongnu.org
Hi,
I made some measurements comparing the guest against bare metal (using
virtio-blk dataplane) and got some results I cannot fully explain.

First, some details about the setup:

- ARMv8 hardware with one SSD connected over SATA.

I ran fio with multiple block sizes and I/O depths:
for i in 1 2 4 8 16 32
do
    for j in 4 8 16 32 64 128 256 512
    do
        echo "Test ${i}_${j}"
        fio -filename=/dev/sda1 -direct=1 -iodepth $i -rw=write \
            -ioengine=libaio -bs=${j}k -size=8G -numjobs=4 -group_reporting \
            -name=mytest_write_${i}_${j} > /dev/out_write_${i}_${j}
    done
done
I ran the same script on both bare metal and in the guest.

The QEMU command line (QEMU 2.4, kernel 4.1) is the following:
qemu-system-aarch64 -enable-kvm -nographic -machine type=virt -cpu host \
    -kernel /boot/Image \
    -append "root=/dev/ram rw console=ttyAMA0,115200 ramdisk_size=1000000" \
    -serial tcp::4444,server,telnet -initrd /boot/rootfs.ext2.gz \
    -m 1024 -mem-path /var/lib/hugetlbfs/pagesize-1GB \
    -object iothread,id=iothread0 \
    -drive if=none,id=drive0,cache=none,format=raw,file=/dev/sda,aio=native \
    -device virtio-blk-pci,drive=drive0,scsi=off,iothread=iothread0
I have pinned the I/O thread to physical CPU 0 and the VCPU thread to
physical CPU 1.
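
For reference, the pinning itself can be done with taskset once the thread IDs
are known; a minimal sketch, assuming IOTHREAD_TID and VCPU_TID were looked up
beforehand (for example via the QMP commands query-iothreads and query-cpus,
which report the corresponding thread IDs, or from /proc/<qemu-pid>/task):

# IOTHREAD_TID and VCPU_TID are placeholders for the QEMU thread IDs.
taskset -pc 0 "$IOTHREAD_TID"   # pin the iothread0 thread to physical CPU 0
taskset -pc 1 "$VCPU_TID"       # pin the vCPU thread to physical CPU 1
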
When comparing bare metal against the guest, I noticed that in some situations
I get better results in the guest. The table below contains the results for
sequential read (rows: I/O depth, columns: block size; the numbers are the
guest-vs-bare-metal degradation percentage). For random read and write the
results do not show large variations, but for sequential read and write I see
significant variations.
iodepth      4K      8K     16K     32K     64K    128K    256K    512K
1         50.28   37.19   36.08   -0.40    4.09    5.18    3.22    1.71
2         46.22   22.63   24.41   -0.45    1.72    2.17    2.37   -4.64
4        -10.82   15.60   11.64    5.21    0.09    2.86   -3.52    6.71
8        -18.05    5.96    8.82    0.26    0.95    4.30  -13.53   17.90
16        12.78   11.76    6.29    3.42    7.00   18.14   -0.40    5.59
32        16.99    7.98    4.70    7.67   -9.78    3.66    9.48   -3.55
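
For context, a sketch of how such a percentage can be computed from the fio
bandwidth numbers, assuming degradation = (bare_metal - guest) / bare_metal * 100
(so a positive value means the guest is slower, a negative value means the
guest came out faster; the formula and the sample numbers below are
assumptions for illustration only):

# Hypothetical helper: degradation percentage from two bandwidth figures (KB/s).
degradation() {
    awk -v bare="$1" -v guest="$2" \
        'BEGIN { printf "%.2f\n", (bare - guest) / bare * 100 }'
}

degradation 210000 104400   # e.g. bare metal 210 MB/s vs. guest ~104 MB/s -> 50.29
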
The negative numbers may come from run-to-run benchmark variation; I would
probably have to run the benchmark multiple times to see how much the results
vary even on the host.
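
One way to quantify that variation would be to repeat a single configuration
several times and compare the results; a minimal sketch (the run count and
output paths are arbitrary examples):

for run in 1 2 3 4 5
do
    fio -filename=/dev/sda1 -direct=1 -iodepth 1 -rw=write -ioengine=libaio \
        -bs=4k -size=8G -numjobs=4 -group_reporting \
        -name=variation_check_${run} > /tmp/variation_check_${run}
done
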
However, if somebody has an explanation for why I might get better results in
the guest than on bare metal, or at least a suggestion for which direction to
investigate, I would appreciate it.
Thank you,
Diana
Re: [Qemu-devel] virtio-blk-dataplane performance numbers
From: Fam Zheng @ 2016-01-19 3:32 UTC
To: Diana Madalina Craciun; +Cc: qemu-devel@nongnu.org
On Fri, 01/15 12:35, Diana Madalina Craciun wrote:
> [...]
>
> However, if somebody has an explanation for why I might get better results
> in the guest than on bare metal, or at least a suggestion for which
> direction to investigate, I would appreciate it.
It's probably due to request merges in virtio-blk-pci. You can collect a
blktrace on the host side to check - the I/O sizes sent to the host device
would be larger because of the merges.
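
A rough sketch of what that host-side check might look like (the device name
and capture duration are only examples; merge events show up as 'M'/'F'
actions in the blkparse output, and the average request size reported by
iostat should grow when merging happens):

# Capture 30 seconds of block-layer events on the host disk while the guest
# benchmark is running, then look at merge events and request sizes.
blktrace -d /dev/sda -w 30 -o trace
blkparse -i trace | grep -c ' M '   # rough count of back-merge events
iostat -x sda 1                     # watch the average request size (avgrq-sz)
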
Fam