From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dongsu Park
Subject: Re: virtio-blk performance regression and qemu-kvm
Date: Tue, 21 Feb 2012 16:57:25 +0100
Message-ID: <20120221155725.GA950@gmail.com>
References: <20120210143639.GA17883@gmail.com>
To: Stefan Hajnoczi
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org

Hi Stefan,

see below.

On 13.02.2012 11:57, Stefan Hajnoczi wrote:
> On Fri, Feb 10, 2012 at 2:36 PM, Dongsu Park wrote:
> > Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.
> >
> > - Sequential read (Running inside guest)
> >   # fio -name iops -rw=read -size=1G -iodepth 1 \
> >     -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
> >
> > - Sequential write (Running inside guest)
> >   # fio -name iops -rw=write -size=1G -iodepth 1 \
> >     -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
> >
> > For each one, I tested 3 times to get the average.
> >
> > Result:
> >
> > seqread with qemu-kvm 0.14.1    67.0 MByte/s
> > seqread with qemu-kvm 1.0       30.9 MByte/s
> >
> > seqwrite with qemu-kvm 0.14.1   65.8 MByte/s
> > seqwrite with qemu-kvm 1.0      30.5 MByte/s
>
> Please retry with the following commit or simply qemu-kvm.git/master.
> Avi discovered a performance regression which was introduced when the
> block layer was converted to use coroutines:
>
> $ git describe 39a7a362e16bb27e98738d63f24d1ab5811e26a8
> v1.0-327-g39a7a36
>
> (This commit is not in 1.0!)
>
> Please post your qemu-kvm command-line.
>
> 67 MB/s sequential 4 KB read means 67 * 1024 / 4 = 17152 requests per
> second, so 58 microseconds per request.
>
> Please post the fio output so we can double-check what is reported.

As you suggested, I tested again with revision v1.0-327-g39a7a36,
which includes commit 39a7a36. The results are, however, still not good:

seqread   : 20.3 MByte/s
seqwrite  : 20.1 MByte/s
randread  : 20.5 MByte/s
randwrite : 20.0 MByte/s

My qemu-kvm command line is as follows:

=======================================================================
/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 \
 -smp 1,sockets=1,cores=1,threads=1 -name mydebian3_8gb \
 -uuid d99ad012-2fcc-6f7e-fbb9-bc48b424a258 -nodefconfig -nodefaults \
 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mydebian3_8gb.monitor,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
 -drive file=/var/lib/libvirt/images/mydebian3_8gb.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native \
 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
 -drive file=/dev/ram0,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native \
 -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
 -netdev tap,fd=19,id=hostnet0 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:68:9f:d0,bus=pci.0,addr=0x3 \
 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus \
 -device AC97,id=sound0,bus=pci.0,addr=0x4 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
=======================================================================

As you can see, /dev/ram0 is mapped to /dev/vdb on the guest side,
which is the device used for the fio tests.

Here is a sample of the fio output:

=======================================================================
# fio -name iops -rw=read -size=1G -iodepth 1 -filename /dev/vdb \
  -ioengine libaio -direct=1 -bs=4096
iops: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [21056K/0K /s] [5140/0 iops] [eta 00m:00s]
iops: (groupid=0, jobs=1): err= 0: pid=1588
  read : io=1024MB, bw=20101KB/s, iops=5025, runt= 52166msec
    slat (usec): min=4, max=6461, avg=24.00, stdev=19.75
    clat (usec): min=0, max=11934, avg=169.49, stdev=113.91
    bw (KB/s) : min=18200, max=23048, per=100.03%, avg=20106.31, stdev=934.42
  cpu : usr=5.43%, sys=23.25%, ctx=262363, majf=0, minf=28
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=262144/0, short=0/0
     lat (usec): 2=0.01%, 4=0.16%, 10=0.03%, 20=0.01%, 50=0.27%
     lat (usec): 100=4.07%, 250=89.12%, 500=5.76%, 750=0.30%, 1000=0.13%
     lat (msec): 2=0.12%, 4=0.02%, 10=0.01%, 20=0.01%

Run status group 0 (all jobs):
   READ: io=1024MB, aggrb=20100KB/s, minb=20583KB/s, maxb=20583KB/s,
   mint=52166msec, maxt=52166msec

Disk stats (read/write):
  vdb: ios=261308/0, merge=0/0, ticks=40210/0, in_queue=40110, util=77.14%
=======================================================================

So I don't think the coroutine-ucontext patch addresses the bottleneck
I'm seeing.

Regards,
Dongsu

p.s. Sorry for the late reply; I was on vacation last week.
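
p.p.s. For reference, your latency arithmetic and the fio numbers above
can be cross-checked with a quick shell sketch. This is illustrative
only, just redoing the integer math from this thread, nothing more:

```shell
#!/bin/sh
# Cross-check of the arithmetic in this thread (integer math only).

# Stefan's example: 67 MB/s of sequential 4 KB reads.
mb_per_s=67
reqs_per_s=$(( mb_per_s * 1024 / 4 ))   # 4 KB blocks -> requests per second
us_per_req=$(( 1000000 / reqs_per_s ))  # average time budget per request
echo "${reqs_per_s} requests/s, ~${us_per_req} us/request"
# prints: 17152 requests/s, ~58 us/request

# The fio run above: bw=20101 KB/s at bs=4096 should match iops=5025.
bw_kb_s=20101
iops=$(( bw_kb_s / 4 ))                 # 4 KB per request
echo "expected iops: ${iops}"
# prints: expected iops: 5025
```

Both checks come out consistent: the slowdown corresponds to roughly
199 us per request (1000000 / 5025) versus the ~58 us per request that
67 MB/s would imply.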