From: Asias He <asias@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Anthony Liguori <aliguori@us.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>, Khoa Huynh <khoa@us.ibm.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 0/7] virtio: virtio-blk data plane
Date: Wed, 21 Nov 2012 13:22:22 +0800
Message-ID: <50AC650E.2080207@redhat.com>
In-Reply-To: <CAJSP0QXx5VVCU+zs-N1J5g7t1DkC5k7+S35pmfWGzytP1GL0Tg@mail.gmail.com>
On 11/20/2012 08:21 PM, Stefan Hajnoczi wrote:
> On Tue, Nov 20, 2012 at 10:02 AM, Asias He <asias@redhat.com> wrote:
>> Hello Stefan,
>>
>> On 11/15/2012 11:18 PM, Stefan Hajnoczi wrote:
>>> This series adds the -device virtio-blk-pci,x-data-plane=on property that
>>> enables a high performance I/O codepath. A dedicated thread is used to process
>>> virtio-blk requests outside the global mutex and without going through the QEMU
>>> block layer.
>>>
>>> Khoa Huynh <khoa@us.ibm.com> reported an increase from 140,000 IOPS to 600,000
>>> IOPS for a single VM using virtio-blk-data-plane in July:
>>>
>>> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
>>>
>>> The virtio-blk-data-plane approach was originally presented at Linux Plumbers
>>> Conference 2010. The following slides contain a brief overview:
>>>
>>> http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
>>>
>>> The basic approach is:
>>> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
>>> signalling when the guest kicks the virtqueue.
>>> 2. Requests are processed without going through the QEMU block layer using
>>> Linux AIO directly.
>>> 3. Completion interrupts are injected via irqfd from the dedicated thread.
>>>
>>> To try it out:
>>>
>>> qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
>>> -device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on
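
In sketch form, the three steps above amount to an eventfd-driven loop
around Linux AIO. The following is only a minimal sketch of that pattern,
not the actual dataplane code from this series (which has its own vring,
event loop and request queue code); it assumes the ioeventfd and irqfd
have already been wired up to KVM via KVM_IOEVENTFD/KVM_IRQFD, and the
vring handling is elided:

/* Minimal sketch: one thread per virtio-blk device, woken by ioeventfd
 * when the guest kicks the virtqueue, submitting I/O straight to Linux
 * AIO and signalling completion interrupts through irqfd. */
#include <libaio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define MAX_EVENTS 128

static void data_plane_loop(int ioeventfd, int irqfd, io_context_t ctx)
{
    struct io_event events[MAX_EVENTS];
    struct timespec poll = { 0, 0 };
    uint64_t kick, one = 1;

    for (;;) {
        /* 1. Guest kicked the virtqueue: the ioeventfd becomes readable. */
        if (read(ioeventfd, &kick, sizeof(kick)) != sizeof(kick)) {
            continue;
        }

        /* 2. Pop requests off the vring and hand the iocbs straight to
         *    Linux AIO, bypassing the QEMU block layer:
         *        io_submit(ctx, nr_reqs, iocbs);
         */

        /* 3. Reap completed requests (non-blocking poll here), fill in
         *    the used ring, then inject the completion interrupt into
         *    the guest by signalling the irqfd. */
        int done = io_getevents(ctx, 0, MAX_EVENTS, events, &poll);
        if (done > 0) {
            write(irqfd, &one, sizeof(one));
        }
    }
}

The point is just the control flow: kick -> pop vring -> io_submit ->
io_getevents -> fill used ring -> signal irqfd, all without taking the
QEMU global mutex.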
>>
>>
>> Are these the latest dataplane bits:
>> (git://github.com/stefanha/qemu.git virtio-blk-data-plane)
>>
>> commit 7872075c24fa01c925d4f41faa9d04ce69bf5328
>> Author: Stefan Hajnoczi <stefanha@redhat.com>
>> Date: Wed Nov 14 15:45:38 2012 +0100
>>
>> virtio-blk: add x-data-plane=on|off performance feature
>>
>>
>> With this commit on a ramdisk based box, I am seeing about 10K IOPS with
>> x-data-plane on and 90K IOPS with x-data-plane off.
>>
>> Any ideas?
>>
>> Command line I used:
>>
>> IMG=/dev/ram0
>> x86_64-softmmu/qemu-system-x86_64 \
>> -drive file=/root/img/sid.img,if=ide \
>> -drive file=${IMG},if=none,cache=none,aio=native,id=disk1 -device
>> virtio-blk-pci,x-data-plane=off,drive=disk1,scsi=off \
>> -kernel $KERNEL -append "root=/dev/sdb1 console=tty0" \
>> -L /tmp/qemu-dataplane/share/qemu/ -nographic -vnc :0 -enable-kvm -m
>> 2048 -smp 4 -cpu qemu64,+x2apic -M pc
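
For the x-data-plane=on numbers the command line is presumably identical
apart from the device property, i.e.
-device virtio-blk-pci,x-data-plane=on,drive=disk1,scsi=off
with everything else unchanged.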
>
> Was just about to send out the latest patch series which addresses
> review comments, so I have tested the latest code
> (61b70fef489ce51ecd18d69afb9622c110b9315c).
>
> I was unable to reproduce a ramdisk performance regression on Linux
> 3.6.6-3.fc18.x86_64 with Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz with
> 8 GB RAM.
I am using the latest upstream kernel.
> The ramdisk is 4 GB and I used your QEMU command-line with a RHEL 6.3 guest.
>
> Summary results:
> x-data-plane-on: iops=132856 aggrb=1039.1MB/s
> x-data-plane-off: iops=126236 aggrb=988.40MB/s
>
> virtio-blk-data-plane is ~5% faster in this benchmark.
>
> fio jobfile:
> [global]
> filename=/dev/vda
> blocksize=8k
> ioengine=libaio
> direct=1
> iodepth=8
> runtime=120
> time_based=1
>
> [reads]
> readwrite=randread
> numjobs=4
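
To reproduce: save the jobfile above as, say, randread.fio (the name is
arbitrary) and run

  fio randread.fio

inside the guest against the virtio-blk disk; fio prints the per-job iops
and the aggregate aggrb figures quoted below.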
>
> Perf top (data-plane-on):
> 3.71% [kvm] [k] kvm_arch_vcpu_ioctl_run
> 3.27% [kernel] [k] memset <--- ramdisk
> 2.98% [kernel] [k] do_blockdev_direct_IO
> 2.82% [kvm_intel] [k] vmx_vcpu_run
> 2.66% [kernel] [k] _raw_spin_lock_irqsave
> 2.06% [kernel] [k] put_compound_page
> 2.06% [kernel] [k] __get_page_tail
> 1.83% [i915] [k] __gen6_gt_force_wake_mt_get
> 1.75% [kernel] [k] _raw_spin_unlock_irqrestore
> 1.33% qemu-system-x86_64 [.] vring_pop <--- virtio-blk-data-plane
> 1.19% [kernel] [k] compound_unlock_irqrestore
> 1.13% [kernel] [k] gup_huge_pmd
> 1.11% [kernel] [k] __audit_syscall_exit
> 1.07% [kernel] [k] put_page_testzero
> 1.01% [kernel] [k] fget
> 1.01% [kernel] [k] do_io_submit
>
> Since the ramdisk (memset and page-related functions) is so prominent
> in perf top, I also tried a 1-job 8k dd sequential write test on a
> Samsung 830 Series SSD where virtio-blk-data-plane was 9% faster than
> virtio-blk. Optimizing against ramdisk isn't a good idea IMO because
> it acts very differently from real hardware where the driver relies on
> mmio, DMA, and interrupts (vs synchronous memcpy/memset).
For the memset in the ramdisk, you can simply patch drivers/block/brd.c to
turn that memset into a no-op for testing (see the illustrative sketch below).
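
Purely as an illustration (the exact code in drivers/block/brd.c varies
between kernel versions, so treat this as a hypothetical, benchmarking-only
hack rather than a real patch):

/* Hypothetical, test-only tweak in drivers/block/brd.c: skip the
 * read-side memset (used for pages that were never written) so the
 * benchmark measures the request path rather than host memory
 * bandwidth. */
#if 0
        memset(dst, 0, copy);
#endif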
Yes, if you have a fast SSD device (sometimes you need several, which I do
not have), it makes more sense to test on real hardware. However, the
ramdisk test is still useful: it gives rough performance numbers, and if A
and B are both measured against the same ramdisk, the difference between A
and B is still meaningful.
> Full results:
> $ cat data-plane-off
> reads: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=8
> ...
> reads: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=8
> fio 1.57
> Starting 4 processes
>
> reads: (groupid=0, jobs=1): err= 0: pid=1851
> read : io=29408MB, bw=250945KB/s, iops=31368 , runt=120001msec
> slat (usec): min=2 , max=27829 , avg=11.06, stdev=78.05
> clat (usec): min=1 , max=28028 , avg=241.41, stdev=388.47
> lat (usec): min=33 , max=28035 , avg=253.17, stdev=396.66
> bw (KB/s) : min=197141, max=335365, per=24.78%, avg=250797.02,
> stdev=29376.35
> cpu : usr=6.55%, sys=31.34%, ctx=310932, majf=0, minf=41
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3764202/0/0, short=0/0/0
> lat (usec): 2=0.01%, 4=0.01%, 20=0.01%, 50=1.78%, 100=27.11%
> lat (usec): 250=38.97%, 500=27.11%, 750=2.09%, 1000=0.71%
> lat (msec): 2=1.32%, 4=0.70%, 10=0.20%, 20=0.01%, 50=0.01%
> reads: (groupid=0, jobs=1): err= 0: pid=1852
> read : io=29742MB, bw=253798KB/s, iops=31724 , runt=120001msec
> slat (usec): min=2 , max=17007 , avg=10.61, stdev=67.51
> clat (usec): min=1 , max=41531 , avg=239.00, stdev=379.03
> lat (usec): min=32 , max=41547 , avg=250.33, stdev=385.21
> bw (KB/s) : min=194336, max=347497, per=25.02%, avg=253204.25,
> stdev=31172.37
> cpu : usr=6.66%, sys=32.58%, ctx=327250, majf=0, minf=41
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3806999/0/0, short=0/0/0
> lat (usec): 2=0.01%, 20=0.01%, 50=1.54%, 100=26.45%, 250=40.04%
> lat (usec): 500=27.15%, 750=1.95%, 1000=0.71%
> lat (msec): 2=1.29%, 4=0.68%, 10=0.18%, 20=0.01%, 50=0.01%
> reads: (groupid=0, jobs=1): err= 0: pid=1853
> read : io=29859MB, bw=254797KB/s, iops=31849 , runt=120001msec
> slat (usec): min=2 , max=16821 , avg=11.35, stdev=76.54
> clat (usec): min=1 , max=17659 , avg=237.25, stdev=375.31
> lat (usec): min=31 , max=17673 , avg=249.27, stdev=383.62
> bw (KB/s) : min=194864, max=345280, per=25.15%, avg=254534.63,
> stdev=30549.32
> cpu : usr=6.52%, sys=31.84%, ctx=303763, majf=0, minf=39
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3821989/0/0, short=0/0/0
> lat (usec): 2=0.01%, 10=0.01%, 20=0.01%, 50=2.09%, 100=29.19%
> lat (usec): 250=37.31%, 500=26.41%, 750=2.08%, 1000=0.71%
> lat (msec): 2=1.32%, 4=0.70%, 10=0.20%, 20=0.01%
> reads: (groupid=0, jobs=1): err= 0: pid=1854
> read : io=29598MB, bw=252565KB/s, iops=31570 , runt=120001msec
> slat (usec): min=2 , max=26413 , avg=11.21, stdev=78.32
> clat (usec): min=16 , max=27993 , avg=239.56, stdev=381.67
> lat (usec): min=34 , max=28006 , avg=251.49, stdev=390.13
> bw (KB/s) : min=194256, max=369424, per=24.94%, avg=252462.86,
> stdev=29420.58
> cpu : usr=6.57%, sys=31.33%, ctx=305623, majf=0, minf=41
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3788507/0/0, short=0/0/0
> lat (usec): 20=0.01%, 50=2.13%, 100=28.30%, 250=37.74%, 500=26.66%
> lat (usec): 750=2.17%, 1000=0.75%
> lat (msec): 2=1.35%, 4=0.70%, 10=0.19%, 20=0.01%, 50=0.01%
>
> Run status group 0 (all jobs):
> READ: io=118607MB, aggrb=988.40MB/s, minb=256967KB/s,
> maxb=260912KB/s, mint=120001msec, maxt=120001msec
>
> Disk stats (read/write):
> vda: ios=15148328/0, merge=0/0, ticks=1550570/0, in_queue=1536232, util=96.56%
>
> $ cat data-plane-on
> reads: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=8
> ...
> reads: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=8
> fio 1.57
> Starting 4 processes
>
> reads: (groupid=0, jobs=1): err= 0: pid=1796
> read : io=32081MB, bw=273759KB/s, iops=34219 , runt=120001msec
> slat (usec): min=1 , max=20404 , avg=21.08, stdev=125.49
> clat (usec): min=10 , max=135743 , avg=207.62, stdev=532.90
> lat (usec): min=21 , max=136055 , avg=229.60, stdev=556.82
> bw (KB/s) : min=56480, max=951952, per=25.49%, avg=271488.81,
> stdev=149773.57
> cpu : usr=7.01%, sys=43.26%, ctx=336854, majf=0, minf=41
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=4106413/0/0, short=0/0/0
> lat (usec): 20=0.01%, 50=2.46%, 100=61.13%, 250=21.58%, 500=3.11%
> lat (usec): 750=3.04%, 1000=3.88%
> lat (msec): 2=4.50%, 4=0.13%, 10=0.11%, 20=0.06%, 50=0.01%
> lat (msec): 250=0.01%
> reads: (groupid=0, jobs=1): err= 0: pid=1797
> read : io=30104MB, bw=256888KB/s, iops=32110 , runt=120001msec
> slat (usec): min=1 , max=17595 , avg=22.20, stdev=120.29
> clat (usec): min=13 , max=136264 , avg=221.21, stdev=528.19
> lat (usec): min=22 , max=136280 , avg=244.35, stdev=551.73
> bw (KB/s) : min=57312, max=838880, per=23.93%, avg=254798.51,
> stdev=139546.57
> cpu : usr=6.82%, sys=41.87%, ctx=360348, majf=0, minf=41
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3853351/0/0, short=0/0/0
> lat (usec): 20=0.01%, 50=2.10%, 100=58.47%, 250=22.38%, 500=3.68%
> lat (usec): 750=3.69%, 1000=4.52%
> lat (msec): 2=4.87%, 4=0.14%, 10=0.11%, 20=0.05%, 250=0.01%
> reads: (groupid=0, jobs=1): err= 0: pid=1798
> read : io=31698MB, bw=270487KB/s, iops=33810 , runt=120001msec
> slat (usec): min=1 , max=17457 , avg=20.93, stdev=125.33
> clat (usec): min=16 , max=134663 , avg=210.19, stdev=535.77
> lat (usec): min=21 , max=134671 , avg=232.02, stdev=559.27
> bw (KB/s) : min=57248, max=841952, per=25.29%, avg=269330.21,
> stdev=148661.08
> cpu : usr=6.92%, sys=42.81%, ctx=337799, majf=0, minf=39
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=4057340/0/0, short=0/0/0
> lat (usec): 20=0.01%, 50=1.98%, 100=62.00%, 250=20.70%, 500=3.22%
> lat (usec): 750=3.23%, 1000=4.16%
> lat (msec): 2=4.41%, 4=0.13%, 10=0.10%, 20=0.06%, 250=0.01%
> reads: (groupid=0, jobs=1): err= 0: pid=1799
> read : io=30913MB, bw=263789KB/s, iops=32973 , runt=120000msec
> slat (usec): min=1 , max=17565 , avg=21.52, stdev=120.17
> clat (usec): min=15 , max=136064 , avg=215.53, stdev=529.56
> lat (usec): min=27 , max=136070 , avg=237.99, stdev=552.50
> bw (KB/s) : min=57632, max=900896, per=24.74%, avg=263431.57,
> stdev=148379.15
> cpu : usr=6.90%, sys=42.56%, ctx=348217, majf=0, minf=41
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3956830/0/0, short=0/0/0
> lat (usec): 20=0.01%, 50=1.76%, 100=59.96%, 250=22.21%, 500=3.45%
> lat (usec): 750=3.35%, 1000=4.33%
> lat (msec): 2=4.65%, 4=0.13%, 10=0.11%, 20=0.05%, 250=0.01%
>
> Run status group 0 (all jobs):
> READ: io=124796MB, aggrb=1039.1MB/s, minb=263053KB/s,
> maxb=280328KB/s, mint=120000msec, maxt=120001msec
>
> Disk stats (read/write):
> vda: ios=15942789/0, merge=0/0, ticks=336240/0, in_queue=317832, util=97.47%
>
--
Asias