* [Qemu-devel] IO performance test on the tcm-vhost scsi
From: mengcong @ 2012-06-13 10:13 UTC
To: qemu-devel, target-devel
Cc: stefanha, linuxram, Nicholas A. Bellinger, meng cong, Anthony Liguori, Paolo Bonzini

Hi folks, I did an IO performance test on the tcm-vhost scsi. I want to share
the test result data here.

                  seq-read        seq-write       rand-read       rand-write
                  8k     256k     8k     256k     8k     256k     8k     256k
------------------------------------------------------------------------------
bare-metal        67951  69802    67064  67075    1758   29284    1969   26360
tcm-vhost-iblock  61501  66575    51775  67872    1011   22533    1851   28216
tcm-vhost-pscsi   66479  68191    50873  67547    1008   22523    1818   28304
virtio-blk        26284  66737    23373  65735    1724   28962    1805   27774
scsi-disk         36013  60289    46222  62527    1663   12992    1804   27670

unit: KB/s
seq-read/write = sequential read/write
rand-read/write = random read/write
8k, 256k are the block sizes of the IO

In the tcm-vhost-iblock test, the emulate_write_cache attr was enabled.
In the virtio-blk test, cache=none,aio=native were set.
In the scsi-disk test, cache=none,aio=native were set, and the LSI HBA was used.

I also tried to run the test against a scsi-generic LUN (passing a physical
partition through as a /dev/sgX device), but I couldn't set it up
successfully. It's a pity.

Benchmark tool: fio, with ioengine=aio,direct=1,iodepth=8 set for all tests.
KVM VM: 2 CPUs and 2 GB RAM

thanks.
Meng Cong.
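For reference, the fio invocation described above can be reconstructed roughly
as follows. This is only a sketch: the device path, job name, and runtime are
placeholders, and only ioengine/direct/iodepth and the block sizes come from
the report itself (the report writes ioengine=aio, which with fio normally
corresponds to the libaio engine):

    # One cell of the table above: 8k sequential read against the test disk.
    # Substitute --rw=write/randread/randwrite and --bs=256k for the other cells.
    fio --name=seq-read-8k --filename=/dev/sdb \
        --rw=read --bs=8k \
        --ioengine=libaio --direct=1 --iodepth=8 \
        --runtime=60 --time_based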
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Stefan Hajnoczi @ 2012-06-13 10:35 UTC
To: mc
Cc: stefanha, linuxram, qemu-devel, Nicholas A. Bellinger, target-devel, Anthony Liguori, Paolo Bonzini

On Wed, Jun 13, 2012 at 11:13 AM, mengcong <mc@linux.vnet.ibm.com> wrote:
>                   seq-read        seq-write       rand-read       rand-write
>                   8k     256k     8k     256k     8k     256k     8k     256k
> ------------------------------------------------------------------------------
> bare-metal        67951  69802    67064  67075    1758   29284    1969   26360
> tcm-vhost-iblock  61501  66575    51775  67872    1011   22533    1851   28216
> tcm-vhost-pscsi   66479  68191    50873  67547    1008   22523    1818   28304
> virtio-blk        26284  66737    23373  65735    1724   28962    1805   27774
> scsi-disk         36013  60289    46222  62527    1663   12992    1804   27670
[...]
> In the tcm-vhost-iblock test, the emulate_write_cache attr was enabled.
> In the virtio-blk test, cache=none,aio=native were set.
> In the scsi-disk test, cache=none,aio=native were set, and the LSI HBA was used.

If the LSI HBA was used, then there are no benchmark results for userspace
virtio-scsi?

Stefan
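For context, the "scsi-disk" row above refers to a QEMU-emulated SCSI HBA
rather than virtio; a configuration along these lines is presumably what was
measured (a sketch only: the exact command line was not posted, and the device
names here are assumptions):

    # Emulated LSI HBA with a SCSI disk on the raw test device (hypothetical).
    qemu-system-x86_64 ... \
        -drive file=/dev/sdb,format=raw,if=none,id=sdb,cache=none,aio=native \
        -device lsi53c895a,id=scsi0 \
        -device scsi-disk,drive=sdb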
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Nicholas A. Bellinger @ 2012-06-13 19:08 UTC
To: mc
Cc: Jens Axboe, stefanha, linux-scsi, linuxram, qemu-devel, target-devel, Anthony Liguori, Paolo Bonzini

On Wed, 2012-06-13 at 18:13 +0800, mengcong wrote:
> Hi folks, I did an IO performance test on the tcm-vhost scsi. I want to share
> the test result data here.
>
>                   seq-read        seq-write       rand-read       rand-write
>                   8k     256k     8k     256k     8k     256k     8k     256k
> ------------------------------------------------------------------------------
> bare-metal        67951  69802    67064  67075    1758   29284    1969   26360
> tcm-vhost-iblock  61501  66575    51775  67872    1011   22533    1851   28216
> tcm-vhost-pscsi   66479  68191    50873  67547    1008   22523    1818   28304
> virtio-blk        26284  66737    23373  65735    1724   28962    1805   27774
> scsi-disk         36013  60289    46222  62527    1663   12992    1804   27670
>
> unit: KB/s
> seq-read/write = sequential read/write
> rand-read/write = random read/write
> 8k, 256k are the block sizes of the IO
>
> In the tcm-vhost-iblock test, the emulate_write_cache attr was enabled.
> In the virtio-blk test, cache=none,aio=native were set.
> In the scsi-disk test, cache=none,aio=native were set, and the LSI HBA was used.
>
> I also tried to run the test against a scsi-generic LUN (passing a physical
> partition through as a /dev/sgX device), but I couldn't set it up
> successfully. It's a pity.
>
> Benchmark tool: fio, with ioengine=aio,direct=1,iodepth=8 set for all tests.
> KVM VM: 2 CPUs and 2 GB RAM
>

These initial performance results look quite promising for virtio-scsi.

I'd be really interested to see how a raw flash block device backend that can
locally do ~100K 4k mixed R/W random IOPs compares with virtio-scsi guest
performance as the random small-block fio workload increases..

Also note there is a bottleneck wrt random small-block I/O performance (per
LUN) on the Linux/SCSI initiator side that is affecting things here.  We've
run into this limitation numerous times when using SCSI LLDs as backend TCM
devices, and I usually recommend using an iblock export of a raw block flash
backend for achieving the best small-block random I/O performance.  A number
of high-performance flash storage folks do something similar with raw block
access (Jens is CC'ed).

As per Stefan's earlier question, how does virtio-scsi to QEMU SCSI userspace
compare with these results..?  Is there a reason why these were not included
in the initial results..?

Thanks Meng!

--nab
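As a rough illustration of the iblock export Nicholas describes, a raw block
device can be wired up through the target configfs interface along these lines
(a sketch under assumptions: the HBA index, backstore name, and device path
are illustrative, and tools such as targetcli normally drive this for you):

    # Create an iblock backstore over a raw block device (hypothetical names).
    mkdir -p /sys/kernel/config/target/core/iblock_0/disk0
    echo "udev_path=/dev/sdb" > /sys/kernel/config/target/core/iblock_0/disk0/control
    echo 1 > /sys/kernel/config/target/core/iblock_0/disk0/enable
    # The emulate_write_cache attribute mentioned in the original report:
    echo 1 > /sys/kernel/config/target/core/iblock_0/disk0/attrib/emulate_write_cache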
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Cong Meng @ 2012-06-14  9:57 UTC
To: Nicholas A. Bellinger
Cc: Jens Axboe, stefanha, linux-scsi, linuxram, qemu-devel, target-devel, Anthony Liguori, Paolo Bonzini

On Wed, 2012-06-13 at 12:08 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2012-06-13 at 18:13 +0800, mengcong wrote:
> > Hi folks, I did an IO performance test on the tcm-vhost scsi. I want to share
> > the test result data here.
> >
> > [...]
>
> These initial performance results look quite promising for virtio-scsi.
>
> I'd be really interested to see how a raw flash block device backend that can
> locally do ~100K 4k mixed R/W random IOPs compares with virtio-scsi guest
> performance as the random small-block fio workload increases..

flash block == solid state disk?  I don't have one on hand.

> Also note there is a bottleneck wrt random small-block I/O performance (per
> LUN) on the Linux/SCSI initiator side that is affecting things here.  We've
> run into this limitation numerous times when using SCSI LLDs as backend TCM
> devices, and I usually recommend using an iblock export of a raw block flash
> backend for achieving the best small-block random I/O performance.  A number
> of high-performance flash storage folks do something similar with raw block
> access (Jens is CC'ed).
>
> As per Stefan's earlier question, how does virtio-scsi to QEMU SCSI userspace
> compare with these results..?  Is there a reason why these were not included
> in the initial results..?
>
That was a mistake on my part. I will run this test pattern later.

> Thanks Meng!
>
> --nab
>
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Nicholas A. Bellinger @ 2012-06-14 20:41 UTC
To: mc
Cc: Jens Axboe, stefanha, linux-scsi, linuxram, qemu-devel, target-devel, Anthony Liguori, Paolo Bonzini

On Thu, 2012-06-14 at 17:57 +0800, Cong Meng wrote:
> On Wed, 2012-06-13 at 12:08 -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2012-06-13 at 18:13 +0800, mengcong wrote:
> > > Hi folks, I did an IO performance test on the tcm-vhost scsi. I want to share
> > > the test result data here.
> > >
> > > [...]
> >
> > These initial performance results look quite promising for virtio-scsi.
> >
> > I'd be really interested to see how a raw flash block device backend that can
> > locally do ~100K 4k mixed R/W random IOPs compares with virtio-scsi guest
> > performance as the random small-block fio workload increases..
>
> flash block == solid state disk?  I don't have one on hand.
>

It just so happens there is a FusionIO HBA nearby.. ;)

However, I'm quite busy with customer items for the next few days, but I'm
really looking forward to giving this a shot with some fast raw block flash
backends soon..

Also, it's about time to convert tcm_vhost from using the old
TFO->new_cmd_map() to native cmwq..  This will certainly help overall
tcm_vhost performance, especially with shared backends across multiple VMs,
where parts of the I/O backend execution can happen in separate kworker
process context.

Btw, I'll likely end up doing this conversion to realize the performance
benefits when testing with raw flash backends, but I'm more than happy to
take a patch ahead of that if you have the extra cycles to spare.

> > Also note there is a bottleneck wrt random small-block I/O performance (per
> > LUN) on the Linux/SCSI initiator side that is affecting things here.  We've
> > run into this limitation numerous times when using SCSI LLDs as backend TCM
> > devices, and I usually recommend using an iblock export of a raw block flash
> > backend for achieving the best small-block random I/O performance.  A number
> > of high-performance flash storage folks do something similar with raw block
> > access (Jens is CC'ed).
> >
> > As per Stefan's earlier question, how does virtio-scsi to QEMU SCSI userspace
> > compare with these results..?  Is there a reason why these were not included
> > in the initial results..?
> >
> That was a mistake on my part. I will run this test pattern later.
>

Thanks!

--nab
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Stefan Hajnoczi @ 2012-06-15 10:35 UTC
To: Nicholas A. Bellinger
Cc: Jens Axboe, target-devel, stefanha, linux-scsi, linuxram, qemu-devel, mc, Anthony Liguori, Paolo Bonzini

On Thu, Jun 14, 2012 at 9:41 PM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> Btw, I'll likely end up doing this conversion to realize the performance
> benefits when testing with raw flash backends, but I'm more than happy to
> take a patch ahead of that if you have the extra cycles to spare.

This was mentioned on target-devel a little while ago.  Unfortunately I'm
busy, so the chance of a patch appearing from me is low, sorry.  If I do get
some cycles to tackle it, I'll let you know first so we don't duplicate work.

Stefan
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Stefan Hajnoczi @ 2012-06-14  8:30 UTC
To: mc
Cc: stefanha, linuxram, qemu-devel, Nicholas A. Bellinger, target-devel, Anthony Liguori, Paolo Bonzini, Asias He

On Wed, Jun 13, 2012 at 11:13 AM, mengcong <mc@linux.vnet.ibm.com> wrote:
>                   seq-read        seq-write       rand-read       rand-write
>                   8k     256k     8k     256k     8k     256k     8k     256k
> ------------------------------------------------------------------------------
> bare-metal        67951  69802    67064  67075    1758   29284    1969   26360
> tcm-vhost-iblock  61501  66575    51775  67872    1011   22533    1851   28216
> tcm-vhost-pscsi   66479  68191    50873  67547    1008   22523    1818   28304
> virtio-blk        26284  66737    23373  65735    1724   28962    1805   27774
> scsi-disk         36013  60289    46222  62527    1663   12992    1804   27670
>
> unit: KB/s
> seq-read/write = sequential read/write
> rand-read/write = random read/write
> 8k, 256k are the block sizes of the IO

What strikes me is how virtio-blk performs significantly worse than bare
metal and tcm_vhost for seq-read/seq-write 8k.  The good tcm_vhost results
suggest that the overhead is not the virtio interface itself, since
tcm_vhost implements virtio-scsi.

To drill down on the tcm_vhost vs userspace performance gap we need
virtio-scsi userspace results.  QEMU needs to use the same block device as
the tcm-vhost-iblock benchmark.

Cong: Is it possible to collect the virtio-scsi userspace results using the
same block device as tcm-vhost-iblock and -drive
format=raw,aio=native,cache=none?

Stefan
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Cong Meng @ 2012-06-14  9:45 UTC
To: Stefan Hajnoczi
Cc: stefanha, linuxram, qemu-devel, Nicholas A. Bellinger, target-devel, Anthony Liguori, Paolo Bonzini, Asias He

On Thu, 2012-06-14 at 09:30 +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 13, 2012 at 11:13 AM, mengcong <mc@linux.vnet.ibm.com> wrote:
> >                   seq-read        seq-write       rand-read       rand-write
> >                   8k     256k     8k     256k     8k     256k     8k     256k
> > ------------------------------------------------------------------------------
> > bare-metal        67951  69802    67064  67075    1758   29284    1969   26360
> > tcm-vhost-iblock  61501  66575    51775  67872    1011   22533    1851   28216
> > tcm-vhost-pscsi   66479  68191    50873  67547    1008   22523    1818   28304
> > virtio-blk        26284  66737    23373  65735    1724   28962    1805   27774
> > scsi-disk         36013  60289    46222  62527    1663   12992    1804   27670
> >
> > unit: KB/s
> > seq-read/write = sequential read/write
> > rand-read/write = random read/write
> > 8k, 256k are the block sizes of the IO
>
> What strikes me is how virtio-blk performs significantly worse than bare
> metal and tcm_vhost for seq-read/seq-write 8k.  The good tcm_vhost results
> suggest that the overhead is not the virtio interface itself, since
> tcm_vhost implements virtio-scsi.
>
> To drill down on the tcm_vhost vs userspace performance gap we need
> virtio-scsi userspace results.  QEMU needs to use the same block device as
> the tcm-vhost-iblock benchmark.
>
> Cong: Is it possible to collect the virtio-scsi userspace results using the
> same block device as tcm-vhost-iblock and -drive
> format=raw,aio=native,cache=none?
>

virtio-scsi-raw   43065  69729    52052  67378    1757   29419    2024   28135

qemu .... \
    -drive file=/dev/sdb,format=raw,if=none,id=sdb,cache=none,aio=native \
    -device virtio-scsi-pci,id=mcbus \
    -device scsi-disk,drive=sdb

There is only one SCSI HBA.
/dev/sdb is the disk on which all the tests have been done.

Is this what you want?

Cong Meng
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Stefan Hajnoczi @ 2012-06-14 12:07 UTC
To: Cong Meng
Cc: Stefan Hajnoczi, linuxram, qemu-devel, Nicholas A. Bellinger, target-devel, Anthony Liguori, Paolo Bonzini, Asias He

On Thu, Jun 14, 2012 at 05:45:22PM +0800, Cong Meng wrote:
> On Thu, 2012-06-14 at 09:30 +0100, Stefan Hajnoczi wrote:
> > On Wed, Jun 13, 2012 at 11:13 AM, mengcong <mc@linux.vnet.ibm.com> wrote:
> > > [...]
> >
> > What strikes me is how virtio-blk performs significantly worse than bare
> > metal and tcm_vhost for seq-read/seq-write 8k.  The good tcm_vhost results
> > suggest that the overhead is not the virtio interface itself, since
> > tcm_vhost implements virtio-scsi.
> >
> > To drill down on the tcm_vhost vs userspace performance gap we need
> > virtio-scsi userspace results.  QEMU needs to use the same block device as
> > the tcm-vhost-iblock benchmark.
> >
> > Cong: Is it possible to collect the virtio-scsi userspace results using the
> > same block device as tcm-vhost-iblock and -drive
> > format=raw,aio=native,cache=none?
> >
>
> virtio-scsi-raw   43065  69729    52052  67378    1757   29419    2024   28135
>
> qemu .... \
>     -drive file=/dev/sdb,format=raw,if=none,id=sdb,cache=none,aio=native \
>     -device virtio-scsi-pci,id=mcbus \
>     -device scsi-disk,drive=sdb
>
> There is only one SCSI HBA.
> /dev/sdb is the disk on which all the tests have been done.
>
> Is this what you want?

Perfect, thanks.  virtio-scsi userspace is much better than virtio-blk here.
That's unexpected since they both use the QEMU block layer.  If anything, I
would have expected virtio-blk to be faster!

I wonder if the request patterns being sent through virtio-blk and
virtio-scsi are different.  Asias discovered that the guest I/O scheduler and
request merging make a big difference between QEMU and native KVM tool
performance.  It could be the same thing here that causes virtio-blk and
virtio-scsi userspace to produce quite different results.

The second question is why tcm_vhost is faster than virtio-scsi userspace.

Stefan
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Paolo Bonzini @ 2012-06-14 12:27 UTC
To: Stefan Hajnoczi
Cc: target-devel, Stefan Hajnoczi, linuxram, qemu-devel, Nicholas A. Bellinger, Cong Meng, Anthony Liguori, Asias He

On 14/06/2012 14:07, Stefan Hajnoczi wrote:
> Perfect, thanks.  virtio-scsi userspace is much better than virtio-blk here.
> That's unexpected since they both use the QEMU block layer.  If anything, I
> would have expected virtio-blk to be faster!

Yes, I would have expected something similar.  A blktrace would be useful
here, because Asias measured the opposite: virtio-scsi being much slower than
virtio-blk.

> The second question is why tcm_vhost is faster than virtio-scsi userspace.

I would expect a difference on more high-end benchmarks (i.e. lots of I/O to
lots of disks), similar to vhost-blk.  In this simple case I wonder how much
it is due to the vagaries of the I/O scheduler, or even statistical noise.

Paolo
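A capture along the lines Paolo suggests could look roughly like this (a
sketch only; the guest device name and output prefix are placeholders):

    # Trace the guest disk while the fio job runs, then summarise the events.
    # The request sizes and merge (M) events show whether virtio-blk and
    # virtio-scsi end up issuing different request patterns to the disk.
    blktrace -d /dev/vda -o seqread-8k &
    # ... run the fio job in another terminal ...
    kill $!
    blkparse -i seqread-8k | less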
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Nicholas A. Bellinger @ 2012-06-14 20:45 UTC
To: Paolo Bonzini
Cc: target-devel, Stefan Hajnoczi, Stefan Hajnoczi, linuxram, qemu-devel, Cong Meng, Anthony Liguori, Asias He

On Thu, 2012-06-14 at 14:27 +0200, Paolo Bonzini wrote:
> On 14/06/2012 14:07, Stefan Hajnoczi wrote:
> > Perfect, thanks.  virtio-scsi userspace is much better than virtio-blk here.
> > That's unexpected since they both use the QEMU block layer.  If anything, I
> > would have expected virtio-blk to be faster!
>
> Yes, I would have expected something similar.  A blktrace would be useful
> here, because Asias measured the opposite: virtio-scsi being much slower than
> virtio-blk.
>
> > The second question is why tcm_vhost is faster than virtio-scsi userspace.
>
> I would expect a difference on more high-end benchmarks (i.e. lots of I/O to
> lots of disks), similar to vhost-blk.  In this simple case I wonder how much
> it is due to the vagaries of the I/O scheduler, or even statistical noise.
>

Mmmm, good point wrt the I/O scheduler.

I'm wondering if the discrepancy between the two tests might be attributed to
one of the tests using noop (or something else..?) with the virtio guest
LUNs, or even to the block scheduler used by default with the SCSI LUNs
serving as the backends here..

--nab
* Re: [Qemu-devel] IO performance test on the tcm-vhost scsi
From: Asias He @ 2012-06-15  3:28 UTC
To: Stefan Hajnoczi
Cc: target-devel, Stefan Hajnoczi, linuxram, qemu-devel, Nicholas A. Bellinger, Cong Meng, Anthony Liguori, Paolo Bonzini

On 06/14/2012 08:07 PM, Stefan Hajnoczi wrote:
> On Thu, Jun 14, 2012 at 05:45:22PM +0800, Cong Meng wrote:
>> On Thu, 2012-06-14 at 09:30 +0100, Stefan Hajnoczi wrote:
>>> On Wed, Jun 13, 2012 at 11:13 AM, mengcong <mc@linux.vnet.ibm.com> wrote:
>>>> [...]
>>>
>>> What strikes me is how virtio-blk performs significantly worse than bare
>>> metal and tcm_vhost for seq-read/seq-write 8k.  The good tcm_vhost results
>>> suggest that the overhead is not the virtio interface itself, since
>>> tcm_vhost implements virtio-scsi.
>>>
>>> To drill down on the tcm_vhost vs userspace performance gap we need
>>> virtio-scsi userspace results.  QEMU needs to use the same block device as
>>> the tcm-vhost-iblock benchmark.
>>>
>>> Cong: Is it possible to collect the virtio-scsi userspace results using the
>>> same block device as tcm-vhost-iblock and -drive
>>> format=raw,aio=native,cache=none?
>>>
>>
>> virtio-scsi-raw   43065  69729    52052  67378    1757   29419    2024   28135
>>
>> qemu .... \
>>     -drive file=/dev/sdb,format=raw,if=none,id=sdb,cache=none,aio=native \
>>     -device virtio-scsi-pci,id=mcbus \
>>     -device scsi-disk,drive=sdb
>>
>> There is only one SCSI HBA.
>> /dev/sdb is the disk on which all the tests have been done.
>>
>> Is this what you want?
>
> Perfect, thanks.  virtio-scsi userspace is much better than virtio-blk here.
> That's unexpected since they both use the QEMU block layer.  If anything, I
> would have expected virtio-blk to be faster!
>
> I wonder if the request patterns being sent through virtio-blk and
> virtio-scsi are different.  Asias discovered that the guest I/O scheduler and
> request merging make a big difference between QEMU and native KVM tool
> performance.  It could be the same thing here that causes virtio-blk and
> virtio-scsi userspace to produce quite different results.

Yes. Cong, can you try this:

    echo noop > /sys/block/$disk/queue/scheduler
    echo 2 > /sys/block/$disk/queue/nomerges

This will disable request merging in the guest kernel.  The host-side IO
processing speed has a very large impact on the guest request pattern,
especially for sequential read and write.

> The second question is why tcm_vhost is faster than virtio-scsi userspace.
>
> Stefan

--
Asias
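To see whether those two settings actually take effect, the guest-side
scheduler and merge counters can be checked before and after a run; a small
sketch (the device name is a placeholder, e.g. vda or sda depending on
virtio-blk vs virtio-scsi):

    # The active scheduler is shown in square brackets.
    cat /sys/block/$disk/queue/scheduler
    # /sys/block/<dev>/stat: fields 2 and 6 are the read and write merge counts.
    cat /sys/block/$disk/stat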