* performance of virtual functions compared to virtio
@ 2011-04-21 1:57 David Ahern
2011-04-21 2:35 ` Alex Williamson
0 siblings, 1 reply; 23+ messages in thread
From: David Ahern @ 2011-04-21 1:57 UTC (permalink / raw)
To: KVM mailing list
In general, should virtual functions outperform virtio+vhost for
networking performance - latency and throughput?

I have 2 VMs running on a host. Each VM has 2 NICs -- one tied to a VF
and the other going through virtio and a tap device like so:
 ------                  ----
|      |----------------| VF |---
|      |                 ----   |
| VM 1 |                        |
|      |    -----               |
|      |---| tap |---           |
 ------     -----    |          ---
                    ---        | e |
                   | b |       | t |
                   | r |       | h |
                    ---        | 2 |
 ------     -----    |          ---
|      |---| tap |---           |
|      |    -----               |
| VM 2 |                        |
|      |                 ----   |
|      |----------------| VF |---
 ------                  ----
The network arguments to qemu-kvm are:
-netdev type=tap,vhost=on,ifname=tap2,id=netdev1
-device virtio-net-pci,mac=${mac},netdev=netdev1
where ${mac} is unique to each VM and for the VF:
-device pci-assign,host=${pciid}
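
For reference, the pieces above combine into a full invocation along
these lines -- the memory/vcpu sizing matches the VMs described below,
but the MAC, PCI address, and disk image are placeholders rather than
the exact values I use:

qemu-kvm -m 1024 -smp 2 \
    -drive file=/path/to/vm.img,if=virtio \
    -netdev type=tap,vhost=on,ifname=tap2,id=netdev1 \
    -device virtio-net-pci,mac=52:54:00:12:34:56,netdev=netdev1 \
    -device pci-assign,host=01:10.0
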
netserver is running within the VMs, and the netperf commands I am
running are:
netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_STREAM
where <ip> changes depending on which interface I want to send the
traffic through. To say the least, the results are a bit disappointing
for the VF:
                  latency       throughput
                  (usec/Tran)   (MB/sec)
Host-VM
  over virtio     139.160       1199.40
  over VF         488.124        209.22

VM-VM
  over virtio     322.056        773.54
  over VF         488.051        328.88
I am just getting started with VFs and could use some hints on how to
improve the performance.
Host:
   Dell R410
   2 quad core E5620@2.40 GHz processors
   16 GB RAM
   Intel 82576 NIC (Gigabit ET Quad Port)
   Fedora 14
   kernel: 2.6.35.12-88.fc14.x86_64
   qemu-kvm-0.13.0-1.fc14.x86_64

VMs:
   Fedora 14
   kernel 2.6.35.11-83.fc14.x86_64
   2 vcpus
   1GB RAM
Thanks,
David
^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: performance of virtual functions compared to virtio
  2011-04-21  1:57 performance of virtual functions compared to virtio David Ahern
@ 2011-04-21  2:35 ` Alex Williamson
  2011-04-21  8:07   ` Avi Kivity
  2011-04-25 17:39   ` David Ahern
  0 siblings, 2 replies; 23+ messages in thread
From: Alex Williamson @ 2011-04-21 2:35 UTC (permalink / raw)
  To: David Ahern; +Cc: KVM mailing list

On Wed, 2011-04-20 at 19:57 -0600, David Ahern wrote:
> In general, should virtual functions outperform virtio+vhost for
> networking performance - latency and throughput?
>
> I have 2 VMs running on a host. Each VM has 2 NICs -- one tied to a VF
> and the other going through virtio and a tap device like so:
>
>  ------                  ----
> |      |----------------| VF |---
> |      |                 ----   |
> | VM 1 |                        |
> |      |    -----               |
> |      |---| tap |---           |
>  ------     -----    |          ---
>                     ---        | e |
>                    | b |       | t |
>                    | r |       | h |
>                     ---        | 2 |
>  ------     -----    |          ---
> |      |---| tap |---           |
> |      |    -----               |
> | VM 2 |                        |
> |      |                 ----   |
> |      |----------------| VF |---
>  ------                  ----
>
> The network arguments to qemu-kvm are:
> -netdev type=tap,vhost=on,ifname=tap2,id=netdev1
> -device virtio-net-pci,mac=${mac},netdev=netdev1
>
> where ${mac} is unique to each VM and for the VF:
> -device pci-assign,host=${pciid}
>
> netserver is running within the VMs, and the netperf commands I am
> running are:
>
> netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
> netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_STREAM
>
> where <ip> changes depending on which interface I want to send the
> traffic through. To say the least, the results are a bit disappointing
> for the VF:
>
>                   latency       throughput
>                   (usec/Tran)   (MB/sec)
> Host-VM
>   over virtio     139.160       1199.40
>   over VF         488.124        209.22
>
> VM-VM
>   over virtio     322.056        773.54
>   over VF         488.051        328.88
>
> I am just getting started with VFs and could use some hints on how to
> improve the performance.

Device assignment via a VF provides the lowest latency and most
bandwidth for *getting data off the host system*, though virtio/vhost is
getting better. If all you care about is VM-VM on the same host or
VM-host, then virtio is only limited by memory bandwidth/latency and
host processor cycles. Your processor has 25GB/s of memory bandwidth.
On the other hand, the VF has to send data all the way out to the wire
and all the way back up through the NIC to get to the other VM/host.
You're using a 1Gb/s NIC. Your results actually seem to indicate you're
getting better than wire rate, so maybe you're only passing through an
internal switch on the NIC, in any case, VFs are not optimal for
communication within the same physical system. They are optimal for off
host communication. Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio
  2011-04-21  2:35 ` Alex Williamson
@ 2011-04-21  8:07   ` Avi Kivity
  2011-04-21 12:31     ` Stefan Hajnoczi
  2011-04-25 17:46     ` David Ahern
  2011-04-25 17:39   ` David Ahern
  1 sibling, 2 replies; 23+ messages in thread
From: Avi Kivity @ 2011-04-21 8:07 UTC (permalink / raw)
  To: Alex Williamson; +Cc: David Ahern, KVM mailing list

On 04/21/2011 05:35 AM, Alex Williamson wrote:
> Device assignment via a VF provides the lowest latency and most
> bandwidth for *getting data off the host system*, though virtio/vhost is
> getting better. If all you care about is VM-VM on the same host or
> VM-host, then virtio is only limited by memory bandwidth/latency and
> host processor cycles. Your processor has 25GB/s of memory bandwidth.
> On the other hand, the VF has to send data all the way out to the wire
> and all the way back up through the NIC to get to the other VM/host.
> You're using a 1Gb/s NIC. Your results actually seem to indicate you're
> getting better than wire rate, so maybe you're only passing through an
> internal switch on the NIC, in any case, VFs are not optimal for
> communication within the same physical system. They are optimal for off
> host communication. Thanks,
>

Note I think in both cases we can make significant improvements:
- for VFs, steer device interrupts to the cpus which run the vcpus that
  will receive the interrupts eventually (ISTR some work about this, but
  not sure)
- for virtio, use a DMA engine to copy data (I think there exists code
  in upstream which does this, but has this been enabled/tuned?)

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 23+ messages in thread
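
A minimal sketch of the first suggestion -- steering an assigned VF's
interrupts to the host CPU that runs the receiving vcpu -- might look
like this on the host; the IRQ number, CPU mask, and vcpu thread id are
illustrative, and the labels in /proc/interrupts vary by setup:

    grep -i 'kvm\|eth' /proc/interrupts   # locate the vectors of the assigned VF
    echo 4 > /proc/irq/45/smp_affinity    # route IRQ 45 to CPU 2 (mask 0x4)
    taskset -pc 2 12345                   # keep the matching vcpu thread on CPU 2
    # irqbalance may need to be stopped for the affinity setting to stick
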
* Re: performance of virtual functions compared to virtio 2011-04-21 8:07 ` Avi Kivity @ 2011-04-21 12:31 ` Stefan Hajnoczi 2011-04-21 13:09 ` Avi Kivity 2011-04-25 17:46 ` David Ahern 1 sibling, 1 reply; 23+ messages in thread From: Stefan Hajnoczi @ 2011-04-21 12:31 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, David Ahern, KVM mailing list On Thu, Apr 21, 2011 at 9:07 AM, Avi Kivity <avi@redhat.com> wrote: > Note I think in both cases we can make significant improvements: > - for VFs, steer device interrupts to the cpus which run the vcpus that will > receive the interrupts eventually (ISTR some work about this, but not sure) > - for virtio, use a DMA engine to copy data (I think there exists code in > upstream which does this, but has this been enabled/tuned?) Which data copy in virtio? Is this a vhost-net specific thing you're thinking about? Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-21 12:31 ` Stefan Hajnoczi @ 2011-04-21 13:09 ` Avi Kivity 2011-04-25 17:49 ` David Ahern 0 siblings, 1 reply; 23+ messages in thread From: Avi Kivity @ 2011-04-21 13:09 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Alex Williamson, David Ahern, KVM mailing list On 04/21/2011 03:31 PM, Stefan Hajnoczi wrote: > On Thu, Apr 21, 2011 at 9:07 AM, Avi Kivity<avi@redhat.com> wrote: > > Note I think in both cases we can make significant improvements: > > - for VFs, steer device interrupts to the cpus which run the vcpus that will > > receive the interrupts eventually (ISTR some work about this, but not sure) > > - for virtio, use a DMA engine to copy data (I think there exists code in > > upstream which does this, but has this been enabled/tuned?) > > Which data copy in virtio? Is this a vhost-net specific thing you're > thinking about? There are several copies. qemu's virtio-net implementation incurs a copy on tx and on rx when calling the kernel; in addition there is also an internal copy: /* copy in packet. ugh */ len = iov_from_buf(sg, elem.in_num, buf + offset, size - offset); In principle vhost-net can avoid the tx copy, but I think now we have 1 copy on rx and tx each. If a host interface is dedicated to backing a vhost-net interface (say if you have an SR/IOV card) then you can in principle avoid the rx copy as well. An alternative to avoiding the copies is to use a dma engine, like I mentioned. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-21 13:09 ` Avi Kivity @ 2011-04-25 17:49 ` David Ahern 2011-04-26 8:19 ` Avi Kivity 0 siblings, 1 reply; 23+ messages in thread From: David Ahern @ 2011-04-25 17:49 UTC (permalink / raw) To: Avi Kivity; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list On 04/21/11 07:09, Avi Kivity wrote: > On 04/21/2011 03:31 PM, Stefan Hajnoczi wrote: >> On Thu, Apr 21, 2011 at 9:07 AM, Avi Kivity<avi@redhat.com> wrote: >> > Note I think in both cases we can make significant improvements: >> > - for VFs, steer device interrupts to the cpus which run the vcpus >> that will >> > receive the interrupts eventually (ISTR some work about this, but >> not sure) >> > - for virtio, use a DMA engine to copy data (I think there exists >> code in >> > upstream which does this, but has this been enabled/tuned?) >> >> Which data copy in virtio? Is this a vhost-net specific thing you're >> thinking about? > > There are several copies. > > qemu's virtio-net implementation incurs a copy on tx and on rx when > calling the kernel; in addition there is also an internal copy: > > /* copy in packet. ugh */ > len = iov_from_buf(sg, elem.in_num, > buf + offset, size - offset); > > In principle vhost-net can avoid the tx copy, but I think now we have 1 > copy on rx and tx each. So there is a copy internal to qemu, then from qemu to the host tap device and then tap device to a physical NIC if the packet is leaving the host? Is that what the zero-copy patch set is attempting - bypassing the transmit copy to the macvtap device? > > If a host interface is dedicated to backing a vhost-net interface (say > if you have an SR/IOV card) then you can in principle avoid the rx copy > as well. > > An alternative to avoiding the copies is to use a dma engine, like I > mentioned. > How does the DMA engine differ from the zero-copy patch set? David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 17:49 ` David Ahern @ 2011-04-26 8:19 ` Avi Kivity 2011-04-27 21:13 ` David Ahern 0 siblings, 1 reply; 23+ messages in thread From: Avi Kivity @ 2011-04-26 8:19 UTC (permalink / raw) To: David Ahern; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list On 04/25/2011 08:49 PM, David Ahern wrote: > > > > There are several copies. > > > > qemu's virtio-net implementation incurs a copy on tx and on rx when > > calling the kernel; in addition there is also an internal copy: > > > > /* copy in packet. ugh */ > > len = iov_from_buf(sg, elem.in_num, > > buf + offset, size - offset); > > > > In principle vhost-net can avoid the tx copy, but I think now we have 1 > > copy on rx and tx each. > > So there is a copy internal to qemu, then from qemu to the host tap > device and then tap device to a physical NIC if the packet is leaving > the host? There is no internal copy on tx, just rx. So: virtio-net: 1 internal rx, 1 kernel/user rx, 1 kernel/user tx vhost-net: 1 internal rx, 1 internal tx > Is that what the zero-copy patch set is attempting - bypassing the > transmit copy to the macvtap device? Yes. > > > > If a host interface is dedicated to backing a vhost-net interface (say > > if you have an SR/IOV card) then you can in principle avoid the rx copy > > as well. > > > > An alternative to avoiding the copies is to use a dma engine, like I > > mentioned. > > > > How does the DMA engine differ from the zero-copy patch set? The DMA engine does not avoid the copy, it merely uses a device other than the cpu to perform it. It offloads the cpu but still loads the interconnect. True zero-copy avoids both the cpu load and the interconnect load. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-26 8:19 ` Avi Kivity @ 2011-04-27 21:13 ` David Ahern 2011-04-28 8:07 ` Avi Kivity 0 siblings, 1 reply; 23+ messages in thread From: David Ahern @ 2011-04-27 21:13 UTC (permalink / raw) To: Avi Kivity; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list On 04/26/11 02:19, Avi Kivity wrote: > On 04/25/2011 08:49 PM, David Ahern wrote: >> > >> > There are several copies. >> > >> > qemu's virtio-net implementation incurs a copy on tx and on rx when >> > calling the kernel; in addition there is also an internal copy: >> > >> > /* copy in packet. ugh */ >> > len = iov_from_buf(sg, elem.in_num, >> > buf + offset, size - offset); >> > >> > In principle vhost-net can avoid the tx copy, but I think now we >> have 1 >> > copy on rx and tx each. >> >> So there is a copy internal to qemu, then from qemu to the host tap >> device and then tap device to a physical NIC if the packet is leaving >> the host? > > There is no internal copy on tx, just rx. > > So: > > virtio-net: 1 internal rx, 1 kernel/user rx, 1 kernel/user tx > vhost-net: 1 internal rx, 1 internal tx Is the following depict where copies are done for virtio-net? Packet Sends: .==========================================. | Host | | | | .-------------------------------. | | | qemu-kvm process | | | | | | | | .-------------------------. | | | | | Guest OS | | | | | | --------- | | | | | | ( netperf ) | | | | | | --------- | | | | | | user | | | | | | |-------------------------| | | | | | kernel | | | | | | | .-----------. | | | | | | | TCP stack | copy data from uspace to VM-based skb | | | '-----------' | | | | | | | | | | | | | .--------. | | | | | | | virtio | passes skb pointers to virtio device | | | | (eth0) | | | | | | '---------'--------'------' | | | | | | | | | .------------. | | | | | virtio-net | convert buffer addresses from | | | device | guest virtual to process (qemu)? | | '------------' | | | | | | | | '-------------------------------' | | | | | userspace | | |------------------------------------------| | kernel | | | .------. | | | tap0 | data copied from userspace | '------' to host kernel skbs | | | | .------. | | | br | | | '------' | | | | | .------. | | | eth0 | skbs sent to device for xmit '==========================================' Packet Receives .==========================================. | Host | | | | .-------------------------------. | | | qemu-kvm process | | | | | | | | .-------------------------. | | | | | Guest OS | | | | | | --------- | | | | | | ( netperf ) | | | | | | --------- | | | | | | user | | | | | | |-------------------------| | | | | | kernel | data copied from skb to userspace buf | | | .-----------. | | | | | | | TCP stack | skb attached to socket | | | '-----------' | | | | | | | | | | | | | .--------. | | | | | | | virtio | put skb onto net queue | | | | (eth0) | | | | | | '---------'--------'------' | | | | | copy here into devices' mapped skb? | | | this is the extra "internal" copy? | | .------------. | | | | | virtio-net | data copied from host | | | device | kernel to qemu process | | '------------' | | | | | | | | '-------------------------------' | | | | | userspace | | |------------------------------------------| | kernel | | | .------. | | | tap0 | skbs attached to tap device | '------' | | | | | .------. | | | br | | | '------' | | | | | .------. 
| | | eth0 | device writes data into mapped skbs '==========================================' David > >> Is that what the zero-copy patch set is attempting - bypassing the >> transmit copy to the macvtap device? > > Yes. > >> > >> > If a host interface is dedicated to backing a vhost-net interface (say >> > if you have an SR/IOV card) then you can in principle avoid the rx >> copy >> > as well. >> > >> > An alternative to avoiding the copies is to use a dma engine, like I >> > mentioned. >> > >> >> How does the DMA engine differ from the zero-copy patch set? > > The DMA engine does not avoid the copy, it merely uses a device other > than the cpu to perform it. It offloads the cpu but still loads the > interconnect. True zero-copy avoids both the cpu load and the > interconnect load. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-27 21:13 ` David Ahern @ 2011-04-28 8:07 ` Avi Kivity 0 siblings, 0 replies; 23+ messages in thread From: Avi Kivity @ 2011-04-28 8:07 UTC (permalink / raw) To: David Ahern; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list On 04/28/2011 12:13 AM, David Ahern wrote: > Is the following depict where copies are done for virtio-net? > Yes. You should have been an artist. <snip still life of a networked guest> -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-21 8:07 ` Avi Kivity 2011-04-21 12:31 ` Stefan Hajnoczi @ 2011-04-25 17:46 ` David Ahern 2011-04-26 8:20 ` Avi Kivity 1 sibling, 1 reply; 23+ messages in thread From: David Ahern @ 2011-04-25 17:46 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, KVM mailing list On 04/21/11 02:07, Avi Kivity wrote: > On 04/21/2011 05:35 AM, Alex Williamson wrote: >> Device assignment via a VF provides the lowest latency and most >> bandwidth for *getting data off the host system*, though virtio/vhost is >> getting better. If all you care about is VM-VM on the same host or >> VM-host, then virtio is only limited by memory bandwidth/latency and >> host processor cycles. Your processor has 25GB/s of memory bandwidth. >> On the other hand, the VF has to send data all the way out to the wire >> and all the way back up through the NIC to get to the other VM/host. >> You're using a 1Gb/s NIC. Your results actually seem to indicate you're >> getting better than wire rate, so maybe you're only passing through an >> internal switch on the NIC, in any case, VFs are not optimal for >> communication within the same physical system. They are optimal for off >> host communication. Thanks, >> > > Note I think in both cases we can make significant improvements: > - for VFs, steer device interrupts to the cpus which run the vcpus that > will receive the interrupts eventually (ISTR some work about this, but > not sure) I don't understand your point here. I thought interrupts for the VF were only delivered to the guest, not the host. David > - for virtio, use a DMA engine to copy data (I think there exists code > in upstream which does this, but has this been enabled/tuned?) > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 17:46 ` David Ahern @ 2011-04-26 8:20 ` Avi Kivity 0 siblings, 0 replies; 23+ messages in thread From: Avi Kivity @ 2011-04-26 8:20 UTC (permalink / raw) To: David Ahern; +Cc: Alex Williamson, KVM mailing list On 04/25/2011 08:46 PM, David Ahern wrote: > > > > Note I think in both cases we can make significant improvements: > > - for VFs, steer device interrupts to the cpus which run the vcpus that > > will receive the interrupts eventually (ISTR some work about this, but > > not sure) > > I don't understand your point here. I thought interrupts for the VF were > only delivered to the guest, not the host. > Interrupts are delivered to the host, which the forwards them to the guest. Virtualization hardware on x86 does not allow direct-to-guest interrupt delivery. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-21 2:35 ` Alex Williamson 2011-04-21 8:07 ` Avi Kivity @ 2011-04-25 17:39 ` David Ahern 2011-04-25 18:13 ` Alex Williamson 1 sibling, 1 reply; 23+ messages in thread From: David Ahern @ 2011-04-25 17:39 UTC (permalink / raw) To: Alex Williamson; +Cc: KVM mailing list On 04/20/11 20:35, Alex Williamson wrote: > Device assignment via a VF provides the lowest latency and most > bandwidth for *getting data off the host system*, though virtio/vhost is > getting better. If all you care about is VM-VM on the same host or > VM-host, then virtio is only limited by memory bandwidth/latency and > host processor cycles. Your processor has 25GB/s of memory bandwidth. > On the other hand, the VF has to send data all the way out to the wire > and all the way back up through the NIC to get to the other VM/host. > You're using a 1Gb/s NIC. Your results actually seem to indicate you're > getting better than wire rate, so maybe you're only passing through an > internal switch on the NIC, in any case, VFs are not optimal for > communication within the same physical system. They are optimal for off > host communication. Thanks, Hi Alex: Host-host was the next focus for the tests. I have 2 of the aforementioned servers, each configured identically. As a reminder: Host: Dell R410 2 quad core E5620@2.40 GHz processors 16 GB RAM Intel 82576 NIC (Gigabit ET Quad Port) - devices eth2, eth3, eth4, eth5 Fedora 14 kernel: 2.6.35.12-88.fc14.x86_64 qemu-kvm.git, ffce28fe6 (18-April-11) VMs: Fedora 14 kernel 2.6.35.11-83.fc14.x86_64 2 vcpus 1GB RAM 2 NICs - 1 virtio, 1 VF The virtio network arguments to qemu-kvm are: -netdev type=tap,vhost=on,ifname=tap0,id=netdev1 -device virtio-net-pci,mac=${mac},netdev=netdev1 For this round of tests I have the following setup: .======================================. | Host - A | | | | .-------------------------. | | | Virtual Machine - C | | | | | | | | .------. .------. | | | '--| eth1 |-----| eth0 |--' | | '------' '------' | | 192.168. | | 192.168.103.71 | 102.71 | .------. | | | | tap0 | | | | '------' | | | | | | | .------. | | | | br | 192.168.103.79 | | '------' | | {VF} | | | .--------. .------. | '=======| eth2 |======| eth3 |=======' '--------' '------' 192.168.102.79 | | | point-to- | | point | | connections | 192.168.102.80 | | .--------. .------. .=======| eth2 |======| eth3 |=======. | '--------' '------' | | {VF} | | | | .------. | | | | br | 192.168.103.80 | | '------' | | | | | | | .------. | | | | tap0 | | | 192.168. | '------' | | 102.81 | | 192.168.103.81 | .------. .------. | | .--| eth1 |-----| eth0 |--. | | | '------' '------' | | | | | | | | Virtual Machine - D | | | '-------------------------' | | | | Host - B | '======================================' So, basically, 192.168.102 is the network where the VMs have a VF, and 192.168.103 is the network where the VMs use virtio for networking. The netperf commands are all run on either Host-A or VM-C: netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R latency throughput (usec) Mbps cross-host: A-B, eth2 185 932 A-B, eth3 185 935 same host, host-VM: A-C, using VF 488 1085 (seen as high as 1280's) A-C, virtio 150 4282 cross-host, host-VM: A-D, VF 489 938 A-D, virtio 288 889 cross-host, VM-VM: C-D, VF 488 934 C-D, virtio 490 933 While throughput for VFs is fine (near line-rate when crossing hosts), the latency is horrible. Any options to improve that? 
David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 17:39 ` David Ahern @ 2011-04-25 18:13 ` Alex Williamson 2011-04-25 19:07 ` David Ahern 0 siblings, 1 reply; 23+ messages in thread From: Alex Williamson @ 2011-04-25 18:13 UTC (permalink / raw) To: David Ahern; +Cc: KVM mailing list On Mon, 2011-04-25 at 11:39 -0600, David Ahern wrote: > On 04/20/11 20:35, Alex Williamson wrote: > > Device assignment via a VF provides the lowest latency and most > > bandwidth for *getting data off the host system*, though virtio/vhost is > > getting better. If all you care about is VM-VM on the same host or > > VM-host, then virtio is only limited by memory bandwidth/latency and > > host processor cycles. Your processor has 25GB/s of memory bandwidth. > > On the other hand, the VF has to send data all the way out to the wire > > and all the way back up through the NIC to get to the other VM/host. > > You're using a 1Gb/s NIC. Your results actually seem to indicate you're > > getting better than wire rate, so maybe you're only passing through an > > internal switch on the NIC, in any case, VFs are not optimal for > > communication within the same physical system. They are optimal for off > > host communication. Thanks, > > Hi Alex: > > Host-host was the next focus for the tests. I have 2 of the > aforementioned servers, each configured identically. As a reminder: > > Host: > Dell R410 > 2 quad core E5620@2.40 GHz processors > 16 GB RAM > Intel 82576 NIC (Gigabit ET Quad Port) > - devices eth2, eth3, eth4, eth5 > Fedora 14 > kernel: 2.6.35.12-88.fc14.x86_64 > qemu-kvm.git, ffce28fe6 (18-April-11) > > VMs: > Fedora 14 > kernel 2.6.35.11-83.fc14.x86_64 > 2 vcpus > 1GB RAM > 2 NICs - 1 virtio, 1 VF > > The virtio network arguments to qemu-kvm are: > -netdev type=tap,vhost=on,ifname=tap0,id=netdev1 > -device virtio-net-pci,mac=${mac},netdev=netdev1 > > > For this round of tests I have the following setup: > > .======================================. > | Host - A | > | | > | .-------------------------. | > | | Virtual Machine - C | | > | | | | > | | .------. .------. | | > | '--| eth1 |-----| eth0 |--' | > | '------' '------' | > | 192.168. | | 192.168.103.71 > | 102.71 | .------. | > | | | tap0 | | > | | '------' | > | | | | > | | .------. | > | | | br | 192.168.103.79 > | | '------' | > | {VF} | | > | .--------. .------. | > '=======| eth2 |======| eth3 |=======' > '--------' '------' > 192.168.102.79 | | > | point-to- | > | point | > | connections | > 192.168.102.80 | | > .--------. .------. > .=======| eth2 |======| eth3 |=======. > | '--------' '------' | > | {VF} | | > | | .------. | > | | | br | 192.168.103.80 > | | '------' | > | | | | > | | .------. | > | | | tap0 | | > | 192.168. | '------' | > | 102.81 | | 192.168.103.81 > | .------. .------. | > | .--| eth1 |-----| eth0 |--. | > | | '------' '------' | | > | | | | > | | Virtual Machine - D | | > | '-------------------------' | > | | > | Host - B | > '======================================' > > > So, basically, 192.168.102 is the network where the VMs have a VF, and > 192.168.103 is the network where the VMs use virtio for networking. > > The netperf commands are all run on either Host-A or VM-C: > > netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R > netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R > > > latency throughput > (usec) Mbps > cross-host: > A-B, eth2 185 932 > A-B, eth3 185 935 This is actually PF-PF, right? It would be interesting to load igbvf on the hosts and determine VF-VF latency as well. 
> same host, host-VM: > A-C, using VF 488 1085 (seen as high as 1280's) > A-C, virtio 150 4282 We know virtio has a shorter path for this test. > cross-host, host-VM: > A-D, VF 489 938 > A-D, virtio 288 889 > > cross-host, VM-VM: > C-D, VF 488 934 > C-D, virtio 490 933 > > > While throughput for VFs is fine (near line-rate when crossing hosts), FWIW, it's not too difficult to get line rate on a 1Gbps network, even some of the emulated NICs can do it. There will be a difference in host CPU power to get it though, where it should theoretically be emulated > virtio > pci-assign. > the latency is horrible. Any options to improve that? If you don't mind testing, I'd like to see VF-VF between the hosts (to do this, don't assign eth2 an IP, just make sure it's up, then load the igbvf driver on the host and assign an IP to one of the VFs associated with the eth2 PF), and cross host testing using the PF for the guest instead of the VF. This should help narrow down how much of the latency is due to using the VF vs the PF, since all of the virtio tests are using the PF. I've been suspicious that the VF adds some latency, but haven't had a good test setup (or time) to dig very deep into it. Thanks, Alex ^ permalink raw reply [flat|nested] 23+ messages in thread
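
A rough sketch of the host-side VF test being suggested here, assuming
the 82576 PFs already have VFs enabled via the igb max_vfs module
parameter; the interface names and address are examples, and the VF
netdev name depends on what igbvf enumerates:

    modprobe igbvf                            # host claims VFs not assigned to guests
    ip link set dev eth2 up                   # PF stays up but unaddressed
    ip addr add 192.168.102.90/24 dev eth6    # eth6 = whichever VF netdev appears
    ip link set dev eth6 up
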
* Re: performance of virtual functions compared to virtio 2011-04-25 18:13 ` Alex Williamson @ 2011-04-25 19:07 ` David Ahern 2011-04-25 19:29 ` Alex Williamson 2011-05-02 18:58 ` David Ahern 0 siblings, 2 replies; 23+ messages in thread From: David Ahern @ 2011-04-25 19:07 UTC (permalink / raw) To: Alex Williamson; +Cc: KVM mailing list On 04/25/11 12:13, Alex Williamson wrote: >> So, basically, 192.168.102 is the network where the VMs have a VF, and >> 192.168.103 is the network where the VMs use virtio for networking. >> >> The netperf commands are all run on either Host-A or VM-C: >> >> netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R >> netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R >> >> >> latency throughput >> (usec) Mbps >> cross-host: >> A-B, eth2 185 932 >> A-B, eth3 185 935 > > This is actually PF-PF, right? It would be interesting to load igbvf on > the hosts and determine VF-VF latency as well. yes, PF-PF. eth3 has the added bridge layer, but from what I can see the overhead is noise. I added host-to-host to put the host-to-VM numbers in perspective. > >> same host, host-VM: >> A-C, using VF 488 1085 (seen as high as 1280's) >> A-C, virtio 150 4282 > > We know virtio has a shorter path for this test. No complaints about the throughput numbers; the latency is the problem. > >> cross-host, host-VM: >> A-D, VF 489 938 >> A-D, virtio 288 889 >> >> cross-host, VM-VM: >> C-D, VF 488 934 >> C-D, virtio 490 933 >> >> >> While throughput for VFs is fine (near line-rate when crossing hosts), > > FWIW, it's not too difficult to get line rate on a 1Gbps network, even > some of the emulated NICs can do it. There will be a difference in host > CPU power to get it though, where it should theoretically be emulated > > virtio > pci-assign. 10GB is the goal; 1GB offers a cheaper learning environment. ;-) > >> the latency is horrible. Any options to improve that? > > If you don't mind testing, I'd like to see VF-VF between the hosts (to > do this, don't assign eth2 an IP, just make sure it's up, then load the > igbvf driver on the host and assign an IP to one of the VFs associated > with the eth2 PF), and cross host testing using the PF for the guest > instead of the VF. This should help narrow down how much of the latency > is due to using the VF vs the PF, since all of the virtio tests are > using the PF. I've been suspicious that the VF adds some latency, but > haven't had a good test setup (or time) to dig very deep into it. It's a quad nic, so I left eth2 and eth3 alone and added the VF-VF test using VFs on eth4. Indeed latency is 488 usec and throughput is 925 Mbps. This is host-to-host using VFs. David > Thanks, > > Alex > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 19:07 ` David Ahern @ 2011-04-25 19:29 ` Alex Williamson 2011-04-25 19:49 ` David Ahern 2011-05-02 18:58 ` David Ahern 1 sibling, 1 reply; 23+ messages in thread From: Alex Williamson @ 2011-04-25 19:29 UTC (permalink / raw) To: David Ahern; +Cc: KVM mailing list On Mon, 2011-04-25 at 13:07 -0600, David Ahern wrote: > On 04/25/11 12:13, Alex Williamson wrote: > >> So, basically, 192.168.102 is the network where the VMs have a VF, and > >> 192.168.103 is the network where the VMs use virtio for networking. > >> > >> The netperf commands are all run on either Host-A or VM-C: > >> > >> netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R > >> netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R > >> > >> > >> latency throughput > >> (usec) Mbps > >> cross-host: > >> A-B, eth2 185 932 > >> A-B, eth3 185 935 > > > > This is actually PF-PF, right? It would be interesting to load igbvf on > > the hosts and determine VF-VF latency as well. > > yes, PF-PF. eth3 has the added bridge layer, but from what I can see the > overhead is noise. I added host-to-host to put the host-to-VM numbers in > perspective. > > > > >> same host, host-VM: > >> A-C, using VF 488 1085 (seen as high as 1280's) > >> A-C, virtio 150 4282 > > > > We know virtio has a shorter path for this test. > > No complaints about the throughput numbers; the latency is the problem. > > > > >> cross-host, host-VM: > >> A-D, VF 489 938 > >> A-D, virtio 288 889 > >> > >> cross-host, VM-VM: > >> C-D, VF 488 934 > >> C-D, virtio 490 933 > >> > >> > >> While throughput for VFs is fine (near line-rate when crossing hosts), > > > > FWIW, it's not too difficult to get line rate on a 1Gbps network, even > > some of the emulated NICs can do it. There will be a difference in host > > CPU power to get it though, where it should theoretically be emulated > > > virtio > pci-assign. > > 10GB is the goal; 1GB offers a cheaper learning environment. ;-) > > > > >> the latency is horrible. Any options to improve that? > > > > If you don't mind testing, I'd like to see VF-VF between the hosts (to > > do this, don't assign eth2 an IP, just make sure it's up, then load the > > igbvf driver on the host and assign an IP to one of the VFs associated > > with the eth2 PF), and cross host testing using the PF for the guest > > instead of the VF. This should help narrow down how much of the latency > > is due to using the VF vs the PF, since all of the virtio tests are > > using the PF. I've been suspicious that the VF adds some latency, but > > haven't had a good test setup (or time) to dig very deep into it. > > It's a quad nic, so I left eth2 and eth3 alone and added the VF-VF test > using VFs on eth4. > > Indeed latency is 488 usec and throughput is 925 Mbps. This is > host-to-host using VFs. So we're effectively getting host-host latency/throughput for the VF, it's just that in the 82576 implementation of SR-IOV, the VF takes a latency hit that puts it pretty close to virtio. Unfortunate. I think you'll find that passing the PF to the guests should be pretty close to that 185us latency. I would assume (hope) the higher end NICs reduce this, but it seems to be a hardware limitation, so it's hard to predict. Thanks, Alex ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 19:29 ` Alex Williamson @ 2011-04-25 19:49 ` David Ahern 2011-04-25 20:27 ` Alex Williamson 2011-04-25 20:49 ` Andrew Theurer 0 siblings, 2 replies; 23+ messages in thread From: David Ahern @ 2011-04-25 19:49 UTC (permalink / raw) To: Alex Williamson; +Cc: KVM mailing list On 04/25/11 13:29, Alex Williamson wrote: > So we're effectively getting host-host latency/throughput for the VF, > it's just that in the 82576 implementation of SR-IOV, the VF takes a > latency hit that puts it pretty close to virtio. Unfortunate. I think For host-to-VM using VFs is worse than virtio which is counterintuitive. > you'll find that passing the PF to the guests should be pretty close to > that 185us latency. I would assume (hope) the higher end NICs reduce About that 185usec: do you know where the bottleneck is? It seems as if the packet is held in some queue waiting for an event/timeout before it is transmitted. David > this, but it seems to be a hardware limitation, so it's hard to predict. > Thanks, > > Alex > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 19:49 ` David Ahern @ 2011-04-25 20:27 ` Alex Williamson 2011-04-25 20:40 ` David Ahern 2011-04-25 20:49 ` Andrew Theurer 1 sibling, 1 reply; 23+ messages in thread From: Alex Williamson @ 2011-04-25 20:27 UTC (permalink / raw) To: David Ahern; +Cc: KVM mailing list On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote: > > On 04/25/11 13:29, Alex Williamson wrote: > > So we're effectively getting host-host latency/throughput for the VF, > > it's just that in the 82576 implementation of SR-IOV, the VF takes a > > latency hit that puts it pretty close to virtio. Unfortunate. I think > > For host-to-VM using VFs is worse than virtio which is counterintuitive. On the same host, just think about the data path of one versus the other. On the guest side, there's virtio vs a physical NIC. virtio is designed to be virtualization friendly, so hopefully has less context switches in setting up and processing transactions. Once the packet leaves the assigned physical NIC, it has to come back up the entire host I/O stack, while the virtio device is connected to an internal bridge and bypasses all but the upper level network routing. > > you'll find that passing the PF to the guests should be pretty close to > > that 185us latency. I would assume (hope) the higher end NICs reduce > > About that 185usec: do you know where the bottleneck is? It seems as if > the packet is held in some queue waiting for an event/timeout before it > is transmitted. I don't know specifically, I don't do much network performance tuning. Interrupt coalescing could be a factor, along with various offload settings, and of course latency of the physical NIC hardware and interconnects. Thanks, Alex ^ permalink raw reply [flat|nested] 23+ messages in thread
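
For reference, the coalescing and offload settings mentioned above can
be inspected with stock ethtool on either end; the interface name is an
example:

    ethtool -c eth2    # interrupt coalescing (rx-usecs, adaptive settings)
    ethtool -k eth2    # offloads (TSO/GSO/GRO and friends)
    ethtool -S eth2    # driver statistics, useful for spotting drops
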
* Re: performance of virtual functions compared to virtio 2011-04-25 20:27 ` Alex Williamson @ 2011-04-25 20:40 ` David Ahern 2011-04-25 21:02 ` Alex Williamson 0 siblings, 1 reply; 23+ messages in thread From: David Ahern @ 2011-04-25 20:40 UTC (permalink / raw) To: Alex Williamson; +Cc: KVM mailing list On 04/25/11 14:27, Alex Williamson wrote: > On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote: >> >> On 04/25/11 13:29, Alex Williamson wrote: >>> So we're effectively getting host-host latency/throughput for the VF, >>> it's just that in the 82576 implementation of SR-IOV, the VF takes a >>> latency hit that puts it pretty close to virtio. Unfortunate. I think >> >> For host-to-VM using VFs is worse than virtio which is counterintuitive. > > On the same host, just think about the data path of one versus the > other. On the guest side, there's virtio vs a physical NIC. virtio is > designed to be virtualization friendly, so hopefully has less context > switches in setting up and processing transactions. Once the packet > leaves the assigned physical NIC, it has to come back up the entire host > I/O stack, while the virtio device is connected to an internal bridge > and bypasses all but the upper level network routing. I get the virtio path, but you lost me on the physical NIC. I thought the point of VFs is to bypass the host from having to touch the packet, so the processing path with a VM using a VF would be the same as a non-VM. David > >>> you'll find that passing the PF to the guests should be pretty close to >>> that 185us latency. I would assume (hope) the higher end NICs reduce >> >> About that 185usec: do you know where the bottleneck is? It seems as if >> the packet is held in some queue waiting for an event/timeout before it >> is transmitted. > > I don't know specifically, I don't do much network performance tuning. > Interrupt coalescing could be a factor, along with various offload > settings, and of course latency of the physical NIC hardware and > interconnects. Thanks, > > Alex > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 20:40 ` David Ahern @ 2011-04-25 21:02 ` Alex Williamson 2011-04-25 21:14 ` David Ahern 0 siblings, 1 reply; 23+ messages in thread From: Alex Williamson @ 2011-04-25 21:02 UTC (permalink / raw) To: David Ahern; +Cc: KVM mailing list On Mon, 2011-04-25 at 14:40 -0600, David Ahern wrote: > > On 04/25/11 14:27, Alex Williamson wrote: > > On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote: > >> > >> On 04/25/11 13:29, Alex Williamson wrote: > >>> So we're effectively getting host-host latency/throughput for the VF, > >>> it's just that in the 82576 implementation of SR-IOV, the VF takes a > >>> latency hit that puts it pretty close to virtio. Unfortunate. I think > >> > >> For host-to-VM using VFs is worse than virtio which is counterintuitive. > > > > On the same host, just think about the data path of one versus the > > other. On the guest side, there's virtio vs a physical NIC. virtio is > > designed to be virtualization friendly, so hopefully has less context > > switches in setting up and processing transactions. Once the packet > > leaves the assigned physical NIC, it has to come back up the entire host > > I/O stack, while the virtio device is connected to an internal bridge > > and bypasses all but the upper level network routing. > > I get the virtio path, but you lost me on the physical NIC. I thought > the point of VFs is to bypass the host from having to touch the packet, > so the processing path with a VM using a VF would be the same as a non-VM. In the VF case, the host is only involved in processing the packet on it's end of the connection, but the packet still has to go all the way out to the physical device and all the way back. Handled on one end by the VM and the other end by the host. An analogy might be sending a letter to an office coworker in a neighboring cube. You could just pass the letter over the wall (virtio) or you could go put it in the mailbox, signal the mail carrier, who comes and moves it to your neighbor's mailbox, who then gets signaled that they have a letter (device assignment). Since the networks stacks are completely separate from one another, there's very little difference in data path whether you're talking to the host, a remote system, or a remote VM, which is reflected in your performance data. Hope that helps, Alex ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 21:02 ` Alex Williamson @ 2011-04-25 21:14 ` David Ahern 2011-04-25 21:18 ` Alex Williamson 0 siblings, 1 reply; 23+ messages in thread From: David Ahern @ 2011-04-25 21:14 UTC (permalink / raw) To: Alex Williamson; +Cc: KVM mailing list On 04/25/11 15:02, Alex Williamson wrote: > On Mon, 2011-04-25 at 14:40 -0600, David Ahern wrote: >> >> On 04/25/11 14:27, Alex Williamson wrote: >>> On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote: >>>> >>>> On 04/25/11 13:29, Alex Williamson wrote: >>>>> So we're effectively getting host-host latency/throughput for the VF, >>>>> it's just that in the 82576 implementation of SR-IOV, the VF takes a >>>>> latency hit that puts it pretty close to virtio. Unfortunate. I think >>>> >>>> For host-to-VM using VFs is worse than virtio which is counterintuitive. >>> >>> On the same host, just think about the data path of one versus the >>> other. On the guest side, there's virtio vs a physical NIC. virtio is >>> designed to be virtualization friendly, so hopefully has less context >>> switches in setting up and processing transactions. Once the packet >>> leaves the assigned physical NIC, it has to come back up the entire host >>> I/O stack, while the virtio device is connected to an internal bridge >>> and bypasses all but the upper level network routing. >> >> I get the virtio path, but you lost me on the physical NIC. I thought >> the point of VFs is to bypass the host from having to touch the packet, >> so the processing path with a VM using a VF would be the same as a non-VM. > > In the VF case, the host is only involved in processing the packet on > it's end of the connection, but the packet still has to go all the way > out to the physical device and all the way back. Handled on one end by > the VM and the other end by the host. > > An analogy might be sending a letter to an office coworker in a > neighboring cube. You could just pass the letter over the wall (virtio) > or you could go put it in the mailbox, signal the mail carrier, who > comes and moves it to your neighbor's mailbox, who then gets signaled > that they have a letter (device assignment). > > Since the networks stacks are completely separate from one another, > there's very little difference in data path whether you're talking to > the host, a remote system, or a remote VM, which is reflected in your > performance data. Hope that helps, Got you. I was thinking host-VM as VM on separate host; I didn't make that clear. Thanks for clarifying - I like the letter example. David > > Alex > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 21:14 ` David Ahern @ 2011-04-25 21:18 ` Alex Williamson 0 siblings, 0 replies; 23+ messages in thread From: Alex Williamson @ 2011-04-25 21:18 UTC (permalink / raw) To: David Ahern; +Cc: KVM mailing list On Mon, 2011-04-25 at 15:14 -0600, David Ahern wrote: > > On 04/25/11 15:02, Alex Williamson wrote: > > On Mon, 2011-04-25 at 14:40 -0600, David Ahern wrote: > >> > >> On 04/25/11 14:27, Alex Williamson wrote: > >>> On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote: > >>>> > >>>> On 04/25/11 13:29, Alex Williamson wrote: > >>>>> So we're effectively getting host-host latency/throughput for the VF, > >>>>> it's just that in the 82576 implementation of SR-IOV, the VF takes a > >>>>> latency hit that puts it pretty close to virtio. Unfortunate. I think > >>>> > >>>> For host-to-VM using VFs is worse than virtio which is counterintuitive. > >>> > >>> On the same host, just think about the data path of one versus the > >>> other. On the guest side, there's virtio vs a physical NIC. virtio is > >>> designed to be virtualization friendly, so hopefully has less context > >>> switches in setting up and processing transactions. Once the packet > >>> leaves the assigned physical NIC, it has to come back up the entire host > >>> I/O stack, while the virtio device is connected to an internal bridge > >>> and bypasses all but the upper level network routing. > >> > >> I get the virtio path, but you lost me on the physical NIC. I thought > >> the point of VFs is to bypass the host from having to touch the packet, > >> so the processing path with a VM using a VF would be the same as a non-VM. > > > > In the VF case, the host is only involved in processing the packet on > > it's end of the connection, but the packet still has to go all the way > > out to the physical device and all the way back. Handled on one end by > > the VM and the other end by the host. > > > > An analogy might be sending a letter to an office coworker in a > > neighboring cube. You could just pass the letter over the wall (virtio) > > or you could go put it in the mailbox, signal the mail carrier, who > > comes and moves it to your neighbor's mailbox, who then gets signaled > > that they have a letter (device assignment). > > > > Since the networks stacks are completely separate from one another, > > there's very little difference in data path whether you're talking to > > the host, a remote system, or a remote VM, which is reflected in your > > performance data. Hope that helps, > > Got you. I was thinking host-VM as VM on separate host; I didn't make > that clear. Thanks for clarifying - I like the letter example. I should probably also note that being able to "pass a letter over the wall" is possible because of the bridge/tap setup used for that communication path, so it's available to emulated NICs as well. virtio is just a paravirtualization layer that makes it lower overhead than emulation. To get a letter "out of the office" (ie. off host), all paths still eventually need to put the letter in the mailbox. Thanks, Alex ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio 2011-04-25 19:49 ` David Ahern 2011-04-25 20:27 ` Alex Williamson @ 2011-04-25 20:49 ` Andrew Theurer 1 sibling, 0 replies; 23+ messages in thread From: Andrew Theurer @ 2011-04-25 20:49 UTC (permalink / raw) To: David Ahern; +Cc: Alex Williamson, KVM mailing list On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote: > > On 04/25/11 13:29, Alex Williamson wrote: > > So we're effectively getting host-host latency/throughput for the VF, > > it's just that in the 82576 implementation of SR-IOV, the VF takes a > > latency hit that puts it pretty close to virtio. Unfortunate. I think > > For host-to-VM using VFs is worse than virtio which is counterintuitive. > > > you'll find that passing the PF to the guests should be pretty close to > > that 185us latency. I would assume (hope) the higher end NICs reduce > > About that 185usec: do you know where the bottleneck is? It seems as if > the packet is held in some queue waiting for an event/timeout before it > is transmitted. you might want to check the VF driver. I know versions of the ixgbevf driver have a throttled interrupt option which will increase latency with some settings. I don't remember if the igbvf driver has the same feature. If it does, you will want to turn this option off for best latency. > > David > > > > this, but it seems to be a hardware limitation, so it's hard to predict. > > Thanks, > > > > Alex -Andrew ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance of virtual functions compared to virtio
  2011-04-25 19:07 ` David Ahern
  2011-04-25 19:29   ` Alex Williamson
@ 2011-05-02 18:58   ` David Ahern
  1 sibling, 0 replies; 23+ messages in thread
From: David Ahern @ 2011-05-02 18:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: KVM mailing list

On 04/25/11 13:07, David Ahern wrote:
>>> same host, host-VM:
>>>   A-C, using VF       488       1085   (seen as high as 1280's)
>>>   A-C, virtio         150       4282
>>
>> We know virtio has a shorter path for this test.
>
> No complaints about the throughput numbers; the latency is the problem.

rx-usecs is the magical parameter. It defaults to 3 for both the igb and
igbvf drivers which is the 'magic' performance number -- i.e., the
drivers dynamically adapt to the packet rate. Setting it to 10 in the
*VM only* (lowest limit controlled by IGBVF_MIN_ITR_USECS) dramatically
lowers latency with little-to-no impact to throughput (ie., mostly
within the +-10% variation I see between netperf runs with system
defaults everywhere).

Latency in usecs:
                        default   rx-usec=10
  host-host                  97      105
  same host, host-VM        488      158
  cross host, host-VM       488      181
  cross host, VM-VM         488      255

Changing the default in the host for the physical function kills
throughput with no impact to latency.

I'd still like to know why 100 usec is the baseline for even
host-to-host packets.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread
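
The guest-side change described above presumably comes down to the
standard ethtool coalescing knob, run inside the VM against whatever
name the VF has there (eth1 is an example):

    ethtool -C eth1 rx-usecs 10    # verify afterwards with: ethtool -c eth1
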