* performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-21 1:57 UTC (permalink / raw)
To: KVM mailing list
In general should virtual functions outperform virtio+vhost for
networking performance - latency and throughput?
I have 2 VMs running on a host. Each VM has 2 nics -- one tied to a VF
and the other going through virtio and a tap device like so:
 ------                   ----
|      |-----------------| VF |----
|      |                  ----    |
| VM 1 |                          |
|      |    -----                 |
|      |---| tap |---             |
 ------     -----   |            ---
                   ---          | e |
                  | b |         | t |
                  | r |         | h |
                   ---          | 2 |
 ------     -----   |            ---
|      |---| tap |---             |
|      |    -----                 |
| VM 2 |                          |
|      |                  ----    |
|      |-----------------| VF |----
 ------                   ----
The network arguments to qemu-kvm are:
-netdev type=tap,vhost=on,ifname=tap2,id=netdev1
-device virtio-net-pci,mac=${mac},netdev=netdev1
where ${mac} is unique to each VM and for the VF:
-device pci-assign,host=${pciid}
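(For context, the host-side plumbing behind these two device types
typically looks something like the sketch below. The bridge name, the
igb max_vfs option and the PCI/device IDs are illustrative placeholders,
not values taken from this setup.)
# virtio path: a bridge plus the tap device named in -netdev above
brctl addbr br0
ip tuntap add dev tap2 mode tap            # or tunctl -t tap2
brctl addif br0 tap2
ip link set br0 up && ip link set tap2 up
# VF path: create 82576 VFs and bind one to pci-stub for pci-assign
modprobe igb max_vfs=2
lspci -nn | grep "Virtual Function"        # e.g. 01:10.0 [8086:10ca]
echo "8086 10ca" > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:01:10.0 > /sys/bus/pci/devices/0000:01:10.0/driver/unbind
echo 0000:01:10.0 > /sys/bus/pci/drivers/pci-stub/bind
# then: qemu-kvm ... -device pci-assign,host=01:10.0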
netserver is running within the VMs, and the netperf commands I am
running are:
netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_STREAM
where <ip> changes depending on which interface I want to send the
traffic through. To say the least results are a bit disappointing for
the VF:
                  latency        throughput
                  (usec/Tran)    (MB/sec)
Host-VM
  over virtio      139.160        1199.40
  over VF          488.124         209.22

VM-VM
  over virtio      322.056         773.54
  over VF          488.051         328.88
I am just getting started with VFs and could use some hints on how to
improve the performance.
Host:
Dell R410
2 quad core E5620@2.40 GHz processors
16 GB RAM
Intel 82576 NIC (Gigabit ET Quad Port)
Fedora 14
kernel: 2.6.35.12-88.fc14.x86_64
qemu-kvm-0.13.0-1.fc14.x86_64
VMs:
Fedora 14
kernel 2.6.35.11-83.fc14.x86_64
2 vcpus
1GB RAM
Thanks,
David
* Re: performance of virtual functions compared to virtio
From: Alex Williamson @ 2011-04-21 2:35 UTC (permalink / raw)
To: David Ahern; +Cc: KVM mailing list
On Wed, 2011-04-20 at 19:57 -0600, David Ahern wrote:
> In general should virtual functions outperform virtio+vhost for
> networking performance - latency and throughput?
>
> I have 2 VMs running on a host. Each VM has 2 nics -- one tied to a VF
> and the other going through virtio and a tap device like so:
>
> ------ ----
> | |----------------| VF |---
> | | ---- |
> | VM 1 | |
> | | ----- |
> | |---| tap |--- |
> ------ ----- | ---
> --- | e |
> | b | | t |
> | r | | h |
> --- | 2 |
> ------ ----- | ---
> | |---| tap |--- |
> | | ----- |
> | VM 2 | |
> | | ---- |
> | |----------------| VF |---
> ------ ----
>
> The network arguments to qemu-kvm are:
> -netdev type=tap,vhost=on,ifname=tap2,id=netdev1
> -device virtio-net-pci,mac=${mac},netdev=netdev1
>
> where ${mac} is unique to each VM and for the VF:
> -device pci-assign,host=${pciid}
>
> netserver is running within the VMs, and the netperf commands I am
> running are:
>
> netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
> netperf -p 12346 -H <ip> -l 20 -jcC -fM -v 2 -t TCP_STREAM
>
> where <ip> changes depending on which interface I want to send the
> traffic through. To say the least results are a bit disappointing for
> the VF:
>
> latency throughput
> (usec/Tran) (MB/sec)
> Host-VM
> over virtio 139.160 1199.40
> over VF 488.124 209.22
>
> VM-VM
> over virtio 322.056 773.54
> over VF 488.051 328.88
>
> I am just getting started with VFs and could use some hints on how to
> improve the performance.
Device assignment via a VF provides the lowest latency and most
bandwidth for *getting data off the host system*, though virtio/vhost is
getting better. If all you care about is VM-VM on the same host or
VM-host, then virtio is only limited by memory bandwidth/latency and
host processor cycles. Your processor has 25GB/s of memory bandwidth.
On the other hand, the VF has to send data all the way out to the wire
and all the way back up through the NIC to get to the other VM/host.
You're using a 1Gb/s NIC. Your results actually seem to indicate you're
getting better than wire rate, so maybe you're only passing through an
internal switch on the NIC, in any case, VFs are not optimal for
communication within the same physical system. They are optimal for off
host communication. Thanks,
Alex
* Re: performance of virtual functions compared to virtio
From: Avi Kivity @ 2011-04-21 8:07 UTC (permalink / raw)
To: Alex Williamson; +Cc: David Ahern, KVM mailing list
On 04/21/2011 05:35 AM, Alex Williamson wrote:
> Device assignment via a VF provides the lowest latency and most
> bandwidth for *getting data off the host system*, though virtio/vhost is
> getting better. If all you care about is VM-VM on the same host or
> VM-host, then virtio is only limited by memory bandwidth/latency and
> host processor cycles. Your processor has 25GB/s of memory bandwidth.
> On the other hand, the VF has to send data all the way out to the wire
> and all the way back up through the NIC to get to the other VM/host.
> You're using a 1Gb/s NIC. Your results actually seem to indicate you're
> getting better than wire rate, so maybe you're only passing through an
> internal switch on the NIC, in any case, VFs are not optimal for
> communication within the same physical system. They are optimal for off
> host communication. Thanks,
>
Note I think in both cases we can make significant improvements:
- for VFs, steer device interrupts to the cpus which run the vcpus that
will receive the interrupts eventually (ISTR some work about this, but
not sure)
- for virtio, use a DMA engine to copy data (I think there exists code
in upstream which does this, but has this been enabled/tuned?)
--
error compiling committee.c: too many arguments to function
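(A rough, hand-rolled sketch of the first suggestion: pin the vcpu
threads, then point the assigned device's MSI-X vectors at the same host
CPUs. The thread IDs, IRQ numbers and CPU masks below are made up, and
the label an assigned device gets in the host's /proc/interrupts varies.)
service irqbalance stop                 # keep irqbalance from undoing this
taskset -pc 2 $VCPU0_TID                # vcpu thread ids, e.g. from "info cpus"
taskset -pc 3 $VCPU1_TID                #   in the qemu monitor
grep -i assigned /proc/interrupts       # locate the VF's MSI-X vectors
echo 4 > /proc/irq/45/smp_affinity      # vector 45 -> cpu 2 (mask 0x4)
echo 8 > /proc/irq/46/smp_affinity      # vector 46 -> cpu 3 (mask 0x8)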
* Re: performance of virtual functions compared to virtio
From: Stefan Hajnoczi @ 2011-04-21 12:31 UTC (permalink / raw)
To: Avi Kivity; +Cc: Alex Williamson, David Ahern, KVM mailing list
On Thu, Apr 21, 2011 at 9:07 AM, Avi Kivity <avi@redhat.com> wrote:
> Note I think in both cases we can make significant improvements:
> - for VFs, steer device interrupts to the cpus which run the vcpus that will
> receive the interrupts eventually (ISTR some work about this, but not sure)
> - for virtio, use a DMA engine to copy data (I think there exists code in
> upstream which does this, but has this been enabled/tuned?)
Which data copy in virtio? Is this a vhost-net specific thing you're
thinking about?
Stefan
* Re: performance of virtual functions compared to virtio
From: Avi Kivity @ 2011-04-21 13:09 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Alex Williamson, David Ahern, KVM mailing list
On 04/21/2011 03:31 PM, Stefan Hajnoczi wrote:
> On Thu, Apr 21, 2011 at 9:07 AM, Avi Kivity<avi@redhat.com> wrote:
> > Note I think in both cases we can make significant improvements:
> > - for VFs, steer device interrupts to the cpus which run the vcpus that will
> > receive the interrupts eventually (ISTR some work about this, but not sure)
> > - for virtio, use a DMA engine to copy data (I think there exists code in
> > upstream which does this, but has this been enabled/tuned?)
>
> Which data copy in virtio? Is this a vhost-net specific thing you're
> thinking about?
There are several copies.
qemu's virtio-net implementation incurs a copy on tx and on rx when
calling the kernel; in addition there is also an internal copy:
    /* copy in packet. ugh */
    len = iov_from_buf(sg, elem.in_num,
                       buf + offset, size - offset);
In principle vhost-net can avoid the tx copy, but I think now we have 1
copy on rx and tx each.
If a host interface is dedicated to backing a vhost-net interface (say
if you have an SR/IOV card) then you can in principle avoid the rx copy
as well.
An alternative to avoiding the copies is to use a dma engine, like I
mentioned.
--
error compiling committee.c: too many arguments to function
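(Whether the virtio/vhost paths would actually use it is exactly the
open question above, but checking for and enabling a DMA engine on the
host looks roughly like this; the module name assumes Intel I/OAT, and
the NET_DMA sysctl only exists if the kernel was built with CONFIG_NET_DMA.)
modprobe ioatdma                          # Intel I/OAT DMA engine driver
ls /sys/class/dma/                        # dma0chan0, dma0chan1, ... if channels registered
dmesg | grep -i ioat
sysctl net.ipv4.tcp_dma_copybreak         # NET_DMA receive-copy offload threshold
sysctl -w net.ipv4.tcp_dma_copybreak=4096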
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 17:39 UTC (permalink / raw)
To: Alex Williamson; +Cc: KVM mailing list
On 04/20/11 20:35, Alex Williamson wrote:
> Device assignment via a VF provides the lowest latency and most
> bandwidth for *getting data off the host system*, though virtio/vhost is
> getting better. If all you care about is VM-VM on the same host or
> VM-host, then virtio is only limited by memory bandwidth/latency and
> host processor cycles. Your processor has 25GB/s of memory bandwidth.
> On the other hand, the VF has to send data all the way out to the wire
> and all the way back up through the NIC to get to the other VM/host.
> You're using a 1Gb/s NIC. Your results actually seem to indicate you're
> getting better than wire rate, so maybe you're only passing through an
> internal switch on the NIC, in any case, VFs are not optimal for
> communication within the same physical system. They are optimal for off
> host communication. Thanks,
Hi Alex:
Host-host was the next focus for the tests. I have 2 of the
aforementioned servers, each configured identically. As a reminder:
Host:
Dell R410
2 quad core E5620@2.40 GHz processors
16 GB RAM
Intel 82576 NIC (Gigabit ET Quad Port)
- devices eth2, eth3, eth4, eth5
Fedora 14
kernel: 2.6.35.12-88.fc14.x86_64
qemu-kvm.git, ffce28fe6 (18-April-11)
VMs:
Fedora 14
kernel 2.6.35.11-83.fc14.x86_64
2 vcpus
1GB RAM
2 NICs - 1 virtio, 1 VF
The virtio network arguments to qemu-kvm are:
-netdev type=tap,vhost=on,ifname=tap0,id=netdev1
-device virtio-net-pci,mac=${mac},netdev=netdev1
For this round of tests I have the following setup:
.==============================================.
|  Host - A                                    |
|                                              |
|    .-----------------------------.           |
|    |     Virtual Machine - C     |           |
|    |                             |           |
|    |   .------.        .------.  |           |
|    '---| eth1 |--------| eth0 |--'           |
|        '------'        '------'              |
| 192.168.  |               |  192.168.103.71  |
|  102.71   |            .------.              |
|           |            | tap0 |              |
|           |            '------'              |
|           |               |                  |
|           |            .------.              |
|           |            |  br  |  192.168.103.79
|           |            '------'              |
|         {VF}              |                  |
|    .--------.          .------.              |
'====| eth2   |==========| eth3 |=============='
     '--------'          '------'
192.168.102.79 |            |
               | point-to-  |
               | point      |
               | connections|
192.168.102.80 |            |
     .--------.          .------.
.====| eth2   |==========| eth3 |==============.
|    '--------'          '------'              |
|         {VF}              |                  |
|           |            .------.              |
|           |            |  br  |  192.168.103.80
|           |            '------'              |
|           |               |                  |
|           |            .------.              |
|           |            | tap0 |              |
| 192.168.  |            '------'              |
|  102.81   |               |  192.168.103.81  |
|        .------.        .------.              |
|    .---| eth1 |--------| eth0 |--.           |
|    |   '------'        '------'  |           |
|    |                             |           |
|    |     Virtual Machine - D     |           |
|    '-----------------------------'           |
|                                              |
|  Host - B                                    |
'=============================================='
So, basically, 192.168.102 is the network where the VMs have a VF, and
192.168.103 is the network where the VMs use virtio for networking.
The netperf commands are all run on either Host-A or VM-C:
netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R
netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R
                         latency    throughput
                         (usec)       Mbps
cross-host:
  A-B, eth2                185         932
  A-B, eth3                185         935

same host, host-VM:
  A-C, using VF            488        1085   (seen as high as 1280's)
  A-C, virtio              150        4282

cross-host, host-VM:
  A-D, VF                  489         938
  A-D, virtio              288         889

cross-host, VM-VM:
  C-D, VF                  488         934
  C-D, virtio              490         933
While throughput for VFs is fine (near line-rate when crossing hosts),
the latency is horrible. Any options to improve that?
David
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 17:46 UTC (permalink / raw)
To: Avi Kivity; +Cc: Alex Williamson, KVM mailing list
On 04/21/11 02:07, Avi Kivity wrote:
> On 04/21/2011 05:35 AM, Alex Williamson wrote:
>> Device assignment via a VF provides the lowest latency and most
>> bandwidth for *getting data off the host system*, though virtio/vhost is
>> getting better. If all you care about is VM-VM on the same host or
>> VM-host, then virtio is only limited by memory bandwidth/latency and
>> host processor cycles. Your processor has 25GB/s of memory bandwidth.
>> On the other hand, the VF has to send data all the way out to the wire
>> and all the way back up through the NIC to get to the other VM/host.
>> You're using a 1Gb/s NIC. Your results actually seem to indicate you're
>> getting better than wire rate, so maybe you're only passing through an
>> internal switch on the NIC, in any case, VFs are not optimal for
>> communication within the same physical system. They are optimal for off
>> host communication. Thanks,
>>
>
> Note I think in both cases we can make significant improvements:
> - for VFs, steer device interrupts to the cpus which run the vcpus that
> will receive the interrupts eventually (ISTR some work about this, but
> not sure)
I don't understand your point here. I thought interrupts for the VF were
only delivered to the guest, not the host.
David
> - for virtio, use a DMA engine to copy data (I think there exists code
> in upstream which does this, but has this been enabled/tuned?)
>
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 17:49 UTC (permalink / raw)
To: Avi Kivity; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list
On 04/21/11 07:09, Avi Kivity wrote:
> On 04/21/2011 03:31 PM, Stefan Hajnoczi wrote:
>> On Thu, Apr 21, 2011 at 9:07 AM, Avi Kivity<avi@redhat.com> wrote:
>> > Note I think in both cases we can make significant improvements:
>> > - for VFs, steer device interrupts to the cpus which run the vcpus
>> that will
>> > receive the interrupts eventually (ISTR some work about this, but
>> not sure)
>> > - for virtio, use a DMA engine to copy data (I think there exists
>> code in
>> > upstream which does this, but has this been enabled/tuned?)
>>
>> Which data copy in virtio? Is this a vhost-net specific thing you're
>> thinking about?
>
> There are several copies.
>
> qemu's virtio-net implementation incurs a copy on tx and on rx when
> calling the kernel; in addition there is also an internal copy:
>
> /* copy in packet. ugh */
> len = iov_from_buf(sg, elem.in_num,
> buf + offset, size - offset);
>
> In principle vhost-net can avoid the tx copy, but I think now we have 1
> copy on rx and tx each.
So there is a copy internal to qemu, then from qemu to the host tap
device and then tap device to a physical NIC if the packet is leaving
the host?
Is that what the zero-copy patch set is attempting - bypassing the
transmit copy to the macvtap device?
>
> If a host interface is dedicated to backing a vhost-net interface (say
> if you have an SR/IOV card) then you can in principle avoid the rx copy
> as well.
>
> An alternative to avoiding the copies is to use a dma engine, like I
> mentioned.
>
How does the DMA engine differ from the zero-copy patch set?
David
* Re: performance of virtual functions compared to virtio
From: Alex Williamson @ 2011-04-25 18:13 UTC (permalink / raw)
To: David Ahern; +Cc: KVM mailing list
On Mon, 2011-04-25 at 11:39 -0600, David Ahern wrote:
> On 04/20/11 20:35, Alex Williamson wrote:
> > Device assignment via a VF provides the lowest latency and most
> > bandwidth for *getting data off the host system*, though virtio/vhost is
> > getting better. If all you care about is VM-VM on the same host or
> > VM-host, then virtio is only limited by memory bandwidth/latency and
> > host processor cycles. Your processor has 25GB/s of memory bandwidth.
> > On the other hand, the VF has to send data all the way out to the wire
> > and all the way back up through the NIC to get to the other VM/host.
> > You're using a 1Gb/s NIC. Your results actually seem to indicate you're
> > getting better than wire rate, so maybe you're only passing through an
> > internal switch on the NIC, in any case, VFs are not optimal for
> > communication within the same physical system. They are optimal for off
> > host communication. Thanks,
>
> Hi Alex:
>
> Host-host was the next focus for the tests. I have 2 of the
> aforementioned servers, each configured identically. As a reminder:
>
> Host:
> Dell R410
> 2 quad core E5620@2.40 GHz processors
> 16 GB RAM
> Intel 82576 NIC (Gigabit ET Quad Port)
> - devices eth2, eth3, eth4, eth5
> Fedora 14
> kernel: 2.6.35.12-88.fc14.x86_64
> qemu-kvm.git, ffce28fe6 (18-April-11)
>
> VMs:
> Fedora 14
> kernel 2.6.35.11-83.fc14.x86_64
> 2 vcpus
> 1GB RAM
> 2 NICs - 1 virtio, 1 VF
>
> The virtio network arguments to qemu-kvm are:
> -netdev type=tap,vhost=on,ifname=tap0,id=netdev1
> -device virtio-net-pci,mac=${mac},netdev=netdev1
>
>
> For this round of tests I have the following setup:
>
> .======================================.
> | Host - A |
> | |
> | .-------------------------. |
> | | Virtual Machine - C | |
> | | | |
> | | .------. .------. | |
> | '--| eth1 |-----| eth0 |--' |
> | '------' '------' |
> | 192.168. | | 192.168.103.71
> | 102.71 | .------. |
> | | | tap0 | |
> | | '------' |
> | | | |
> | | .------. |
> | | | br | 192.168.103.79
> | | '------' |
> | {VF} | |
> | .--------. .------. |
> '=======| eth2 |======| eth3 |======='
> '--------' '------'
> 192.168.102.79 | |
> | point-to- |
> | point |
> | connections |
> 192.168.102.80 | |
> .--------. .------.
> .=======| eth2 |======| eth3 |=======.
> | '--------' '------' |
> | {VF} | |
> | | .------. |
> | | | br | 192.168.103.80
> | | '------' |
> | | | |
> | | .------. |
> | | | tap0 | |
> | 192.168. | '------' |
> | 102.81 | | 192.168.103.81
> | .------. .------. |
> | .--| eth1 |-----| eth0 |--. |
> | | '------' '------' | |
> | | | |
> | | Virtual Machine - D | |
> | '-------------------------' |
> | |
> | Host - B |
> '======================================'
>
>
> So, basically, 192.168.102 is the network where the VMs have a VF, and
> 192.168.103 is the network where the VMs use virtio for networking.
>
> The netperf commands are all run on either Host-A or VM-C:
>
> netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R
> netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R
>
>
> latency throughput
> (usec) Mbps
> cross-host:
> A-B, eth2 185 932
> A-B, eth3 185 935
This is actually PF-PF, right? It would be interesting to load igbvf on
the hosts and determine VF-VF latency as well.
> same host, host-VM:
> A-C, using VF 488 1085 (seen as high as 1280's)
> A-C, virtio 150 4282
We know virtio has a shorter path for this test.
> cross-host, host-VM:
> A-D, VF 489 938
> A-D, virtio 288 889
>
> cross-host, VM-VM:
> C-D, VF 488 934
> C-D, virtio 490 933
>
>
> While throughput for VFs is fine (near line-rate when crossing hosts),
FWIW, it's not too difficult to get line rate on a 1Gbps network, even
some of the emulated NICs can do it. There will be a difference in host
CPU power to get it though, where it should theoretically be emulated >
virtio > pci-assign.
> the latency is horrible. Any options to improve that?
If you don't mind testing, I'd like to see VF-VF between the hosts (to
do this, don't assign eth2 an IP, just make sure it's up, then load the
igbvf driver on the host and assign an IP to one of the VFs associated
with the eth2 PF), and cross host testing using the PF for the guest
instead of the VF. This should help narrow down how much of the latency
is due to using the VF vs the PF, since all of the virtio tests are
using the PF. I've been suspicious that the VF adds some latency, but
haven't had a good test setup (or time) to dig very deep into it.
Thanks,
Alex
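(A sketch of the requested host-side VF-VF test. The VF netdev name and
the spare 192.168.104.x subnet are assumptions; the VFs are presumed to
already exist from the igb max_vfs setup and not to be bound to pci-stub
on the hosts doing this test.)
# on each host: PF up but unaddressed, then use one of its VFs directly
ip addr flush dev eth2 && ip link set eth2 up
modprobe igbvf                                      # host claims the 82576 VFs
ls -l /sys/class/net/*/device/driver | grep igbvf   # find the VF netdevs, say eth6
ip addr add 192.168.104.79/24 dev eth6 && ip link set eth6 up   # .80 on the other host
netperf -H 192.168.104.80 -jcC -v 2 -t TCP_RR -- -r 1024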
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 19:07 UTC (permalink / raw)
To: Alex Williamson; +Cc: KVM mailing list
On 04/25/11 12:13, Alex Williamson wrote:
>> So, basically, 192.168.102 is the network where the VMs have a VF, and
>> 192.168.103 is the network where the VMs use virtio for networking.
>>
>> The netperf commands are all run on either Host-A or VM-C:
>>
>> netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R
>> netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R
>>
>>
>> latency throughput
>> (usec) Mbps
>> cross-host:
>> A-B, eth2 185 932
>> A-B, eth3 185 935
>
> This is actually PF-PF, right? It would be interesting to load igbvf on
> the hosts and determine VF-VF latency as well.
yes, PF-PF. eth3 has the added bridge layer, but from what I can see the
overhead is noise. I added host-to-host to put the host-to-VM numbers in
perspective.
>
>> same host, host-VM:
>> A-C, using VF 488 1085 (seen as high as 1280's)
>> A-C, virtio 150 4282
>
> We know virtio has a shorter path for this test.
No complaints about the throughput numbers; the latency is the problem.
>
>> cross-host, host-VM:
>> A-D, VF 489 938
>> A-D, virtio 288 889
>>
>> cross-host, VM-VM:
>> C-D, VF 488 934
>> C-D, virtio 490 933
>>
>>
>> While throughput for VFs is fine (near line-rate when crossing hosts),
>
> FWIW, it's not too difficult to get line rate on a 1Gbps network, even
> some of the emulated NICs can do it. There will be a difference in host
> CPU power to get it though, where it should theoretically be emulated >
> virtio > pci-assign.
10GB is the goal; 1GB offers a cheaper learning environment. ;-)
>
>> the latency is horrible. Any options to improve that?
>
> If you don't mind testing, I'd like to see VF-VF between the hosts (to
> do this, don't assign eth2 an IP, just make sure it's up, then load the
> igbvf driver on the host and assign an IP to one of the VFs associated
> with the eth2 PF), and cross host testing using the PF for the guest
> instead of the VF. This should help narrow down how much of the latency
> is due to using the VF vs the PF, since all of the virtio tests are
> using the PF. I've been suspicious that the VF adds some latency, but
> haven't had a good test setup (or time) to dig very deep into it.
It's a quad nic, so I left eth2 and eth3 alone and added the VF-VF test
using VFs on eth4.
Indeed latency is 488 usec and throughput is 925 Mbps. This is
host-to-host using VFs.
David
> Thanks,
>
> Alex
>
* Re: performance of virtual functions compared to virtio
From: Alex Williamson @ 2011-04-25 19:29 UTC (permalink / raw)
To: David Ahern; +Cc: KVM mailing list
On Mon, 2011-04-25 at 13:07 -0600, David Ahern wrote:
> On 04/25/11 12:13, Alex Williamson wrote:
> >> So, basically, 192.168.102 is the network where the VMs have a VF, and
> >> 192.168.103 is the network where the VMs use virtio for networking.
> >>
> >> The netperf commands are all run on either Host-A or VM-C:
> >>
> >> netperf -H $ip -jcC -v 2 -t TCP_RR -- -r 1024 -D L,R
> >> netperf -H $ip -jcC -v 2 -t TCP_STREAM -- -m 1024 -D L,R
> >>
> >>
> >> latency throughput
> >> (usec) Mbps
> >> cross-host:
> >> A-B, eth2 185 932
> >> A-B, eth3 185 935
> >
> > This is actually PF-PF, right? It would be interesting to load igbvf on
> > the hosts and determine VF-VF latency as well.
>
> yes, PF-PF. eth3 has the added bridge layer, but from what I can see the
> overhead is noise. I added host-to-host to put the host-to-VM numbers in
> perspective.
>
> >
> >> same host, host-VM:
> >> A-C, using VF 488 1085 (seen as high as 1280's)
> >> A-C, virtio 150 4282
> >
> > We know virtio has a shorter path for this test.
>
> No complaints about the throughput numbers; the latency is the problem.
>
> >
> >> cross-host, host-VM:
> >> A-D, VF 489 938
> >> A-D, virtio 288 889
> >>
> >> cross-host, VM-VM:
> >> C-D, VF 488 934
> >> C-D, virtio 490 933
> >>
> >>
> >> While throughput for VFs is fine (near line-rate when crossing hosts),
> >
> > FWIW, it's not too difficult to get line rate on a 1Gbps network, even
> > some of the emulated NICs can do it. There will be a difference in host
> > CPU power to get it though, where it should theoretically be emulated >
> > virtio > pci-assign.
>
> 10GB is the goal; 1GB offers a cheaper learning environment. ;-)
>
> >
> >> the latency is horrible. Any options to improve that?
> >
> > If you don't mind testing, I'd like to see VF-VF between the hosts (to
> > do this, don't assign eth2 an IP, just make sure it's up, then load the
> > igbvf driver on the host and assign an IP to one of the VFs associated
> > with the eth2 PF), and cross host testing using the PF for the guest
> > instead of the VF. This should help narrow down how much of the latency
> > is due to using the VF vs the PF, since all of the virtio tests are
> > using the PF. I've been suspicious that the VF adds some latency, but
> > haven't had a good test setup (or time) to dig very deep into it.
>
> It's a quad nic, so I left eth2 and eth3 alone and added the VF-VF test
> using VFs on eth4.
>
> Indeed latency is 488 usec and throughput is 925 Mbps. This is
> host-to-host using VFs.
So we're effectively getting host-host latency/throughput for the VF,
it's just that in the 82576 implementation of SR-IOV, the VF takes a
latency hit that puts it pretty close to virtio. Unfortunate. I think
you'll find that passing the PF to the guests should be pretty close to
that 185us latency. I would assume (hope) the higher end NICs reduce
this, but it seems to be a hardware limitation, so it's hard to predict.
Thanks,
Alex
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 19:49 UTC (permalink / raw)
To: Alex Williamson; +Cc: KVM mailing list
On 04/25/11 13:29, Alex Williamson wrote:
> So we're effectively getting host-host latency/throughput for the VF,
> it's just that in the 82576 implementation of SR-IOV, the VF takes a
> latency hit that puts it pretty close to virtio. Unfortunate. I think
For host-to-VM using VFs is worse than virtio which is counterintuitive.
> you'll find that passing the PF to the guests should be pretty close to
> that 185us latency. I would assume (hope) the higher end NICs reduce
About that 185usec: do you know where the bottleneck is? It seems as if
the packet is held in some queue waiting for an event/timeout before it
is transmitted.
David
> this, but it seems to be a hardware limitation, so it's hard to predict.
> Thanks,
>
> Alex
>
* Re: performance of virtual functions compared to virtio
From: Alex Williamson @ 2011-04-25 20:27 UTC (permalink / raw)
To: David Ahern; +Cc: KVM mailing list
On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote:
>
> On 04/25/11 13:29, Alex Williamson wrote:
> > So we're effectively getting host-host latency/throughput for the VF,
> > it's just that in the 82576 implementation of SR-IOV, the VF takes a
> > latency hit that puts it pretty close to virtio. Unfortunate. I think
>
> For host-to-VM using VFs is worse than virtio which is counterintuitive.
On the same host, just think about the data path of one versus the
other. On the guest side, there's virtio vs a physical NIC. virtio is
designed to be virtualization friendly, so hopefully has less context
switches in setting up and processing transactions. Once the packet
leaves the assigned physical NIC, it has to come back up the entire host
I/O stack, while the virtio device is connected to an internal bridge
and bypasses all but the upper level network routing.
> > you'll find that passing the PF to the guests should be pretty close to
> > that 185us latency. I would assume (hope) the higher end NICs reduce
>
> About that 185usec: do you know where the bottleneck is? It seems as if
> the packet is held in some queue waiting for an event/timeout before it
> is transmitted.
I don't know specifically, I don't do much network performance tuning.
Interrupt coalescing could be a factor, along with various offload
settings, and of course latency of the physical NIC hardware and
interconnects. Thanks,
Alex
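(The usual knobs to inspect for that, on the host PF and, via the VF
driver, inside the guest; eth2 and eth1 are just the names used in the
diagrams above.)
ethtool -c eth2       # interrupt coalescing settings on the host PF
ethtool -k eth2       # offloads (TSO/GSO/GRO, checksums)
ethtool -c eth1       # run inside the guest: same check on the VF interface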
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 20:40 UTC (permalink / raw)
To: Alex Williamson; +Cc: KVM mailing list
On 04/25/11 14:27, Alex Williamson wrote:
> On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote:
>>
>> On 04/25/11 13:29, Alex Williamson wrote:
>>> So we're effectively getting host-host latency/throughput for the VF,
>>> it's just that in the 82576 implementation of SR-IOV, the VF takes a
>>> latency hit that puts it pretty close to virtio. Unfortunate. I think
>>
>> For host-to-VM using VFs is worse than virtio which is counterintuitive.
>
> On the same host, just think about the data path of one versus the
> other. On the guest side, there's virtio vs a physical NIC. virtio is
> designed to be virtualization friendly, so hopefully has less context
> switches in setting up and processing transactions. Once the packet
> leaves the assigned physical NIC, it has to come back up the entire host
> I/O stack, while the virtio device is connected to an internal bridge
> and bypasses all but the upper level network routing.
I get the virtio path, but you lost me on the physical NIC. I thought
the point of VFs is to bypass the host from having to touch the packet,
so the processing path with a VM using a VF would be the same as a non-VM.
David
>
>>> you'll find that passing the PF to the guests should be pretty close to
>>> that 185us latency. I would assume (hope) the higher end NICs reduce
>>
>> About that 185usec: do you know where the bottleneck is? It seems as if
>> the packet is held in some queue waiting for an event/timeout before it
>> is transmitted.
>
> I don't know specifically, I don't do much network performance tuning.
> Interrupt coalescing could be a factor, along with various offload
> settings, and of course latency of the physical NIC hardware and
> interconnects. Thanks,
>
> Alex
>
* Re: performance of virtual functions compared to virtio
From: Andrew Theurer @ 2011-04-25 20:49 UTC (permalink / raw)
To: David Ahern; +Cc: Alex Williamson, KVM mailing list
On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote:
>
> On 04/25/11 13:29, Alex Williamson wrote:
> > So we're effectively getting host-host latency/throughput for the VF,
> > it's just that in the 82576 implementation of SR-IOV, the VF takes a
> > latency hit that puts it pretty close to virtio. Unfortunate. I think
>
> For host-to-VM using VFs is worse than virtio which is counterintuitive.
>
> > you'll find that passing the PF to the guests should be pretty close to
> > that 185us latency. I would assume (hope) the higher end NICs reduce
>
> About that 185usec: do you know where the bottleneck is? It seems as if
> the packet is held in some queue waiting for an event/timeout before it
> is transmitted.
you might want to check the VF driver. I know versions of the ixgbevf
driver have a throttled interrupt option which will increase latency
with some settings. I don't remember if the igbvf driver has the same
feature. If it does, you will want to turn this option off for best
latency.
>
> David
>
>
> > this, but it seems to be a hardware limitation, so it's hard to predict.
> > Thanks,
> >
> > Alex
-Andrew
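(One way to check whether the VF driver in the guest exposes such a
throttle, either as a module parameter or through the coalescing
settings; the parameter names differ between driver versions, so treat
these as examples rather than known-good knobs.)
modinfo igbvf | grep -i parm       # in the guest: any ITR/throttle module parameter?
modinfo ixgbevf | grep -i parm     # same check for the 10GbE VF driver
ethtool -C eth1 rx-usecs 0         # if exposed via coalescing; 0 usually disables throttling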
* Re: performance of virtual functions compared to virtio
From: Alex Williamson @ 2011-04-25 21:02 UTC (permalink / raw)
To: David Ahern; +Cc: KVM mailing list
On Mon, 2011-04-25 at 14:40 -0600, David Ahern wrote:
>
> On 04/25/11 14:27, Alex Williamson wrote:
> > On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote:
> >>
> >> On 04/25/11 13:29, Alex Williamson wrote:
> >>> So we're effectively getting host-host latency/throughput for the VF,
> >>> it's just that in the 82576 implementation of SR-IOV, the VF takes a
> >>> latency hit that puts it pretty close to virtio. Unfortunate. I think
> >>
> >> For host-to-VM using VFs is worse than virtio which is counterintuitive.
> >
> > On the same host, just think about the data path of one versus the
> > other. On the guest side, there's virtio vs a physical NIC. virtio is
> > designed to be virtualization friendly, so hopefully has less context
> > switches in setting up and processing transactions. Once the packet
> > leaves the assigned physical NIC, it has to come back up the entire host
> > I/O stack, while the virtio device is connected to an internal bridge
> > and bypasses all but the upper level network routing.
>
> I get the virtio path, but you lost me on the physical NIC. I thought
> the point of VFs is to bypass the host from having to touch the packet,
> so the processing path with a VM using a VF would be the same as a non-VM.
In the VF case, the host is only involved in processing the packet on
its end of the connection, but the packet still has to go all the way
out to the physical device and all the way back. Handled on one end by
the VM and the other end by the host.
An analogy might be sending a letter to an office coworker in a
neighboring cube. You could just pass the letter over the wall (virtio)
or you could go put it in the mailbox, signal the mail carrier, who
comes and moves it to your neighbor's mailbox, who then gets signaled
that they have a letter (device assignment).
Since the networks stacks are completely separate from one another,
there's very little difference in data path whether you're talking to
the host, a remote system, or a remote VM, which is reflected in your
performance data. Hope that helps,
Alex
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-25 21:14 UTC (permalink / raw)
To: Alex Williamson; +Cc: KVM mailing list
On 04/25/11 15:02, Alex Williamson wrote:
> On Mon, 2011-04-25 at 14:40 -0600, David Ahern wrote:
>>
>> On 04/25/11 14:27, Alex Williamson wrote:
>>> On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote:
>>>>
>>>> On 04/25/11 13:29, Alex Williamson wrote:
>>>>> So we're effectively getting host-host latency/throughput for the VF,
>>>>> it's just that in the 82576 implementation of SR-IOV, the VF takes a
>>>>> latency hit that puts it pretty close to virtio. Unfortunate. I think
>>>>
>>>> For host-to-VM using VFs is worse than virtio which is counterintuitive.
>>>
>>> On the same host, just think about the data path of one versus the
>>> other. On the guest side, there's virtio vs a physical NIC. virtio is
>>> designed to be virtualization friendly, so hopefully has less context
>>> switches in setting up and processing transactions. Once the packet
>>> leaves the assigned physical NIC, it has to come back up the entire host
>>> I/O stack, while the virtio device is connected to an internal bridge
>>> and bypasses all but the upper level network routing.
>>
>> I get the virtio path, but you lost me on the physical NIC. I thought
>> the point of VFs is to bypass the host from having to touch the packet,
>> so the processing path with a VM using a VF would be the same as a non-VM.
>
> In the VF case, the host is only involved in processing the packet on
> it's end of the connection, but the packet still has to go all the way
> out to the physical device and all the way back. Handled on one end by
> the VM and the other end by the host.
>
> An analogy might be sending a letter to an office coworker in a
> neighboring cube. You could just pass the letter over the wall (virtio)
> or you could go put it in the mailbox, signal the mail carrier, who
> comes and moves it to your neighbor's mailbox, who then gets signaled
> that they have a letter (device assignment).
>
> Since the networks stacks are completely separate from one another,
> there's very little difference in data path whether you're talking to
> the host, a remote system, or a remote VM, which is reflected in your
> performance data. Hope that helps,
Got you. I was thinking host-VM as VM on separate host; I didn't make
that clear. Thanks for clarifying - I like the letter example.
David
>
> Alex
>
* Re: performance of virtual functions compared to virtio
From: Alex Williamson @ 2011-04-25 21:18 UTC (permalink / raw)
To: David Ahern; +Cc: KVM mailing list
On Mon, 2011-04-25 at 15:14 -0600, David Ahern wrote:
>
> On 04/25/11 15:02, Alex Williamson wrote:
> > On Mon, 2011-04-25 at 14:40 -0600, David Ahern wrote:
> >>
> >> On 04/25/11 14:27, Alex Williamson wrote:
> >>> On Mon, 2011-04-25 at 13:49 -0600, David Ahern wrote:
> >>>>
> >>>> On 04/25/11 13:29, Alex Williamson wrote:
> >>>>> So we're effectively getting host-host latency/throughput for the VF,
> >>>>> it's just that in the 82576 implementation of SR-IOV, the VF takes a
> >>>>> latency hit that puts it pretty close to virtio. Unfortunate. I think
> >>>>
> >>>> For host-to-VM using VFs is worse than virtio which is counterintuitive.
> >>>
> >>> On the same host, just think about the data path of one versus the
> >>> other. On the guest side, there's virtio vs a physical NIC. virtio is
> >>> designed to be virtualization friendly, so hopefully has less context
> >>> switches in setting up and processing transactions. Once the packet
> >>> leaves the assigned physical NIC, it has to come back up the entire host
> >>> I/O stack, while the virtio device is connected to an internal bridge
> >>> and bypasses all but the upper level network routing.
> >>
> >> I get the virtio path, but you lost me on the physical NIC. I thought
> >> the point of VFs is to bypass the host from having to touch the packet,
> >> so the processing path with a VM using a VF would be the same as a non-VM.
> >
> > In the VF case, the host is only involved in processing the packet on
> > it's end of the connection, but the packet still has to go all the way
> > out to the physical device and all the way back. Handled on one end by
> > the VM and the other end by the host.
> >
> > An analogy might be sending a letter to an office coworker in a
> > neighboring cube. You could just pass the letter over the wall (virtio)
> > or you could go put it in the mailbox, signal the mail carrier, who
> > comes and moves it to your neighbor's mailbox, who then gets signaled
> > that they have a letter (device assignment).
> >
> > Since the networks stacks are completely separate from one another,
> > there's very little difference in data path whether you're talking to
> > the host, a remote system, or a remote VM, which is reflected in your
> > performance data. Hope that helps,
>
> Got you. I was thinking host-VM as VM on separate host; I didn't make
> that clear. Thanks for clarifying - I like the letter example.
I should probably also note that being able to "pass a letter over the
wall" is possible because of the bridge/tap setup used for that
communication path, so it's available to emulated NICs as well. virtio
is just a paravirtualization layer that makes it lower overhead than
emulation. To get a letter "out of the office" (ie. off host), all
paths still eventually need to put the letter in the mailbox. Thanks,
Alex
* Re: performance of virtual functions compared to virtio
From: Avi Kivity @ 2011-04-26 8:19 UTC (permalink / raw)
To: David Ahern; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list
On 04/25/2011 08:49 PM, David Ahern wrote:
> >
> > There are several copies.
> >
> > qemu's virtio-net implementation incurs a copy on tx and on rx when
> > calling the kernel; in addition there is also an internal copy:
> >
> > /* copy in packet. ugh */
> > len = iov_from_buf(sg, elem.in_num,
> > buf + offset, size - offset);
> >
> > In principle vhost-net can avoid the tx copy, but I think now we have 1
> > copy on rx and tx each.
>
> So there is a copy internal to qemu, then from qemu to the host tap
> device and then tap device to a physical NIC if the packet is leaving
> the host?
There is no internal copy on tx, just rx.
So:
virtio-net: 1 internal rx, 1 kernel/user rx, 1 kernel/user tx
vhost-net: 1 internal rx, 1 internal tx
> Is that what the zero-copy patch set is attempting - bypassing the
> transmit copy to the macvtap device?
Yes.
> >
> > If a host interface is dedicated to backing a vhost-net interface (say
> > if you have an SR/IOV card) then you can in principle avoid the rx copy
> > as well.
> >
> > An alternative to avoiding the copies is to use a dma engine, like I
> > mentioned.
> >
>
> How does the DMA engine differ from the zero-copy patch set?
The DMA engine does not avoid the copy, it merely uses a device other
than the cpu to perform it. It offloads the cpu but still loads the
interconnect. True zero-copy avoids both the cpu load and the
interconnect load.
--
error compiling committee.c: too many arguments to function
* Re: performance of virtual functions compared to virtio
From: Avi Kivity @ 2011-04-26 8:20 UTC (permalink / raw)
To: David Ahern; +Cc: Alex Williamson, KVM mailing list
On 04/25/2011 08:46 PM, David Ahern wrote:
> >
> > Note I think in both cases we can make significant improvements:
> > - for VFs, steer device interrupts to the cpus which run the vcpus that
> > will receive the interrupts eventually (ISTR some work about this, but
> > not sure)
>
> I don't understand your point here. I thought interrupts for the VF were
> only delivered to the guest, not the host.
>
Interrupts are delivered to the host, which the forwards them to the
guest. Virtualization hardware on x86 does not allow direct-to-guest
interrupt delivery.
--
error compiling committee.c: too many arguments to function
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-04-27 21:13 UTC (permalink / raw)
To: Avi Kivity; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list
On 04/26/11 02:19, Avi Kivity wrote:
> On 04/25/2011 08:49 PM, David Ahern wrote:
>> >
>> > There are several copies.
>> >
>> > qemu's virtio-net implementation incurs a copy on tx and on rx when
>> > calling the kernel; in addition there is also an internal copy:
>> >
>> > /* copy in packet. ugh */
>> > len = iov_from_buf(sg, elem.in_num,
>> > buf + offset, size - offset);
>> >
>> > In principle vhost-net can avoid the tx copy, but I think now we
>> have 1
>> > copy on rx and tx each.
>>
>> So there is a copy internal to qemu, then from qemu to the host tap
>> device and then tap device to a physical NIC if the packet is leaving
>> the host?
>
> There is no internal copy on tx, just rx.
>
> So:
>
> virtio-net: 1 internal rx, 1 kernel/user rx, 1 kernel/user tx
> vhost-net: 1 internal rx, 1 internal tx
Does the following depict where copies are done for virtio-net?
Packet Sends:
.==========================================.
|  Host                                    |
|                                          |
|  .-------------------------------.       |
|  |       qemu-kvm process        |       |
|  |                               |       |
|  | .-------------------------.   |       |
|  | |        Guest OS         |   |       |
|  | |                         |   |       |
|  | |        ---------        |   |       |
|  | |       ( netperf )       |   |       |
|  | |        ---------        |   |       |
|  | |  user                   |   |       |
|  | |-------------------------|   |       |
|  | |  kernel                 |   |       |
|  | |  .-----------.          |   |       |
|  | |  | TCP stack |  copy data from uspace to VM-based skb
|  | |  '-----------'          |   |       |
|  | |        |                |   |       |
|  | |   .--------.            |   |       |
|  | |   | virtio |  passes skb pointers to virtio device
|  | |   | (eth0) |            |   |       |
|  | '---'--------'------------'   |       |
|  |          |                    |       |
|  |    .------------.             |       |
|  |    | virtio-net |  convert buffer addresses from
|  |    |   device   |  guest virtual to process (qemu)?
|  |    '------------'             |       |
|  |          |                    |       |
|  '-------------------------------'       |
|             |                            |
| userspace   |                            |
|------------------------------------------|
| kernel      |                            |
|         .------.                         |
|         | tap0 |  data copied from userspace
|         '------'  to host kernel skbs
|             |                            |
|         .------.                         |
|         |  br  |                         |
|         '------'                         |
|             |                            |
|         .------.                         |
|         | eth0 |  skbs sent to device for xmit
'=========================================='
Packet Receives
.==========================================.
|  Host                                    |
|                                          |
|  .-------------------------------.       |
|  |       qemu-kvm process        |       |
|  |                               |       |
|  | .-------------------------.   |       |
|  | |        Guest OS         |   |       |
|  | |                         |   |       |
|  | |        ---------        |   |       |
|  | |       ( netperf )       |   |       |
|  | |        ---------        |   |       |
|  | |  user                   |   |       |
|  | |-------------------------|   |       |
|  | |  kernel                 |  data copied from skb to userspace buf
|  | |  .-----------.          |   |       |
|  | |  | TCP stack |  skb attached to socket
|  | |  '-----------'          |   |       |
|  | |        |                |   |       |
|  | |   .--------.            |   |       |
|  | |   | virtio |  put skb onto net queue
|  | |   | (eth0) |            |   |       |
|  | '---'--------'------------'   |       |
|  |          |  copy here into devices' mapped skb?
|  |          |  this is the extra "internal" copy?
|  |    .------------.             |       |
|  |    | virtio-net |  data copied from host
|  |    |   device   |  kernel to qemu process
|  |    '------------'             |       |
|  |          |                    |       |
|  '-------------------------------'       |
|             |                            |
| userspace   |                            |
|------------------------------------------|
| kernel      |                            |
|         .------.                         |
|         | tap0 |  skbs attached to tap device
|         '------'                         |
|             |                            |
|         .------.                         |
|         |  br  |                         |
|         '------'                         |
|             |                            |
|         .------.                         |
|         | eth0 |  device writes data into mapped skbs
'=========================================='
David
>
>> Is that what the zero-copy patch set is attempting - bypassing the
>> transmit copy to the macvtap device?
>
> Yes.
>
>> >
>> > If a host interface is dedicated to backing a vhost-net interface (say
>> > if you have an SR/IOV card) then you can in principle avoid the rx
>> copy
>> > as well.
>> >
>> > An alternative to avoiding the copies is to use a dma engine, like I
>> > mentioned.
>> >
>>
>> How does the DMA engine differ from the zero-copy patch set?
>
> The DMA engine does not avoid the copy, it merely uses a device other
> than the cpu to perform it. It offloads the cpu but still loads the
> interconnect. True zero-copy avoids both the cpu load and the
> interconnect load.
>
* Re: performance of virtual functions compared to virtio
From: Avi Kivity @ 2011-04-28 8:07 UTC (permalink / raw)
To: David Ahern; +Cc: Stefan Hajnoczi, Alex Williamson, KVM mailing list
On 04/28/2011 12:13 AM, David Ahern wrote:
> Is the following depict where copies are done for virtio-net?
>
Yes. You should have been an artist.
<snip still life of a networked guest>
--
error compiling committee.c: too many arguments to function
* Re: performance of virtual functions compared to virtio
From: David Ahern @ 2011-05-02 18:58 UTC (permalink / raw)
To: Alex Williamson; +Cc: KVM mailing list
On 04/25/11 13:07, David Ahern wrote:
>>> same host, host-VM:
>>> A-C, using VF 488 1085 (seen as high as 1280's)
>>> A-C, virtio 150 4282
>>
>> We know virtio has a shorter path for this test.
>
> No complaints about the throughput numbers; the latency is the problem.
rx-usecs is the magical parameter. It defaults to 3 for both the igb and
igbvf drivers which is the 'magic' performance number -- i.e., the
drivers dynamically adapt to the packet rate.
Setting it to 10 in the *VM only* (lowest limit controlled by
IGBVF_MIN_ITR_USECS) dramatically lowers latency with little-to-no
impact to throughput (ie., mostly within the +-10% variation I see
between netperf runs with system defaults everywhere).
Latency in usecs:
                          default    rx-usec=10
  host-host                   97        105
  same host, host-VM         488        158
  cross host, host-VM        488        181
  cross host, VM-VM          488        255
Changing the default in the host for the physical function kills
throughput with no impact to latency.
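(Concretely, that amounts to something like the following inside each
guest, presumably via ethtool's coalescing interface; eth1 is the
guest's VF interface in the diagrams above.)
ethtool -C eth1 rx-usecs 10     # fixed 10 usec interrupt interval instead of the adaptive default (3)
ethtool -c eth1                 # confirm the new setting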
I'd still like to know why 100 usec is the baseline for even
host-to-host packets.
David