* Is veth in net-next reordering traffic?
From: Rick Jones @ 2015-05-07 2:04 UTC
To: netdev
I've been messing about with a setup approximating what an OpenStack
Nova Compute node creates for the private networking plumbing when
using OVS+VxLAN, just without the VM.  So, I have a Linux bridge
(named qbr), a veth pair (named qvb and qvo) joining that to an OVS
switch (called br-int), which then has a patch pair joining that OVS
bridge to another OVS bridge (br-tun), which has a VxLAN tunnel
defined.
I've assigned an IP address to the bare interface (an ixgbe-driven
Intel 82599ES - HP 560FLR), to each of the OVS switches, and to the
Linux bridge.  A second system is set up the same way.
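(For reference, the plumbing can be approximated with something along
these lines - the commands and the device/address mapping here are an
illustrative sketch using standard iproute2/ovs-vsctl, not the exact
commands I ran, and the .21 addresses are made up for this side:)

# Linux bridge and veth pair into OVS
ip link add qbr type bridge
ip link add qvb type veth peer name qvo
ip link set qvb master qbr
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int qvo

# second OVS bridge, joined to br-int with an OVS patch pair
ovs-vsctl add-br br-tun
ovs-vsctl add-port br-int patch-tun -- set interface patch-tun type=patch options:peer=patch-int
ovs-vsctl add-port br-tun patch-int -- set interface patch-int type=patch options:peer=patch-tun

# VxLAN tunnel on br-tun, terminating on the other system's bare interface
ovs-vsctl add-port br-tun vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.14.12.22

# bring things up and give each layer a test address
ip link set qbr up; ip link set qvb up; ip link set qvo up
ip addr add 192.168.0.21/24 dev qbr      # peer of "through-nearfull-stack"
ip addr add 192.168.1.21/24 dev br-int   # peer of "through-br-int"
ip addr add 192.168.2.21/24 dev br-tun   # peer of "through-br-tun-vxlan"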
The kernel is 4.1.0-rc1+ out of a davem net-next tree from within the
last two days.
I've set hostnames for the second system on the first as:
10.14.12.22 bareinterface
192.168.2.22 through-br-tun-vxlan
192.168.1.22 through-br-int
192.168.0.22 through-nearfull-stack
So for bareinterface, the traffic goes just through the bare interface
on each side.  Traffic to through-nearfull-stack goes through the
Linux bridge and OVS switches, then the bare interface, and up the
corresponding path on the receiver.
I then run netperf:
root@qu-stbaz1-perf0000:~# netstat -s > before; HDR="-P 1"; \
  for i in bareinterface through-br-tun-vxlan through-br-int through-nearfull-stack; do \
    netperf $HDR -c -H $i -t TCP_stream -B $i -- \
      -O result_brand,throughput,throughput_units,elapsed_time,local_transport_retrans; \
    netstat -s > after; \
    ~raj/beforeafter before after | grep -i reord; \
    mv after before; HDR="-P 1"; \
  done
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to bareinterface () port 0 AF_INET : demo
Result Tag               Throughput  Throughput Units  Elapsed Time (sec)  Local Transport Retransmissions
"bareinterface"          8311.99     10^6bits/s        10.00               66
Detected reordering 1 times using FACK
Detected reordering 0 times using SACK
Detected reordering 0 times using time stamp
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to through-br-tun-vxlan () port 0 AF_INET : demo
Result Tag               Throughput  Throughput Units  Elapsed Time (sec)  Local Transport Retransmissions
"through-br-tun-vxlan"   2845.10     10^6bits/s        10.07               32799
Detected reordering 0 times using FACK
Detected reordering 0 times using SACK
Detected reordering 0 times using time stamp
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to through-br-int () port 0 AF_INET : demo
Result Tag               Throughput  Throughput Units  Elapsed Time (sec)  Local Transport Retransmissions
"through-br-int"         3412.14     10^6bits/s        10.01               30141
Detected reordering 0 times using FACK
Detected reordering 0 times using SACK
Detected reordering 0 times using time stamp
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to through-nearfull-stack () port 0 AF_INET : demo
Result Tag               Throughput  Throughput Units  Elapsed Time (sec)  Local Transport Retransmissions
"through-nearfull-stack" 2515.64     10^6bits/s        10.01               15339
Detected reordering 0 times using FACK
Detected reordering 310608 times using SACK
Detected reordering 21621 times using time stamp
Once the traffic is through the "nearfull" stack, a boatload of
reordering is detected.
To see whether it was on the sending side or the receiving side, I
gave br-tun on the receiving side the IP associated with
"through-nearfull-stack" and ran that netperf again:
root@qu-stbaz1-perf0000:~# netstat -s > beforestat; \
  netperf -H 192.168.0.22 -l 10 -- -O throughput,local_transport_retrans; \
  netstat -s > afterstat; ~raj/beforeafter beforestat afterstat | grep -i reord
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 () port 0 AF_INET : demo
Throughput  Local Transport Retransmissions
2697.98     6208
Detected reordering 0 times using FACK
Detected reordering 166519 times using SACK
Detected reordering 8457 times using time stamp
That makes it seem like the reordering is on the sending side, though
I suppose it is not entirely conclusive.  I have pcaps of a netperf
run through the "nearfull" stack on both sender and receiver, for the
physical interface (which will be VxLAN encapsulated) and for the qvo
and qvb veths (I've not yet learned how to get a trace off the OVS
patch interfaces).  I've put them on netperf.org for anonymous FTP -
the file is veth_reordering.tgz and when unpacked it will create a
veth_reordering/ directory with all six captures in it.
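(For what it's worth, per-interface captures like these can be taken
with plain tcpdump along these lines - the interface and file names
are illustrative, not necessarily what I used:)

# capture each hop of the sending-side path, then run the netperf test
for ifc in eth2 qvo qvb; do
    tcpdump -i $ifc -s 96 -w send_${ifc}.pcap &
done
# ... when the test is done, stop the captures:  pkill tcpdump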
The send_qvb trace looks "clean" in tcptrace/xplot as far as
out-of-order is concerned.  The send_qvo trace suggests out-of-order
but is a little confusing; there may be some packets in the trace out
of timestamp order.  I can't really look at the physical interface
trace with tcptrace because it doesn't know about VxLAN.  When I
looked briefly, with stone knives and bearskins, at the sending-side
physical interface trace, it did seem to show reordering though.
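(One way around tcptrace not knowing VxLAN, assuming the tunnel is on
the standard VXLAN UDP port 4789 - adjust if br-tun uses a different
port - is to have tshark decode the outer UDP as vxlan and look for
segments it flags as out-of-order; the capture file name is
illustrative:)

tshark -r send_physical.pcap -d udp.port==4789,vxlan -Y tcp.analysis.out_of_order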
happy benchmarking,
rick jones
* Re: Is veth in net-next reordering traffic?
From: Eric Dumazet @ 2015-05-07 3:16 UTC
To: Rick Jones; +Cc: netdev
On Wed, 2015-05-06 at 19:04 -0700, Rick Jones wrote:
> I've been messing about with a setup approximating what an OpenStack
> Nova Compute node creates for the private networking plumbing when using
> OVS+VxLAN. Just without the VM. So, I have a linux bridge (named qbr),
> a veth pair (named qvb and qvo) joining that to an OVS switch (called
> br-int) which then has a patch pair joining that OVS bridge to another
> OVS bridge (br-tun) which has a vxlan tunnel defined.
veth can certainly reorder traffic, unless you use cpu binding with your
netperf (sender side)
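(A minimal sketch of such binding, using netperf's global -T CPU
affinity option and the addresses from the original message:)

# pin the sending netperf to CPU 1 so the flow is always queued from
# the same CPU (and, typically, the same NIC tx queue)
netperf -H 192.168.0.22 -l 30 -T 1 -- -O throughput,local_transport_retrans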
* Re: Is veth in net-next reordering traffic?
From: Rick Jones @ 2015-05-07 16:01 UTC
To: Eric Dumazet; +Cc: netdev
On 05/06/2015 08:16 PM, Eric Dumazet wrote:
> On Wed, 2015-05-06 at 19:04 -0700, Rick Jones wrote:
>> I've been messing about with a setup approximating what an OpenStack
>> Nova Compute node creates for the private networking plumbing when using
>> OVS+VxLAN. Just without the VM. So, I have a linux bridge (named qbr),
>> a veth pair (named qvb and qvo) joining that to an OVS switch (called
>> br-int) which then has a patch pair joining that OVS bridge to another
>> OVS bridge (br-tun) which has a vxlan tunnel defined.
>
> veth can certainly reorder traffic, unless you use cpu binding with your
> netperf (sender side)
Is the seemingly high proportion of spurious retransmissions a concern?
(Assuming I'm looking at and interpreting the correct stats):
Unbound:
root@qu-stbaz1-perf0000:~# netstat -s > beforestat; \
  netperf -H 192.168.0.22 -l 30 -- -O throughput,local_transport_retrans; \
  netstat -s > afterstat; ~raj/beforeafter beforestat afterstat | grep -i -e reord -e dsack
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 () port 0 AF_INET : demo
Throughput  Local Transport Retransmissions
2864.44     8892
Detected reordering 0 times using FACK
Detected reordering 334059 times using SACK
Detected reordering 9722 times using time stamp
5 congestion windows recovered without slow start by DSACK
0 DSACKs sent for old packets
8114 DSACKs received
0 DSACKs for out of order packets received
TCPDSACKIgnoredOld: 26
TCPDSACKIgnoredNoUndo: 6153
Bound (CPU 1 picked arbitrarily):
root@qu-stbaz1-perf0000:~# netstat -s > beforestat; \
  netperf -H 192.168.0.22 -l 30 -T 1 -- -O throughput,local_transport_retrans; \
  netstat -s > afterstat; ~raj/beforeafter beforestat afterstat | grep -i -e reord -e dsack
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 () port 0 AF_INET : demo : cpu bind
Throughput  Local Transport Retransmissions
3278.14     4099
Detected reordering 0 times using FACK
Detected reordering 8154 times using SACK
Detected reordering 3 times using time stamp
1 congestion windows recovered without slow start by DSACK
0 DSACKs sent for old packets
669 DSACKs received
169 DSACKs for out of order packets received
TCPDSACKIgnoredOld: 0
TCPDSACKIgnoredNoUndo: 37
I suppose then that is also why I see so many tx queues getting
involved in ixgbe for just a single stream?
(ethtool stats over a 5-second interval, run through beforeafter)
5
NIC statistics:
rx_packets: 541461
tx_packets: 1010748
rx_bytes: 63833156
tx_bytes: 1529215668
rx_pkts_nic: 541461
tx_pkts_nic: 1010748
rx_bytes_nic: 65998760
tx_bytes_nic: 1533258678
multicast: 14
fdir_match: 9
fdir_miss: 541460
tx_restart_queue: 150
tx_queue_0_packets: 927983
tx_queue_0_bytes: 1404085816
tx_queue_1_packets: 19872
tx_queue_1_bytes: 30086064
tx_queue_2_packets: 10650
tx_queue_2_bytes: 16121144
tx_queue_3_packets: 1200
tx_queue_3_bytes: 1815402
tx_queue_4_packets: 409
tx_queue_4_bytes: 619226
tx_queue_5_packets: 459
tx_queue_5_bytes: 694926
tx_queue_8_packets: 49715
tx_queue_8_bytes: 75096650
tx_queue_16_packets: 460
tx_queue_16_bytes: 696440
rx_queue_0_packets: 10
rx_queue_0_bytes: 654
rx_queue_3_packets: 541437
rx_queue_3_bytes: 63830248
rx_queue_6_packets: 14
rx_queue_6_bytes: 2254
Versus a bound netperf:
5
NIC statistics:
rx_packets: 1123827
tx_packets: 1619156
rx_bytes: 140008757
tx_bytes: 2450188854
rx_pkts_nic: 1123816
tx_pkts_nic: 1619197
rx_bytes_nic: 144502745
tx_bytes_nic: 2456723162
multicast: 13
fdir_match: 4
fdir_miss: 1123834
tx_restart_queue: 757
tx_queue_0_packets: 1373194
tx_queue_0_bytes: 2078088490
tx_queue_1_packets: 245959
tx_queue_1_bytes: 372099706
tx_queue_13_packets: 3
tx_queue_13_bytes: 658
rx_queue_0_packets: 4
rx_queue_0_bytes: 264
rx_queue_3_packets: 1123810
rx_queue_3_bytes: 140006400
rx_queue_6_packets: 13
rx_queue_6_bytes: 2093
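(If I'm reading this right, the tx queue used tracks whichever CPU the
unbound netperf happens to be running on.  The CPU-to-queue mapping
can be eyeballed via the XPS masks in sysfs - interface name
illustrative:)

# show which CPUs feed which ixgbe tx queue via XPS
for q in /sys/class/net/eth2/queues/tx-*; do
    printf '%s: %s\n' "$q" "$(cat $q/xps_cpus)"
done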