* Is veth in net-next reordering traffic?
@ 2015-05-07  2:04 Rick Jones
  2015-05-07  3:16 ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Rick Jones @ 2015-05-07  2:04 UTC (permalink / raw)
  To: netdev

I've been messing about with a setup approximating what an OpenStack 
Nova Compute node creates for the private networking plumbing when using 
OVS+VxLAN.  Just without the VM.  So, I have a linux bridge (named qbr), 
a veth pair (named qvb and qvo) joining that to an OVS switch (called 
br-int) which then has a patch pair joining that OVS bridge to another 
OVS bridge (br-tun) which has a vxlan tunnel defined.

I've assigned an IP address to the bare interface (an ixgbe-driven Intel 
82599ES - HP 560FLR), to each of the OVS switches, and to the Linux bridge.

A second system is set up the same way.

The kernel is 4.1.0-rc1+ out of a davem net-next tree from within the 
last two days.

I've set hostnames for the second system on the first as:

10.14.12.22	bareinterface
192.168.2.22	through-br-tun-vxlan
192.168.1.22	through-br-int
192.168.0.22	through-nearfull-stack

So for bareinterface, the traffic goes just through the bare interface on 
each side.  through-nearfull-stack goes through the linux bridge and OVS 
switches, then the bare interface, and up the corresponding path on the 
receiver.

And then I run netperf:

root@qu-stbaz1-perf0000:~# netstat -s > before; HDR="-P 1"; \
  for i in bareinterface through-br-tun-vxlan through-br-int through-nearfull-stack; do \
    netperf $HDR -c -H $i -t TCP_stream -B $i -- \
      -O result_brand,throughput,throughput_units,elapsed_time,local_transport_retrans; \
    netstat -s > after; ~raj/beforeafter before after | grep -i reord; \
    mv after before; HDR="-P 1"; \
  done
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
bareinterface () port 0 AF_INET : demo
Result           Throughput Throughput  Elapsed Local
Tag                         Units       Time    Transport
                                         (sec)   Retransmissions

"bareinterface"  8311.99    10^6bits/s  10.00   66
     Detected reordering 1 times using FACK
     Detected reordering 0 times using SACK
     Detected reordering 0 times using time stamp
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
through-br-tun-vxlan () port 0 AF_INET : demo
Result                  Throughput Throughput  Elapsed Local
Tag                                Units       Time    Transport
                                                (sec)   Retransmissions

"through-br-tun-vxlan"  2845.10    10^6bits/s  10.07   32799
     Detected reordering 0 times using FACK
     Detected reordering 0 times using SACK
     Detected reordering 0 times using time stamp
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
through-br-int () port 0 AF_INET : demo
Result            Throughput Throughput  Elapsed Local
Tag                          Units       Time    Transport
                                          (sec)   Retransmissions

"through-br-int"  3412.14    10^6bits/s  10.01   30141
     Detected reordering 0 times using FACK
     Detected reordering 0 times using SACK
     Detected reordering 0 times using time stamp
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
through-nearfull-stack () port 0 AF_INET : demo
Result                    Throughput Throughput  Elapsed Local
Tag                                  Units       Time    Transport
                                                  (sec)   Retransmissions

"through-nearfull-stack"  2515.64    10^6bits/s  10.01   15339
     Detected reordering 0 times using FACK
     Detected reordering 310608 times using SACK
     Detected reordering 21621 times using time stamp

Once the traffic goes through the "nearfull" stack, a boatload of reordering 
is detected.
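
(The reordering counters above come from diffing two netstat -s snapshots 
with ~raj/beforeafter, a personal helper.  A rough, hypothetical stand-in in 
awk - assuming both snapshots have the same lines in the same order, which 
holds for back-to-back snapshots - might be:)

awk 'NR == FNR { for (i = 1; i <= NF; i++) before[FNR "." i] = $i; next }
     {
       line = ""
       for (i = 1; i <= NF; i++) {
         v = $i
         if (v ~ /^[0-9]+$/ && before[FNR "." i] ~ /^[0-9]+$/)
           v -= before[FNR "." i]              # subtract the earlier counter
         line = line (i > 1 ? " " : "") v
       }
       print line
     }' before after | grep -i reord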

To see whether it was on the sending side or the receiving side, I gave 
br-tun on the receiving side the IP associated with "through-nearfull-stack" 
and ran that netperf again:

root@qu-stbaz1-perf0000:~# netstat -s > beforestat; \
  netperf -H 192.168.0.22 -l 10 -- -O throughput,local_transport_retrans; \
  netstat -s > afterstat; ~raj/beforeafter beforestat afterstat | grep -i reord
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.0.22 () port 0 AF_INET : demo
Throughput Local
            Transport
            Retransmissions

2697.98    6208
     Detected reordering 0 times using FACK
     Detected reordering 166519 times using SACK
     Detected reordering 8457 times using time stamp

which makes it seem like the reordering is on the sending side, though I 
suppose it is not entirely conclusive.  I have pcaps of a netperf run through 
the "nearfull" stack on both sender and receiver, taken on the physical 
interface (which will be VxLAN encapsulated), the qvo and the qvb (I've not 
yet learned how to get a trace off the OVS patch interfaces).  I've put them 
on netperf.org for anonymous FTP - the file is veth_reordering.tgz and when 
unpacked it will create a veth_reordering/ directory with all six captures in 
it.
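
(For anyone wanting to take similar captures, something along these lines - 
the send_phys name and PHYS, standing in for the physical interface, are 
placeholders of mine:)

# on the sending system, in parallel with the netperf run
tcpdump -i "$PHYS" -w send_phys.pcap &   # VxLAN-encapsulated traffic
tcpdump -i qvo -w send_qvo.pcap &
tcpdump -i qvb -w send_qvb.pcap &
# ...and the same three captures, with recv_* names, on the receiving system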

The send_qvb trace looks "clean" in tcptrace/xplot as far as out-of-order is 
concerned.  The send_qvo trace suggests out-of-order, but is a little 
confusing - there may be some packets in the trace out of timestamp order.  I 
can't really look at the physical interface trace with tcptrace because it 
doesn't know about VxLAN.  When I looked briefly, with stone knives and bear 
skins, at the sending-side physical interface trace, it did seem to show 
reordering though.
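
(One workaround for the tcptrace/VxLAN issue, not used here, would be to have 
tshark decode the encapsulation itself and dump the inner TCP sequence 
numbers in capture order; a non-monotonic tcp.seq from the sender would 
indicate reordering.  4789 is the IANA VxLAN port OVS uses; older kernel 
vxlan devices default to 8472.  A sketch, assuming the hypothetical 
send_phys.pcap name from above:)

tshark -r send_phys.pcap -d udp.port==4789,vxlan \
       -T fields -e frame.number -e tcp.srcport -e tcp.seq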

happy benchmarking,

rick jones


* Re: Is veth in net-next reordering traffic?
  2015-05-07  2:04 Is veth in net-next reordering traffic? Rick Jones
@ 2015-05-07  3:16 ` Eric Dumazet
  2015-05-07 16:01   ` Rick Jones
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2015-05-07  3:16 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

On Wed, 2015-05-06 at 19:04 -0700, Rick Jones wrote:
> I've been messing about with a setup approximating what an OpenStack 
> Nova Compute node creates for the private networking plumbing when using 
> OVS+VxLAN.  Just without the VM.  So, I have a linux bridge (named qbr), 
> a veth pair (named qvb and qvo) joining that to an OVS switch (called 
> br-int) which then has a patch pair joining that OVS bridge to another 
> OVS bridge (br-tun) which has a vxlan tunnel defined.

veth can certainly reorder traffic, unless you use CPU binding with your
netperf (sender side).
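
(For example, either of the following pins the sending netperf to one CPU - 
CPU 1 is arbitrary, and -T is netperf's own CPU-binding option, as used later 
in this thread:)

taskset -c 1 netperf -H through-nearfull-stack -t TCP_STREAM   # pin via the scheduler
netperf -T 1 -H through-nearfull-stack -t TCP_STREAM           # pin via netperf itself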


* Re: Is veth in net-next reordering traffic?
  2015-05-07  3:16 ` Eric Dumazet
@ 2015-05-07 16:01   ` Rick Jones
  0 siblings, 0 replies; 3+ messages in thread
From: Rick Jones @ 2015-05-07 16:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 05/06/2015 08:16 PM, Eric Dumazet wrote:
> On Wed, 2015-05-06 at 19:04 -0700, Rick Jones wrote:
>> I've been messing about with a setup approximating what an OpenStack
>> Nova Compute node creates for the private networking plumbing when using
>> OVX+VxLAN.  Just without the VM.  So, I have a linux bridge (named qbr),
>> a veth pair (named qvb and qvo) joining that to an OVS switch (called
>> br-int) which then has a patch pair joining that OVS bridge to another
>> OVS bridge (br-tun) which has a vxlan tunnel defined.
>
> veth can certainly reorder traffic, unless you use cpu binding with your
> netperf (sender side)

Is the seemingly high proportion of spurious retransmissions a concern? 
  (Assuming I'm looking at, and interpreting, the correct stats):

Unbound:
root@qu-stbaz1-perf0000:~# netstat -s > beforestat; \
  netperf -H 192.168.0.22 -l 30 -- -O throughput,local_transport_retrans; \
  netstat -s > afterstat; ~raj/beforeafter beforestat afterstat | grep -i -e reord -e dsack
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.0.22 () port 0 AF_INET : demo
Throughput Local
            Transport
            Retransmissions

2864.44    8892
     Detected reordering 0 times using FACK
     Detected reordering 334059 times using SACK
     Detected reordering 9722 times using time stamp
     5 congestion windows recovered without slow start by DSACK
     0 DSACKs sent for old packets
     8114 DSACKs received
     0 DSACKs for out of order packets received
     TCPDSACKIgnoredOld: 26
     TCPDSACKIgnoredNoUndo: 6153


Bound (CPU 1 picked arbitrarily):
root@qu-stbaz1-perf0000:~# netstat -s > beforestat; \
  netperf -H 192.168.0.22 -l 30 -T 1 -- -O throughput,local_transport_retrans; \
  netstat -s > afterstat; ~raj/beforeafter beforestat afterstat | grep -i -e reord -e dsack
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.0.22 () port 0 AF_INET : demo : cpu bind
Throughput Local
            Transport
            Retransmissions

3278.14    4099
     Detected reordering 0 times using FACK
     Detected reordering 8154 times using SACK
     Detected reordering 3 times using time stamp
     1 congestion windows recovered without slow start by DSACK
     0 DSACKs sent for old packets
     669 DSACKs received
     169 DSACKs for out of order packets received
     TCPDSACKIgnoredOld: 0
     TCPDSACKIgnoredNoUndo: 37

I suppose then that is also why I see so many tx queues getting involved 
in ixgbe for just a single stream?

(ethtool stats over a 5-second interval, run through beforeafter)

5
NIC statistics:
      rx_packets: 541461
      tx_packets: 1010748
      rx_bytes: 63833156
      tx_bytes: 1529215668
      rx_pkts_nic: 541461
      tx_pkts_nic: 1010748
      rx_bytes_nic: 65998760
      tx_bytes_nic: 1533258678
      multicast: 14
      fdir_match: 9
      fdir_miss: 541460
      tx_restart_queue: 150
      tx_queue_0_packets: 927983
      tx_queue_0_bytes: 1404085816
      tx_queue_1_packets: 19872
      tx_queue_1_bytes: 30086064
      tx_queue_2_packets: 10650
      tx_queue_2_bytes: 16121144
      tx_queue_3_packets: 1200
      tx_queue_3_bytes: 1815402
      tx_queue_4_packets: 409
      tx_queue_4_bytes: 619226
      tx_queue_5_packets: 459
      tx_queue_5_bytes: 694926
      tx_queue_8_packets: 49715
      tx_queue_8_bytes: 75096650
      tx_queue_16_packets: 460
      tx_queue_16_bytes: 696440
      rx_queue_0_packets: 10
      rx_queue_0_bytes: 654
      rx_queue_3_packets: 541437
      rx_queue_3_bytes: 63830248
      rx_queue_6_packets: 14
      rx_queue_6_bytes: 2254

Versus a bound netperf:

5
NIC statistics:
      rx_packets: 1123827
      tx_packets: 1619156
      rx_bytes: 140008757
      tx_bytes: 2450188854
      rx_pkts_nic: 1123816
      tx_pkts_nic: 1619197
      rx_bytes_nic: 144502745
      tx_bytes_nic: 2456723162
      multicast: 13
      fdir_match: 4
      fdir_miss: 1123834
      tx_restart_queue: 757
      tx_queue_0_packets: 1373194
      tx_queue_0_bytes: 2078088490
      tx_queue_1_packets: 245959
      tx_queue_1_bytes: 372099706
      tx_queue_13_packets: 3
      tx_queue_13_bytes: 658
      rx_queue_0_packets: 4
      rx_queue_0_bytes: 264
      rx_queue_3_packets: 1123810
      rx_queue_3_bytes: 140006400
      rx_queue_6_packets: 13
      rx_queue_6_bytes: 2093
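
(A rough way to see why a single flow lands on several tx queues is to 
compare the XPS maps for each queue against the per-queue counters above; an 
unbound sender migrating between CPUs in different maps will hit different 
queues.  PHYS below is a placeholder for the physical interface name:)

# which CPUs feed which tx queue (XPS)
for q in /sys/class/net/"$PHYS"/queues/tx-*; do
  echo "$q: $(cat "$q"/xps_cpus)"
done

# per-queue counters, as in the ethtool output above
ethtool -S "$PHYS" | grep 'tx_queue_.*_packets'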

