* TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-06 22:45 UTC
To: Linux Networking Developer Mailing List
I have here a Linux 3.19(.0) system where activating TSO on a veth slave
device slows IPv4 TCP transfers into the veth-connected container to a
crawl.
Host side (hv03):
hv03# ip l
2: ge0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000 [Intel 82579LM]
7: ve-build01: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
pfifo_fast state UP mode DEFAULT group default qlen 1000 [veth]
hv03# ethtool -k ve-build01
Features for ve-build01:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
Guest side (build01):
build01# ip l
2: host0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
build01# ethtool -k host0
Features for host0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
Using an independent machine, I connect to a xinetd chargen sample service
to send a sufficient number of bytes through the pipe.
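[chargen is xinetd's built-in RFC 864 character generator. A minimal way
to enable the TCP variant, as a sketch (file location and defaults vary
by distribution):

	service chargen
	{
		type            = INTERNAL
		id              = chargen-stream
		socket_type     = stream
		protocol        = tcp
		user            = root
		wait            = no
		disable         = no
	}

This would go into /etc/xinetd.d/chargen, followed by a reload of xinetd.]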
ares40# traceroute build01
traceroute to build01 (x), 30 hops max, 60 byte packets
1 hv03 () 0.713 ms 0.663 ms 0.636 ms
2 build01 () 0.905 ms 0.882 ms 0.858 ms
ares40$ socat tcp4-connect:build01:19 - | pv >/dev/null
480KiB 0:00:05 [91.5KiB/s] [ <=> ]
1.01MiB 0:00:11 [91.1KiB/s] [ <=> ]
1.64MiB 0:00:18 [ 110KiB/s] [ <=> ]
(pv is the Pipe Viewer, showing throughput.)
It hovers between 80 and 110 kilobytes/sec, which is 600-fold lower
than what I would normally see. Once TSO is turned off on the
container-side interface:
build01# ethtool -K host0 tso off
(it must be host0; doing it on ve-build01 has no effect)
I observe restoration of expected throughput:
ares40$ socat tcp4-connect:build01:19 - | pv >/dev/null
182MiB 0:02:05 [66.1MiB/s] [ <=> ]
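[The same toggle can in principle be driven from the host; a sketch,
assuming the container's network namespace has been made visible to
ip netns (the netns name "build01" is hypothetical, and systemd-nspawn
does not register one by default):

	ip netns exec build01 ethtool -K host0 tso off
	ip netns exec build01 ethtool -k host0 | grep tcp-segmentation-offload

]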
This problem does not manifest when using IPv6.
The problem also does not manifest if the TCP4 connection is kernel-local,
e.g. hv03->build01.
The problem also does not manifest if the TCP4 connection is outgoing,
e.g. build01->ares40.
IOW, the tcp4 listening socket needs to be inside a veth-connected
container.
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-07  2:48 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
> device slows IPv4 TCP transfers into the veth-connected container to a
> crawl.
>[...]
> IOW, the tcp4 listening socket needs to be inside a veth-connected
> container.

Hi Jan

Nothing comes to mind. It would help if you could provide a script to
reproduce the issue.

I've tried the following on current net-next:

lpaa23:~# cat veth.sh
#!/bin/sh
# This script has to be launched as root
#
brctl addbr br0
ip addr add 192.168.64.1/24 dev br0
ip link set br0 up

ip link add name ext0 type veth peer name int0
ip link set ext0 up
brctl addif br0 ext0
ip netns add vnode0
ip link set dev int0 netns vnode0
ip netns exec vnode0 ip addr add 192.168.64.2/24 dev int0
ip netns exec vnode0 ip link set dev int0 up

ip link add name ext1 type veth peer name int0
ip link set ext1 up
brctl addif br0 ext1
ip netns add vnode1
ip link set dev int0 netns vnode1
ip netns exec vnode1 ip addr add 192.168.64.3/24 dev int0
ip netns exec vnode1 ip link set dev int0 up

ip netns exec vnode0 netserver &
sleep 1
ip netns exec vnode1 netperf -H 192.168.64.2 -l 10

# Cleanup
ip netns exec vnode0 killall netserver
ifconfig br0 down ; brctl delbr br0
ip netns delete vnode0 ; ip netns delete vnode1

lpaa23:~# ./veth.sh
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.64.2 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    14924.09

Seems pretty honest result.
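[The script above relies on brctl and ifconfig from bridge-utils/net-tools.
A sketch of the equivalent bridge handling with iproute2 alone, should
those tools be absent:

	ip link add br0 type bridge            # instead of: brctl addbr br0
	ip link set ext0 master br0            # instead of: brctl addif br0 ext0
	ip link set br0 down; ip link del br0  # cleanup, instead of: ifconfig + brctl delbr

]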
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-07  9:54 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Tuesday 2015-04-07 04:48, Eric Dumazet wrote:
>On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
>> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
>> device slows IPv4 TCP transfers into the veth-connected container to a
>> crawl.
>
>Nothing comes to mind. It would help if you could provide a script to
>reproduce the issue.

It seems IPsec is *also* a requirement in the mix.
Anyhow, script time!

<<<<< lpaa23-run.sh
#!/bin/sh -x
# lpaa23 has 10.10.40.129/24 on enp0s3
# remember to sysctl -w net.ipv4.conf.enp0s3.forwarding=1 etc.

ip x s a \
	src 10.10.40.129 dst 10.10.40.1 \
	proto esp spi 0xc3184540 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0x30ce4aa84fcc0e11f2d1567b4bdd5ba619b2dd77 96 \
	enc "cbc(aes)" 0x4e2acb3899f8d6b7b472de59535c2fd7
ip x s a \
	src 10.10.40.1 dst 10.10.40.129 \
	proto esp spi 0xc8141c39 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0xbf06bfd2e30a0f043f6c426acd5758112133ffc8 96 \
	enc "cbc(aes)" 0x2d6da3e595d04d2ed9aa89676ad3dabd
ip x p a \
	src 10.10.40.1/32 dst 10.10.23.0/24 \
	dir fwd priority 2851 ptype main \
	tmpl src 10.10.40.1 dst 10.10.40.129 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.40.1/32 dst 10.10.23.0/24 \
	dir in priority 2851 ptype main \
	tmpl src 10.10.40.1 dst 10.10.40.129 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.23.0/24 dst 10.10.40.1/32 \
	dir out priority 2851 ptype main \
	tmpl src 10.10.40.129 dst 10.10.40.1 \
	proto esp reqid 4 mode tunnel

ip link add name ext0 type veth peer name int0
ip addr add 10.10.40.129/32 dev ext0
ip link set ext0 up
ip route add 10.10.23.23/32 dev ext0
ip netns add vnode0
ip link set dev int0 netns vnode0
ip netns exec vnode0 ip addr add 10.10.23.23/32 dev int0
ip netns exec vnode0 ip link set dev lo up
ip netns exec vnode0 ip link set dev int0 up
ip netns exec vnode0 ip route add 10.10.40.129/32 dev int0
ip netns exec vnode0 ip route replace default via 10.10.40.129

xexit () {
	trap "" EXIT INT
	ip x p d src 10.10.40.1/32 dst 10.10.23.0/24 dir fwd
	ip x p d src 10.10.40.1/32 dst 10.10.23.0/24 dir in
	ip x p d src 10.10.23.0/24 dst 10.10.40.1/32 dir out
	ip x s d src 10.10.40.129 dst 10.10.40.1 proto esp spi 0xc3184540
	ip x s d src 10.10.40.1 dst 10.10.40.129 proto esp spi 0xc8141c39
	ip link del ext0
	ip netns delete vnode0
}
trap xexit EXIT INT
ip netns exec vnode0 netserver -D
>>>

<<<<< lpaa24-run.sh
#!/bin/sh -x
# lpaa24 has 10.10.40.1/24 on eth0

ip x s a \
	src 10.10.40.1 dst 10.10.40.129 \
	proto esp spi 0xc8141c39 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0xbf06bfd2e30a0f043f6c426acd5758112133ffc8 96 \
	enc "cbc(aes)" 0x2d6da3e595d04d2ed9aa89676ad3dabd
ip x s a \
	src 10.10.40.129 dst 10.10.40.1 \
	proto esp spi 0xc3184540 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0x30ce4aa84fcc0e11f2d1567b4bdd5ba619b2dd77 96 \
	enc "cbc(aes)" 0x4e2acb3899f8d6b7b472de59535c2fd7
ip x p a \
	src 10.10.23.0/24 dst 10.10.40.1/32 \
	dir fwd priority 1827 ptype main \
	tmpl src 10.10.40.129 dst 10.10.40.1 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.23.0/24 dst 10.10.40.1/32 \
	dir in priority 1827 ptype main \
	tmpl src 10.10.40.129 dst 10.10.40.1 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.40.1/32 dst 10.10.23.0/24 \
	dir out priority 1827 ptype main \
	tmpl src 10.10.40.1 dst 10.10.40.129 \
	proto esp reqid 4 mode tunnel
ip r a \
	10.10.23.0/24 via 10.10.40.129 dev eth0 proto static src 10.10.40.1

xexit () {
	trap "" EXIT INT
	ip x p d src 10.10.23.0/24 dst 10.10.40.1/32 dir fwd
	ip x p d src 10.10.23.0/24 dst 10.10.40.1/32 dir in
	ip x p d src 10.10.40.1/32 dst 10.10.23.0/24 dir out
	ip x s d src 10.10.40.1 dst 10.10.40.129 proto esp spi 0xc8141c39
	ip x s d src 10.10.40.129 dst 10.10.40.1 proto esp spi 0xc3184540
}
trap xexit EXIT INT
netperf -H 10.10.23.23 -l 10
>>>
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-07 19:49 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Tue, 2015-04-07 at 11:54 +0200, Jan Engelhardt wrote:
> On Tuesday 2015-04-07 04:48, Eric Dumazet wrote:
> >On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
> >> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
> >> device slows IPv4 TCP transfers into the veth-connected container to a
> >> crawl.
> >
> >Nothing comes to mind. It would help if you could provide a script to
> >reproduce the issue.
>
> It seems IPsec is *also* a requirement in the mix.
> Anyhow, script time!

I tried your scripts, but the sender does not use veth? Where is the
part where you disable TSO? If it's on the receiver, I fail to
understand how it matters.

+ DUMP_TCP_INFO=1 ./netperf -H 10.10.23.23 -l 10
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.10.23.23 () port 0 AF_INET
tcpi_rto 217000 tcpi_ato 0 tcpi_pmtu 1438 tcpi_rcv_ssthresh 29200
tcpi_rtt 16715 tcpi_rttvar 7 tcpi_snd_ssthresh 702 tpci_snd_cwnd 831
tcpi_reordering 3 tcpi_total_retrans 24
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.03     577.53

Seems OK to me?
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-08 17:09 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Tuesday 2015-04-07 21:49, Eric Dumazet wrote:
>> On Tuesday 2015-04-07 04:48, Eric Dumazet wrote:
>> >On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
>> >> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
[and now also 3.19.3]
>> >> device slows IPv4 TCP transfers into the veth-connected container to a
>> >> crawl.
>> >
>> >Nothing comes to mind. It would help if you could provide a script to
>> >reproduce the issue.
>>
>> It seems IPsec is *also* a requirement in the mix.
>> Anyhow, script time!
>
>I tried your scripts, but the sender does not use veth?

I was finally able to reproduce it reliably, and with just one
machine/kernel instance (and a number of containers, of course).

I have uploaded the script mix to
	http://inai.de/files/tso-1.tar.xz
There is a demo screencast at
	http://inai.de/files/tso-1.mkv

From the script collection, one can run t-all-init to set up the
network, then t-chargen-server (stays in foreground), and on another
terminal then t-chargen-client. Alternatively, the combination
t-zero-{server,client}.

It appears that it has to be a simple single-threaded,
single-connection, single-everything transfer. The problem won't
manifest with netperf even if run for an extended period (60 seconds).
netperf is probably too smart, exploiting some form of parallelism.

Oh, and if the transfer rate is absolutely *zero* (esp. for
xinetd-chargen), that just means that it attempts DNS name resolution.
Just wait a few seconds in that case for the transfer to start.
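[A netperf-free single-connection stand-in between the namespaces of the
earlier veth.sh; a sketch in which `yes` merely plays the role of a cheap
chargen source:

	ip netns exec vnode0 socat TCP4-LISTEN:19,reuseaddr,fork EXEC:'yes 0123456789' &
	ip netns exec vnode1 sh -c 'socat TCP4-CONNECT:192.168.64.2:19 - | pv >/dev/null'

]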
* Re: TSO on veth device slows transmission to a crawl
From: Rick Jones @ 2015-04-08 17:20 UTC
To: Jan Engelhardt, Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On 04/08/2015 10:09 AM, Jan Engelhardt wrote:
> The problem won't manifest with netperf even if run for an extended
> period (60 seconds). netperf is probably too smart, exploiting some
> form of parallelism.

Heh - I rather doubt netperf is all that smart. Unless you use the
likes of the -T global option to bind netperf/netserver to specific
CPUs, it will just blithely run wherever the stack/scheduler decides it
should run. It is though still interesting you don't see the issue with
netperf.

I don't know specifically how xinetd-chargen behaves, but if it is like
the chargen of old, I assume that means it is sitting there writing one
byte at a time to the socket, and perhaps has TCP_NODELAY set. If you
want to emulate that with netperf, then something like:

netperf -H <receiver> -t TCP_STREAM -- -m 1 -D

would be the command-line to use.

happy benchmarking,

rick jones
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-08 18:16 UTC
To: Rick Jones; +Cc: Eric Dumazet, Linux Networking Developer Mailing List

On Wednesday 2015-04-08 19:20, Rick Jones wrote:
> On 04/08/2015 10:09 AM, Jan Engelhardt wrote:
>> The problem won't manifest with netperf even if run for an extended
>> period (60 seconds). netperf is probably too smart, exploiting some
>> form of parallelism.
>
> Heh - I rather doubt netperf is all that smart.[...] I don't
> know specifically how xinetd-chargen behaves, but if it is like the
> chargen of old, I assume that means it is sitting there writing one
> byte at a time to the socket, and perhaps has TCP_NODELAY set.

Interesting thought. Indeed, stracing reveals that xinetd-chargen is
issuing 74-byte write syscalls. They do however get merged to a certain
extent at the TCP level. tcpdump shows groups of three:

IP 23.19 > 40.1.59411: Flags [P.], ..., length 74
IP 23.19 > 40.1.59411: Flags [.], ..., length 1370
IP 23.19 > 40.1.59411: Flags [P.], ..., length 36
(repeat)

However, the presence of small packets is not the issue. As an
alternative to chargen, socat can be used. socat issues 8192-byte write
syscalls, and tcpdump shows just large writes. Now that I look at it,
too large:

IP 23.23:19 > 40.1:59412: Flags [.], seq 25975:28715, ack 1, win 227, options [nop,nop,TS val 44808653 ecr 44808653], length 2740
IP 40.1:59412 > 23.23:19: Flags [.], ack 24605, win 627, options [nop,nop,TS val 44808653 ecr 44808653], length 0
IP 23.23:19 > 40.1:59412: Flags [.], seq 28715:31455, ack 1, win 227, options [nop,nop,TS val 44808653 ecr 44808653], length 2740
IP 40.1:59412 > 23.23:19: Flags [.], ack 25975, win 649, options [nop,nop,TS val 44808653 ecr 44808653], length 0
IP 23.23:19 > 40.1:59412: Flags [.], seq 25975:27345, ack 1, win 227, options [nop,nop,TS val 44808854 ecr 44808653], length 1370
IP 40.1:59412 > 23.23:19: Flags [.], ack 27345, win 672, options [nop,nop,TS val 44808854 ecr 44808854], length 0

2740 bytes is larger than the link MTU. That only works if all the
drivers do their segmentation properly, which does not seem to happen
here.

>If you want to emulate that with netperf, then something like:
>
> netperf -H <receiver> -t TCP_STREAM -- -m 1 -D

One more thing to be aware of is that iptraf counts all the headers,
which netperf likely is not doing.

-m 1:    iptraf   57 MBit/s, netperf   0.69*10^6
-m 10:   iptraf   63 MBit/s, netperf   6.76*10^6
-m 100:  iptraf  114 MBit/s, netperf  75.74*10^6
<no -m>: iptraf ~580 MBit/s, netperf 542.00*10^6

However, with chargen/socat, both iptraf and pv agree on around
1 MBit/s [108-120 KByte/s], so small writes really don't seem to be the
issue, but maybe the over-MTU-size packets shown above.
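[A note on those lengths: tcpdump captures skbs before GSO/TSO
segmentation on transmit and after GRO aggregation on receive, so TCP
segments larger than the MTU are expected at any capture point where
these offloads are active; they do not by themselves prove
mis-segmentation on the wire. A sketch for comparing both sides of the
pair, reusing the interface names from the first message:

	hv03#    tcpdump -ni ve-build01 'tcp port 19'
	build01# tcpdump -ni host0 'tcp port 19'

Capturing on the physical NIC (ge0) would show whether over-MTU frames
actually leave the box.]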
* Re: TSO on veth device slows transmission to a crawl
From: Rick Jones @ 2015-04-08 18:19 UTC
To: Jan Engelhardt; +Cc: Eric Dumazet, Linux Networking Developer Mailing List

On 04/08/2015 11:16 AM, Jan Engelhardt wrote:
> One more thing to be aware of is that iptraf counts all the
> headers, which netperf likely is not doing.
>
> -m 1:    iptraf   57 MBit/s, netperf   0.69*10^6
> -m 10:   iptraf   63 MBit/s, netperf   6.76*10^6
> -m 100:  iptraf  114 MBit/s, netperf  75.74*10^6
> <no -m>: iptraf ~580 MBit/s, netperf 542.00*10^6

Indeed, netperf makes no attempt to count headers. What it reports is
strictly above the socket interface.

rick
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09  9:24 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Wed, 2015-04-08 at 19:09 +0200, Jan Engelhardt wrote:
> I was finally able to reproduce it reliably, and with just one
> machine/kernel instance (and a number of containers, of course).
>
> I have uploaded the script mix to
>	http://inai.de/files/tso-1.tar.xz
> There is a demo screencast at
>	http://inai.de/files/tso-1.mkv

Thanks for providing these scripts, but they do not work on my host
running latest net-next.

Basic IP routing seems not properly enabled, as even a ping to
10.10.23.23 does not get through:

ip netns exec vclient /bin/bash
ping 10.10.23.23
<no frame is sent>

Have you tried net-next as well?

Thanks
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09 10:01 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Thu, 2015-04-09 at 02:24 -0700, Eric Dumazet wrote:
> Thanks for providing these scripts, but they do not work on my host
> running latest net-next.
>
> Basic IP routing seems not properly enabled, as even a ping to
> 10.10.23.23 does not get through:
>
> ip netns exec vclient /bin/bash
> ping 10.10.23.23
> <no frame is sent>

Same problem if I run pristine 3.19:

lpaa23:~/t# ip netns exec vclient /bin/bash
lpaa23:~/t# ip ro get 10.10.23.23
10.10.23.23 via 10.10.40.129 dev veth0  src 10.10.40.1
    cache
lpaa23:~/t# nstat
#kernel
lpaa23:~/t# ping 10.10.23.23
PING 10.10.23.23 (10.10.23.23) 56(84) bytes of data.
^C
--- 10.10.23.23 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
lpaa23:~/t# nstat
#kernel
IcmpOutErrors                   2                  0.0
IcmpOutEchoReps                 2                  0.0
IcmpMsgOutType8                 2                  0.0
lpaa23:~/t# grep . `find /proc/sys/net/ipv4 -name forwarding`
/proc/sys/net/ipv4/conf/all/forwarding:1
/proc/sys/net/ipv4/conf/default/forwarding:1
/proc/sys/net/ipv4/conf/gre0/forwarding:1
/proc/sys/net/ipv4/conf/gretap0/forwarding:1
/proc/sys/net/ipv4/conf/lo/forwarding:1
/proc/sys/net/ipv4/conf/sit0/forwarding:1
/proc/sys/net/ipv4/conf/veth0/forwarding:1
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-09 10:26 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Thursday 2015-04-09 11:24, Eric Dumazet wrote:
>> I was finally able to reproduce it reliably, and with just one
>> machine/kernel instance (and a number of containers, of course).
>>
>> I have uploaded the script mix to
>>	http://inai.de/files/tso-1.tar.xz
>> There is a demo screencast at
>>	http://inai.de/files/tso-1.mkv
>
>Thanks for providing these scripts, but they do not work on my host
>running latest net-next.
>
>Basic IP routing seems not properly enabled, as even a ping to
>10.10.23.23 does not get through.

Since I have net.ipv4.conf.all.forwarding=1 and net.ipv4.ip_forward=1,
this probably got inherited into the containers, so I did not write it
down into the scripts. Enable it :)

>Have you tried net-next as well?

I have now - a quick test yields no problem. Next I will do a reverse
bisect just for the fun.
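[What "enable it" amounts to, as a sketch; note that network namespaces
carry their own copies of these sysctls, so setting them on the host
alone may not be enough (the vclient name is taken from the tso-1
scripts):

	sysctl -w net.ipv4.ip_forward=1
	sysctl -w net.ipv4.conf.all.forwarding=1
	ip netns exec vclient sysctl -w net.ipv4.ip_forward=1

]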
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09 15:08 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Thu, 2015-04-09 at 12:26 +0200, Jan Engelhardt wrote:
> Since I have net.ipv4.conf.all.forwarding=1 and net.ipv4.ip_forward=1,
> this probably got inherited into the containers, so I did not write it
> down into the scripts. Enable it :)

The thing is: it is enabled.

>> Have you tried net-next as well?
>
> I have now - a quick test yields no problem. Next I will do a reverse
> bisect just for the fun.

Cool ;)
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-09 15:35 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Thursday 2015-04-09 17:08, Eric Dumazet wrote:
>> >Basic IP routing seems not properly enabled, as even a ping to
>> >10.10.23.23 does not get through.
>>
>> Since I have net.ipv4.conf.all.forwarding=1 and net.ipv4.ip_forward=1,
>> this probably got inherited into the containers, so I did not write it
>> down into the scripts. Enable it :)
>
>The thing is: it is enabled.

If forwarding is on, and the config is reasonably correct (which I
think it is), and it still won't ping, then would that not mean there
is another bug in the kernel?

>> I have now - a quick test yields no problem. Next I will do a reverse
>> bisect just for the fun.
>
>Cool ;)

I've aborted the search. v4.0-rc crashes in a VM
http://marc.info/?l=linux-kernel&m=142850643415538&w=2 , and on a real
system, the SCSI layer similarly acts up in 4.0rc5 such that sda won't
even show up in the first place. If large portions of the history
cannot be booted, I can't bisect it.
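[A reverse bisect just swaps the labels. A sketch using git's custom
bisect terms (available since git 2.7), assuming v3.19 exhibits the
slowdown and net-next does not:

	git bisect start --term-old=broken --term-new=fixed
	git bisect broken v3.19
	git bisect fixed net-next/master
	# build and boot each candidate kernel, then mark it:
	#   git bisect broken   (still slow)
	#   git bisect fixed    (full speed)

]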
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09 15:55 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Thu, 2015-04-09 at 17:35 +0200, Jan Engelhardt wrote:
> If forwarding is on, and the config is reasonably correct (which
> I think it is), and it still won't ping, then would that not mean
> there is another bug in the kernel?

No idea. I tried 3.19; it was not working. Note that your prior scripts
(using 2 hosts) were OK.