* TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-06 22:45 UTC
To: Linux Networking Developer Mailing List
I have here a Linux 3.19(.0) system where activating TSO on a veth slave
device slows IPv4 TCP transfers into the veth-connected container to a
crawl.
Host side (hv03):
hv03# ip l
2: ge0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000 [Intel 82579LM]
7: ve-build01: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
pfifo_fast state UP mode DEFAULT group default qlen 1000 [veth]
hv03# ethtool -k ve-build01
Features for ve-build01:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
Guest side (build01):
build01# ip l
2: host0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
build01# ethtool -k host0
Features for host0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
Using an independent machine, I connect to a xinetd chargen sample service
to send a sufficient number of bytes through the pipe.
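[chargen is xinetd's built-in RFC 864 character generator. A minimal way
to enable the TCP variant, as a sketch (file location and defaults vary
by distribution):

	service chargen
	{
		type            = INTERNAL
		id              = chargen-stream
		socket_type     = stream
		protocol        = tcp
		user            = root
		wait            = no
		disable         = no
	}

This would go into /etc/xinetd.d/chargen, followed by a reload of xinetd.]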
ares40# traceroute build01
traceroute to build01 (x), 30 hops max, 60 byte packets
1 hv03 () 0.713 ms 0.663 ms 0.636 ms
2 build01 () 0.905 ms 0.882 ms 0.858 ms
ares40$ socat tcp4-connect:build01:19 - | pv >/dev/null
480KiB 0:00:05 [91.5KiB/s] [ <=> ]
1.01MiB 0:00:11 [91.1KiB/s] [ <=> ]
1.64MiB 0:00:18 [ 110KiB/s] [ <=> ]
(pv is the Pipe Viewer, showing throughput.)
It hovers between 80 and 110 kilobytes/sec, which is 600-fold lower
than what I would normally see. Once TSO is turned off on the
container-side interface:
build01# ethtool -K host0 tso off
(it must be host0; doing it on ve-build01 has no effect)
I observe restoration of expected throughput:
ares40$ socat tcp4-connect:build01:19 - | pv >/dev/null
182MiB 0:02:05 [66.1MiB/s] [ <=> ]
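[The same toggle can in principle be driven from the host; a sketch,
assuming the container's network namespace has been made visible to
ip netns (the netns name "build01" is hypothetical, and systemd-nspawn
does not register one by default):

	ip netns exec build01 ethtool -K host0 tso off
	ip netns exec build01 ethtool -k host0 | grep tcp-segmentation-offload

]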
This problem does not manifest when using IPv6.
The problem also does not manifest if the TCP4 connection is kernel-local,
e.g. hv03->build01.
The problem also does not manifest if the TCP4 connection is outgoing,
e.g. build01->ares40.
IOW, the tcp4 listening socket needs to be inside a veth-connected
container.
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-07  2:48 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
> device slows IPv4 TCP transfers into the veth-connected container to a
> crawl.
>[...]
> IOW, the tcp4 listening socket needs to be inside a veth-connected
> container.

Hi Jan

Nothing comes to mind. It would help if you could provide a script to
reproduce the issue.

I've tried the following on current net-next:

lpaa23:~# cat veth.sh
#!/bin/sh
# This script has to be launched as root
#
brctl addbr br0
ip addr add 192.168.64.1/24 dev br0
ip link set br0 up

ip link add name ext0 type veth peer name int0
ip link set ext0 up
brctl addif br0 ext0
ip netns add vnode0
ip link set dev int0 netns vnode0
ip netns exec vnode0 ip addr add 192.168.64.2/24 dev int0
ip netns exec vnode0 ip link set dev int0 up

ip link add name ext1 type veth peer name int0
ip link set ext1 up
brctl addif br0 ext1
ip netns add vnode1
ip link set dev int0 netns vnode1
ip netns exec vnode1 ip addr add 192.168.64.3/24 dev int0
ip netns exec vnode1 ip link set dev int0 up

ip netns exec vnode0 netserver &
sleep 1
ip netns exec vnode1 netperf -H 192.168.64.2 -l 10

# Cleanup
ip netns exec vnode0 killall netserver
ifconfig br0 down ; brctl delbr br0
ip netns delete vnode0 ; ip netns delete vnode1

lpaa23:~# ./veth.sh
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.64.2 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    14924.09

Seems pretty honest result.
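[The script above relies on brctl and ifconfig from bridge-utils/net-tools.
A sketch of the equivalent bridge handling with iproute2 alone, should
those tools be absent:

	ip link add br0 type bridge            # instead of: brctl addbr br0
	ip link set ext0 master br0            # instead of: brctl addif br0 ext0
	ip link set br0 down; ip link del br0  # cleanup, instead of: ifconfig + brctl delbr

]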
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-07  9:54 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Tuesday 2015-04-07 04:48, Eric Dumazet wrote:
>On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
>> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
>> device slows IPv4 TCP transfers into the veth-connected container to a
>> crawl.
>
>Nothing comes to mind. It would help if you could provide a script to
>reproduce the issue.

It seems IPsec is *also* a requirement in the mix.
Anyhow, script time!

<<<<< lpaa23-run.sh
#!/bin/sh -x
# lpaa23 has 10.10.40.129/24 on enp0s3
# remember to sysctl -w net.ipv4.conf.enp0s3.forwarding=1 etc.

ip x s a \
	src 10.10.40.129 dst 10.10.40.1 \
	proto esp spi 0xc3184540 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0x30ce4aa84fcc0e11f2d1567b4bdd5ba619b2dd77 96 \
	enc "cbc(aes)" 0x4e2acb3899f8d6b7b472de59535c2fd7
ip x s a \
	src 10.10.40.1 dst 10.10.40.129 \
	proto esp spi 0xc8141c39 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0xbf06bfd2e30a0f043f6c426acd5758112133ffc8 96 \
	enc "cbc(aes)" 0x2d6da3e595d04d2ed9aa89676ad3dabd
ip x p a \
	src 10.10.40.1/32 dst 10.10.23.0/24 \
	dir fwd priority 2851 ptype main \
	tmpl src 10.10.40.1 dst 10.10.40.129 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.40.1/32 dst 10.10.23.0/24 \
	dir in priority 2851 ptype main \
	tmpl src 10.10.40.1 dst 10.10.40.129 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.23.0/24 dst 10.10.40.1/32 \
	dir out priority 2851 ptype main \
	tmpl src 10.10.40.129 dst 10.10.40.1 \
	proto esp reqid 4 mode tunnel

ip link add name ext0 type veth peer name int0
ip addr add 10.10.40.129/32 dev ext0
ip link set ext0 up
ip route add 10.10.23.23/32 dev ext0
ip netns add vnode0
ip link set dev int0 netns vnode0
ip netns exec vnode0 ip addr add 10.10.23.23/32 dev int0
ip netns exec vnode0 ip link set dev lo up
ip netns exec vnode0 ip link set dev int0 up
ip netns exec vnode0 ip route add 10.10.40.129/32 dev int0
ip netns exec vnode0 ip route replace default via 10.10.40.129

xexit () {
	trap "" EXIT INT
	ip x p d src 10.10.40.1/32 dst 10.10.23.0/24 dir fwd
	ip x p d src 10.10.40.1/32 dst 10.10.23.0/24 dir in
	ip x p d src 10.10.23.0/24 dst 10.10.40.1/32 dir out
	ip x s d src 10.10.40.129 dst 10.10.40.1 proto esp spi 0xc3184540
	ip x s d src 10.10.40.1 dst 10.10.40.129 proto esp spi 0xc8141c39
	ip link del ext0
	ip netns delete vnode0
}
trap xexit EXIT INT
ip netns exec vnode0 netserver -D
>>>

<<<<< lpaa24-run.sh
#!/bin/sh -x
# lpaa24 has 10.10.40.1/24 on eth0

ip x s a \
	src 10.10.40.1 dst 10.10.40.129 \
	proto esp spi 0xc8141c39 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0xbf06bfd2e30a0f043f6c426acd5758112133ffc8 96 \
	enc "cbc(aes)" 0x2d6da3e595d04d2ed9aa89676ad3dabd
ip x s a \
	src 10.10.40.129 dst 10.10.40.1 \
	proto esp spi 0xc3184540 reqid 4 mode tunnel \
	replay-window 32 flag af-unspec \
	auth-trunc "hmac(sha1)" 0x30ce4aa84fcc0e11f2d1567b4bdd5ba619b2dd77 96 \
	enc "cbc(aes)" 0x4e2acb3899f8d6b7b472de59535c2fd7
ip x p a \
	src 10.10.23.0/24 dst 10.10.40.1/32 \
	dir fwd priority 1827 ptype main \
	tmpl src 10.10.40.129 dst 10.10.40.1 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.23.0/24 dst 10.10.40.1/32 \
	dir in priority 1827 ptype main \
	tmpl src 10.10.40.129 dst 10.10.40.1 \
	proto esp reqid 4 mode tunnel
ip x p a \
	src 10.10.40.1/32 dst 10.10.23.0/24 \
	dir out priority 1827 ptype main \
	tmpl src 10.10.40.1 dst 10.10.40.129 \
	proto esp reqid 4 mode tunnel
ip r a \
	10.10.23.0/24 via 10.10.40.129 dev eth0 proto static src 10.10.40.1

xexit () {
	trap "" EXIT INT
	ip x p d src 10.10.23.0/24 dst 10.10.40.1/32 dir fwd
	ip x p d src 10.10.23.0/24 dst 10.10.40.1/32 dir in
	ip x p d src 10.10.40.1/32 dst 10.10.23.0/24 dir out
	ip x s d src 10.10.40.1 dst 10.10.40.129 proto esp spi 0xc8141c39
	ip x s d src 10.10.40.129 dst 10.10.40.1 proto esp spi 0xc3184540
}
trap xexit EXIT INT
netperf -H 10.10.23.23 -l 10
>>>
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-07 19:49 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Tue, 2015-04-07 at 11:54 +0200, Jan Engelhardt wrote:
> On Tuesday 2015-04-07 04:48, Eric Dumazet wrote:
> >On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
> >> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
> >> device slows IPv4 TCP transfers into the veth-connected container to a
> >> crawl.
> >
> >Nothing comes to mind. It would help if you could provide a script to
> >reproduce the issue.
>
> It seems IPsec is *also* a requirement in the mix.
> Anyhow, script time!

I tried your scripts, but the sender does not use veth? Where is the
part where you disable TSO? If it's on the receiver, I fail to
understand how it matters.

+ DUMP_TCP_INFO=1 ./netperf -H 10.10.23.23 -l 10
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.10.23.23 () port 0 AF_INET
tcpi_rto 217000 tcpi_ato 0 tcpi_pmtu 1438 tcpi_rcv_ssthresh 29200
tcpi_rtt 16715 tcpi_rttvar 7 tcpi_snd_ssthresh 702 tpci_snd_cwnd 831
tcpi_reordering 3 tcpi_total_retrans 24
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.03     577.53

Seems OK to me?
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-08 17:09 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Tuesday 2015-04-07 21:49, Eric Dumazet wrote:
>> On Tuesday 2015-04-07 04:48, Eric Dumazet wrote:
>> >On Tue, 2015-04-07 at 00:45 +0200, Jan Engelhardt wrote:
>> >> I have here a Linux 3.19(.0) system where activating TSO on a veth slave
[and now also 3.19.3]
>> >> device slows IPv4 TCP transfers into the veth-connected container to a
>> >> crawl.
>> >
>> >Nothing comes to mind. It would help if you could provide a script to
>> >reproduce the issue.
>>
>> It seems IPsec is *also* a requirement in the mix.
>> Anyhow, script time!
>
>I tried your scripts, but the sender does not use veth?

I was finally able to reproduce it reliably, and with just one
machine/kernel instance (and a number of containers, of course).

I have uploaded the script mix to
	http://inai.de/files/tso-1.tar.xz
There is a demo screencast at
	http://inai.de/files/tso-1.mkv

From the script collection, one can run t-all-init to set up the
network, then t-chargen-server (stays in foreground), and on another
terminal then t-chargen-client. Alternatively, the combination
t-zero-{server,client}.

It appears that it has to be a simple single-threaded,
single-connection, single-everything transfer. The problem won't
manifest with netperf even if run for an extended period (60 seconds).
netperf is probably too smart, exploiting some form of parallelism.

Oh, and if the transfer rate is absolutely *zero* (esp. for
xinetd-chargen), that just means that it attempts DNS name resolution.
Just wait a few seconds in that case for the transfer to start.
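[A netperf-free single-connection stand-in between the namespaces of the
earlier veth.sh; a sketch in which `yes` merely plays the role of a cheap
chargen source:

	ip netns exec vnode0 socat TCP4-LISTEN:19,reuseaddr,fork EXEC:'yes 0123456789' &
	ip netns exec vnode1 sh -c 'socat TCP4-CONNECT:192.168.64.2:19 - | pv >/dev/null'

]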
* Re: TSO on veth device slows transmission to a crawl
From: Rick Jones @ 2015-04-08 17:20 UTC
To: Jan Engelhardt, Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On 04/08/2015 10:09 AM, Jan Engelhardt wrote:
> The problem won't manifest with netperf even if run for an extended
> period (60 seconds). netperf is probably too smart, exploiting some
> form of parallelism.

Heh - I rather doubt netperf is all that smart. Unless you use the
likes of the -T global option to bind netperf/netserver to specific
CPUs, it will just blithely run wherever the stack/scheduler decides it
should run. It is though still interesting you don't see the issue with
netperf.

I don't know specifically how xinetd-chargen behaves, but if it is like
the chargen of old, I assume that means it is sitting there writing one
byte at a time to the socket, and perhaps has TCP_NODELAY set. If you
want to emulate that with netperf, then something like:

netperf -H <receiver> -t TCP_STREAM -- -m 1 -D

would be the command-line to use.

happy benchmarking,

rick jones
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-08 18:16 UTC
To: Rick Jones; +Cc: Eric Dumazet, Linux Networking Developer Mailing List

On Wednesday 2015-04-08 19:20, Rick Jones wrote:
> On 04/08/2015 10:09 AM, Jan Engelhardt wrote:
>> The problem won't manifest with netperf even if run for an extended
>> period (60 seconds). netperf is probably too smart, exploiting some
>> form of parallelism.
>
> Heh - I rather doubt netperf is all that smart.[...] I don't
> know specifically how xinetd-chargen behaves, but if it is like the
> chargen of old, I assume that means it is sitting there writing one
> byte at a time to the socket, and perhaps has TCP_NODELAY set.

Interesting thought. Indeed, stracing reveals that xinetd-chargen is
issuing 74-byte write syscalls. They do however get merged to a certain
extent at the TCP level. tcpdump shows groups of three:

IP 23.19 > 40.1.59411: Flags [P.], ..., length 74
IP 23.19 > 40.1.59411: Flags [.], ..., length 1370
IP 23.19 > 40.1.59411: Flags [P.], ..., length 36
(repeat)

However, the presence of small packets is not the issue. As an
alternative to chargen, socat can be used. socat issues 8192-byte write
syscalls, and tcpdump shows just large writes. Now that I look at it,
too large:

IP 23.23:19 > 40.1:59412: Flags [.], seq 25975:28715, ack 1, win 227, options [nop,nop,TS val 44808653 ecr 44808653], length 2740
IP 40.1:59412 > 23.23:19: Flags [.], ack 24605, win 627, options [nop,nop,TS val 44808653 ecr 44808653], length 0
IP 23.23:19 > 40.1:59412: Flags [.], seq 28715:31455, ack 1, win 227, options [nop,nop,TS val 44808653 ecr 44808653], length 2740
IP 40.1:59412 > 23.23:19: Flags [.], ack 25975, win 649, options [nop,nop,TS val 44808653 ecr 44808653], length 0
IP 23.23:19 > 40.1:59412: Flags [.], seq 25975:27345, ack 1, win 227, options [nop,nop,TS val 44808854 ecr 44808653], length 1370
IP 40.1:59412 > 23.23:19: Flags [.], ack 27345, win 672, options [nop,nop,TS val 44808854 ecr 44808854], length 0

2740 bytes is larger than the link MTU. That only works if all the
drivers do their segmentation properly, which does not seem to happen
here.

>If you want to emulate that with netperf, then something like:
>
> netperf -H <receiver> -t TCP_STREAM -- -m 1 -D

One more thing to be aware of is that iptraf counts all the headers,
which netperf likely is not doing.

-m 1:    iptraf   57 MBit/s, netperf   0.69*10^6
-m 10:   iptraf   63 MBit/s, netperf   6.76*10^6
-m 100:  iptraf  114 MBit/s, netperf  75.74*10^6
<no -m>: iptraf ~580 MBit/s, netperf 542.00*10^6

However, with chargen/socat, both iptraf and pv agree on around
1 MBit/s [108-120 KByte/s], so small writes really don't seem to be the
issue, but maybe the over-MTU-size packets shown above.
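[A note on those lengths: tcpdump captures skbs before GSO/TSO
segmentation on transmit and after GRO aggregation on receive, so TCP
segments larger than the MTU are expected at any capture point where
these offloads are active; they do not by themselves prove
mis-segmentation on the wire. A sketch for comparing both sides of the
pair, reusing the interface names from the first message:

	hv03#    tcpdump -ni ve-build01 'tcp port 19'
	build01# tcpdump -ni host0 'tcp port 19'

Capturing on the physical NIC (ge0) would show whether over-MTU frames
actually leave the box.]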
* Re: TSO on veth device slows transmission to a crawl
From: Rick Jones @ 2015-04-08 18:19 UTC
To: Jan Engelhardt; +Cc: Eric Dumazet, Linux Networking Developer Mailing List

On 04/08/2015 11:16 AM, Jan Engelhardt wrote:
> One more thing to be aware of is that iptraf counts all the
> headers, which netperf likely is not doing.
>
> -m 1:    iptraf   57 MBit/s, netperf   0.69*10^6
> -m 10:   iptraf   63 MBit/s, netperf   6.76*10^6
> -m 100:  iptraf  114 MBit/s, netperf  75.74*10^6
> <no -m>: iptraf ~580 MBit/s, netperf 542.00*10^6

Indeed, netperf makes no attempt to count headers. What it reports is
strictly above the socket interface.

rick
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09  9:24 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Wed, 2015-04-08 at 19:09 +0200, Jan Engelhardt wrote:
> I was finally able to reproduce it reliably, and with just one
> machine/kernel instance (and a number of containers, of course).
>
> I have uploaded the script mix to
>	http://inai.de/files/tso-1.tar.xz
> There is a demo screencast at
>	http://inai.de/files/tso-1.mkv

Thanks for providing these scripts, but they do not work on my host
running latest net-next.

Basic IP routing seems not properly enabled, as even a ping to
10.10.23.23 does not get through:

ip netns exec vclient /bin/bash
ping 10.10.23.23
<no frame is sent>

Have you tried net-next as well?

Thanks
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09 10:01 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Thu, 2015-04-09 at 02:24 -0700, Eric Dumazet wrote:
> Thanks for providing these scripts, but they do not work on my host
> running latest net-next.
>
> Basic IP routing seems not properly enabled, as even a ping to
> 10.10.23.23 does not get through:
>
> ip netns exec vclient /bin/bash
> ping 10.10.23.23
> <no frame is sent>

Same problem if I run pristine 3.19:

lpaa23:~/t# ip netns exec vclient /bin/bash
lpaa23:~/t# ip ro get 10.10.23.23
10.10.23.23 via 10.10.40.129 dev veth0  src 10.10.40.1
    cache
lpaa23:~/t# nstat
#kernel
lpaa23:~/t# ping 10.10.23.23
PING 10.10.23.23 (10.10.23.23) 56(84) bytes of data.
^C
--- 10.10.23.23 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
lpaa23:~/t# nstat
#kernel
IcmpOutErrors                   2                  0.0
IcmpOutEchoReps                 2                  0.0
IcmpMsgOutType8                 2                  0.0
lpaa23:~/t# grep . `find /proc/sys/net/ipv4 -name forwarding`
/proc/sys/net/ipv4/conf/all/forwarding:1
/proc/sys/net/ipv4/conf/default/forwarding:1
/proc/sys/net/ipv4/conf/gre0/forwarding:1
/proc/sys/net/ipv4/conf/gretap0/forwarding:1
/proc/sys/net/ipv4/conf/lo/forwarding:1
/proc/sys/net/ipv4/conf/sit0/forwarding:1
/proc/sys/net/ipv4/conf/veth0/forwarding:1
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-09 10:26 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Thursday 2015-04-09 11:24, Eric Dumazet wrote:
>> I was finally able to reproduce it reliably, and with just one
>> machine/kernel instance (and a number of containers, of course).
>>
>> I have uploaded the script mix to
>>	http://inai.de/files/tso-1.tar.xz
>> There is a demo screencast at
>>	http://inai.de/files/tso-1.mkv
>
>Thanks for providing these scripts, but they do not work on my host
>running latest net-next.
>
>Basic IP routing seems not properly enabled, as even a ping to
>10.10.23.23 does not get through.

Since I have net.ipv4.conf.all.forwarding=1 and net.ipv4.ip_forward=1,
this probably got inherited into the containers, so I did not write it
down into the scripts. Enable it :)

>Have you tried net-next as well?

I have now - a quick test yields no problem. Next I will do a reverse
bisect just for the fun.
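[What "enable it" amounts to, as a sketch; note that network namespaces
carry their own copies of these sysctls, so setting them on the host
alone may not be enough (the vclient name is taken from the tso-1
scripts):

	sysctl -w net.ipv4.ip_forward=1
	sysctl -w net.ipv4.conf.all.forwarding=1
	ip netns exec vclient sysctl -w net.ipv4.ip_forward=1

]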
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09 15:08 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Thu, 2015-04-09 at 12:26 +0200, Jan Engelhardt wrote:
> Since I have net.ipv4.conf.all.forwarding=1 and net.ipv4.ip_forward=1,
> this probably got inherited into the containers, so I did not write it
> down into the scripts. Enable it :)

The thing is: it is enabled.

>> Have you tried net-next as well?
>
> I have now - a quick test yields no problem. Next I will do a reverse
> bisect just for the fun.

Cool ;)
* Re: TSO on veth device slows transmission to a crawl
From: Jan Engelhardt @ 2015-04-09 15:35 UTC
To: Eric Dumazet; +Cc: Linux Networking Developer Mailing List

On Thursday 2015-04-09 17:08, Eric Dumazet wrote:
>> >Basic IP routing seems not properly enabled, as even a ping to
>> >10.10.23.23 does not get through.
>>
>> Since I have net.ipv4.conf.all.forwarding=1 and net.ipv4.ip_forward=1,
>> this probably got inherited into the containers, so I did not write it
>> down into the scripts. Enable it :)
>
>The thing is: it is enabled.

If forwarding is on, and the config is reasonably correct (which I
think it is), and it still won't ping, then would that not mean there
is another bug in the kernel?

>> I have now - a quick test yields no problem. Next I will do a reverse
>> bisect just for the fun.
>
>Cool ;)

I've aborted the search. v4.0-rc crashes in a VM
http://marc.info/?l=linux-kernel&m=142850643415538&w=2 , and on a real
system, the SCSI layer similarly acts up in 4.0rc5 such that sda won't
even show up in the first place. If large portions of the history
cannot be booted, I can't bisect it.
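[A reverse bisect just swaps the labels. A sketch using git's custom
bisect terms (available since git 2.7), assuming v3.19 exhibits the
slowdown and net-next does not:

	git bisect start --term-old=broken --term-new=fixed
	git bisect broken v3.19
	git bisect fixed net-next/master
	# build and boot each candidate kernel, then mark it:
	#   git bisect broken   (still slow)
	#   git bisect fixed    (full speed)

]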
* Re: TSO on veth device slows transmission to a crawl
From: Eric Dumazet @ 2015-04-09 15:55 UTC
To: Jan Engelhardt; +Cc: Linux Networking Developer Mailing List

On Thu, 2015-04-09 at 17:35 +0200, Jan Engelhardt wrote:
> If forwarding is on, and the config is reasonably correct (which
> I think it is), and it still won't ping, then would that not mean
> there is another bug in the kernel?

No idea. I tried 3.19; it was not working. Note that your prior scripts
(using 2 hosts) were OK.