From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>,
Michal Kubecek <mkubecek@suse.cz>,
Yuchung Cheng <ycheng@google.com>,
"David S. Miller" <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
James Morris <jmorris@namei.org>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH net] tcp: avoid multiple ssthresh reductions in one retransmit window
Date: Wed, 18 Jun 2014 23:05:07 -0700 [thread overview]
Message-ID: <12254.1403157907@localhost.localdomain> (raw)
In-Reply-To: <1403144937.1225.1.camel@edumazet-glaptop2.roam.corp.google.com>
Eric Dumazet <eric.dumazet@gmail.com> wrote:
>On Wed, 2014-06-18 at 18:52 -0700, Jay Vosburgh wrote:
>> The test involves adding 40 ms of delay in and out from machine
>> A with netem, then running iperf from A to B. Once the iperf reaches a
>> steady cwnd, on B, I add an iptables rule to drop 1 packet out of every
>> 1000 coming from A, then remove the rule after 10 seconds. The behavior
>> resulting from this closely matches what I see on the real systems.
>
>Please share the netem setup. Are you sure you do not drop frames on
>netem ? (considering you disable GSO/TSO netem has to be able to store a
>lot of packets)
	Reasonably sure; the tc -s qdisc output doesn't show any drops by
netem for these test runs.  The data I linked to earlier is one run with
TSO/GSO/GRO enabled, and one with TSO/GSO/GRO disabled, and the results
are similar in terms of cwnd recovery time. Looking at the packet
capture for the TSO/GSO/GRO disabled case, the time span from the first
duplicate ACK to the last is about 9 seconds, which is close to the 10
seconds the iptables drop rule is in effect; the same time analysis
applies to retransmissions from the sender.
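	As an aside, something like the following will pull the duplicate
ACK timestamps out of a capture for that sort of measurement (trace.pcap
is just a placeholder for the capture file; the first and last lines of
output bracket the span):

  tshark -r trace.pcap -Y tcp.analysis.duplicate_ack \
      -T fields -e frame.time_relative -e tcp.ack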
	I've also tested using netem to induce drops, but in this
particular case I used iptables.
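	Roughly, the drop rule on B looks like this (A_ADDR standing in
for A's address; the statistic match in nth mode drops one of every 1000
matching packets):

  # drop 1 of every 1000 packets arriving from A, for 10 seconds
  iptables -I INPUT -s A_ADDR -m statistic --mode nth --every 1000 --packet 0 -j DROP
  sleep 10
  iptables -D INPUT -s A_ADDR -m statistic --mode nth --every 1000 --packet 0 -j DROP

	The netem way to get roughly the same loss rate would be to add
"loss 0.1%" on the sender's egress qdisc, e.g.

  tc qdisc change dev eth1 root netem delay 40ms limit 5000 loss 0.1%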
The script I use to set up netem is:
#!/bin/bash
#
# Add ${DELAY} of netem delay in both directions on ${IF}: egress gets
# netem directly, ingress is redirected through ifb0, which also runs netem.
IF=eth1
TC=/usr/local/bin/tc
DELAY=40ms
# Reload ifb so ifb0 starts clean.
rmmod ifb
modprobe ifb
ip link set dev ifb0 up
# Replace any existing ingress qdisc on ${IF}.
if ${TC} qdisc show dev ${IF} | grep -q ingress; then
        ${TC} qdisc del dev ${IF} ingress
fi
${TC} qdisc add dev ${IF} ingress
${TC} qdisc del dev ${IF} root
# Redirect all ingress IP traffic to ifb0 so it passes through netem.
${TC} filter add dev ${IF} parent ffff: protocol ip \
        u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0
# Delay both directions; limit 5000 is the netem queue size.
${TC} qdisc add dev ifb0 root netem delay ${DELAY} limit 5000
${TC} qdisc add dev ${IF} root netem delay ${DELAY} limit 5000
In the past I've watched the tc backlog, and the highest I've
seen is about 900 packets, so the limit 5000 is probably overkill.
	I'm also not absolutely sure that 40ms of delay in each direction
behaves materially differently from 80ms in one direction, but the real
configuration I'm recreating is 40ms each way.
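	Watching the backlog is just a matter of polling the qdisc stats
on both devices during a run, something like:

  watch -n 1 'tc -s qdisc show dev eth1; tc -s qdisc show dev ifb0'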
The tc qdisc stats after the two runs I did earlier to capture
data look like this:
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1905005 bytes 22277 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc netem 8002: dev eth1 root refcnt 2 limit 5000 delay 40.0ms
Sent 773383636 bytes 510901 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
Sent 14852588 bytes 281846 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc netem 8001: dev ifb0 root refcnt 2 limit 5000 delay 40.0ms
Sent 18763686 bytes 281291 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
	Lastly, I ran the same test on the actual systems, and the iperf
results are similar to those from my test lab:
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 896 KBytes 7.34 Mbits/sec
[ 3] 1.0- 2.0 sec 1.50 MBytes 12.6 Mbits/sec
[ 3] 2.0- 3.0 sec 5.12 MBytes 43.0 Mbits/sec
[ 3] 3.0- 4.0 sec 13.9 MBytes 116 Mbits/sec
[ 3] 4.0- 5.0 sec 27.8 MBytes 233 Mbits/sec
[ 3] 5.0- 6.0 sec 39.0 MBytes 327 Mbits/sec
[ 3] 6.0- 7.0 sec 36.8 MBytes 308 Mbits/sec
[ 3] 7.0- 8.0 sec 36.8 MBytes 308 Mbits/sec
[ 3] 8.0- 9.0 sec 37.0 MBytes 310 Mbits/sec
[ 3] 9.0-10.0 sec 36.6 MBytes 307 Mbits/sec
[ 3] 10.0-11.0 sec 33.9 MBytes 284 Mbits/sec
[ 3] 11.0-12.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 12.0-13.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 13.0-14.0 sec 4.38 MBytes 36.7 Mbits/sec
[ 3] 14.0-15.0 sec 6.38 MBytes 53.5 Mbits/sec
[ 3] 15.0-16.0 sec 7.00 MBytes 58.7 Mbits/sec
[ 3] 16.0-17.0 sec 8.62 MBytes 72.4 Mbits/sec
[ 3] 17.0-18.0 sec 4.25 MBytes 35.7 Mbits/sec
[ 3] 18.0-19.0 sec 8.50 MBytes 71.3 Mbits/sec
[ 3] 19.0-20.0 sec 4.25 MBytes 35.7 Mbits/sec
[ 3] 20.0-21.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 21.0-22.0 sec 6.38 MBytes 53.5 Mbits/sec
[ 3] 22.0-23.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 23.0-24.0 sec 8.50 MBytes 71.3 Mbits/sec
[ 3] 24.0-25.0 sec 8.50 MBytes 71.3 Mbits/sec
[ 3] 25.0-26.0 sec 8.38 MBytes 70.3 Mbits/sec
[ 3] 26.0-27.0 sec 8.62 MBytes 72.4 Mbits/sec
[ 3] 27.0-28.0 sec 8.50 MBytes 71.3 Mbits/sec
[ 3] 28.0-29.0 sec 8.50 MBytes 71.3 Mbits/sec
[ 3] 29.0-30.0 sec 8.38 MBytes 70.3 Mbits/sec
[ 3] 30.0-31.0 sec 8.50 MBytes 71.3 Mbits/sec
[ 3] 31.0-32.0 sec 8.62 MBytes 72.4 Mbits/sec
[ 3] 32.0-33.0 sec 8.38 MBytes 70.3 Mbits/sec
[ 3] 33.0-34.0 sec 10.6 MBytes 89.1 Mbits/sec
[ 3] 34.0-35.0 sec 10.6 MBytes 89.1 Mbits/sec
[ 3] 35.0-36.0 sec 10.6 MBytes 89.1 Mbits/sec
[ 3] 36.0-37.0 sec 12.8 MBytes 107 Mbits/sec
[ 3] 37.0-38.0 sec 15.0 MBytes 126 Mbits/sec
[ 3] 38.0-39.0 sec 17.0 MBytes 143 Mbits/sec
[ 3] 39.0-40.0 sec 19.4 MBytes 163 Mbits/sec
[ 3] 40.0-41.0 sec 23.5 MBytes 197 Mbits/sec
[ 3] 41.0-42.0 sec 25.6 MBytes 215 Mbits/sec
[ 3] 42.0-43.0 sec 30.2 MBytes 254 Mbits/sec
[ 3] 43.0-44.0 sec 34.2 MBytes 287 Mbits/sec
[ 3] 44.0-45.0 sec 36.6 MBytes 307 Mbits/sec
[ 3] 45.0-46.0 sec 38.8 MBytes 325 Mbits/sec
[ 3] 46.0-47.0 sec 36.5 MBytes 306 Mbits/sec
This result is consistently repeatable. These systems have more
hops between them than my lab systems, but the ping RTT is 80ms.
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com
Thread overview: 21+ messages
2014-06-16 15:35 tcp: multiple ssthresh reductions before all packets are retransmitted Michal Kubecek
2014-06-16 17:02 ` Yuchung Cheng
2014-06-16 18:48 ` Michal Kubecek
[not found] ` <20140616174721.GA15406@lion>
2014-06-16 19:04 ` Yuchung Cheng
2014-06-16 20:06 ` Michal Kubecek
2014-06-16 21:19 ` [PATCH net] tcp: avoid multiple ssthresh reductions in one retransmit window Michal Kubecek
2014-06-16 22:39 ` Yuchung Cheng
2014-06-16 23:42 ` Neal Cardwell
2014-06-17 0:25 ` Yuchung Cheng
2014-06-17 0:44 ` Neal Cardwell
2014-06-17 12:20 ` Michal Kubecek
2014-06-17 21:35 ` Yuchung Cheng
2014-06-17 22:42 ` Michal Kubecek
2014-06-18 0:38 ` Jay Vosburgh
2014-06-18 0:56 ` Neal Cardwell
2014-06-18 2:00 ` Jay Vosburgh
2014-06-19 1:52 ` Jay Vosburgh
2014-06-19 2:28 ` Eric Dumazet
2014-06-19 6:05 ` Jay Vosburgh [this message]
2014-06-18 16:56 ` Yuchung Cheng
2014-06-18 7:17 ` Eric Dumazet