From: Eric Dumazet <eric.dumazet@gmail.com>
To: Eric Dumazet <edumazet@google.com>,
    Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Neal Cardwell <ncardwell@google.com>,
    "David S. Miller" <davem@davemloft.net>,
    Netdev <netdev@vger.kernel.org>,
    Yuchung Cheng <ycheng@google.com>,
    Soheil Hassas Yeganeh <soheil@google.com>,
    Jerry Chu <hkchu@google.com>, Dave Taht <dave.taht@gmail.com>
Subject: Re: TCP and BBR: reproducibly low cwnd and bandwidth
Date: Fri, 16 Feb 2018 14:50:35 -0800
Message-ID: <1518821435.55655.6.camel@gmail.com>
In-Reply-To: <CANn89iJ_P4AUKEVAzWYWd-4rk336zftDXSe_nCNxU3O7NqwrNQ@mail.gmail.com>
On Fri, 2018-02-16 at 12:54 -0800, Eric Dumazet wrote:
> On Fri, Feb 16, 2018 at 9:25 AM, Oleksandr Natalenko
> <oleksandr@natalenko.name> wrote:
> > Hi.
> >
> > On Friday, 16 February 2018 17:33:48 CET Neal Cardwell wrote:
> > > Thanks for the detailed report! Yes, this sounds like an issue in BBR. We
> > > have not run into this one in our team, but we will try to work with you to
> > > fix this.
> > >
> > > Would you be able to take a sender-side tcpdump trace of the slow BBR
> > > transfer ("v4.13 + BBR + fq_codel == Not OK")? Packet headers only would be
> > > fine. Maybe something like:
> > >
> > > tcpdump -w /tmp/test.pcap -c1000000 -s 100 -i eth0 port $PORT
> >
> > So, going on with two real HW hosts. They are both running latest stock Arch
> > Linux kernel (4.15.3-1-ARCH, CONFIG_PREEMPT=y, CONFIG_HZ=1000) and are
> > interconnected with 1 Gbps link (via switch if that matters). Using iperf3,
> > running each test for 20 seconds.
> >
> > Having BBR+fq_codel (or pfifo_fast, same result) on both hosts:
> >
> > Client to server: 112 Mbits/sec
> > Server to client: 96.1 Mbits/sec
> >
> > Having BBR+fq on both hosts:
> >
> > Client to server: 347 Mbits/sec
> > Server to client: 397 Mbits/sec
> >
> > Having YeAH+fq on both hosts:
> > [1] https://natalenko.name/myfiles/bbr/
> >
>
> Something really fishy:
>
> 09:18:31.449903 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.],
> seq 76745:79641, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 2896
> 09:18:31.449916 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 79641, win 1011, options [nop,nop,TS val 3190508870 ecr
> 2327043753], length 0
> 09:18:31.449925 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 79641:83985, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 4344
> 09:18:31.449936 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 83985, win 987, options [nop,nop,TS val 3190508870 ecr
> 2327043753], length 0
> 09:18:31.450112 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 83985:86881, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 2896
> 09:18:31.450124 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 86881, win 971, options [nop,nop,TS val 3190508871 ecr
> 2327043753], length 0
> 09:18:31.450299 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 86881:91225, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 4344
> 09:18:31.450313 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 91225, win 947, options [nop,nop,TS val 3190508871 ecr
> 2327043753], length 0
> 09:18:31.450491 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.],
> seq 91225:92673, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 1448
> 09:18:31.450505 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 92673:94121, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508871], length 1448
> 09:18:31.450511 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.],
> seq 94121:95569, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 1448
> 09:18:31.450720 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 95569:101361, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 5792
> 09:18:31.450932 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 101361:105705, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 4344
> 09:18:31.451132 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 105705:110049, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 4344
> 09:18:31.451342 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 110049:111497, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 1448
> 09:18:31.455841 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 111497:112945, ack 38, win 227, options [nop,nop,TS val 2327043759
> ecr 3190508871], length 1448
>
> Not only does the receiver suddenly add a 25 ms delay, but note also that
> it acknowledges all prior segments (ack 112949) with a wrong ecr
> value (2327043753) instead of 2327043759.

If you use

  tcptrace -R test_s2c.pcap
  xplot.org d2c_rtt.xpl

then you'll see plenty of suspect 40 ms RTT samples.

It looks like the receiver misses wakeups for some reason,
and only the TCP delayed ACK timer is helping.

So it does not look like a sender-side issue to me.
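
For a rough cross-check without tcptrace, the same kind of sample can be
pulled out of the tcpdump text output directly. The following is only a
minimal Python sketch, assuming the wrapped tcpdump lines have been re-joined
to one packet per line. LINE_RE, ack_delays() and the 25 ms threshold are
names and values made up for this illustration; they are not part of tcpdump
or tcptrace. For every pure ACK that lands exactly on the end of a data
segment, it reports how long the ACK took and whether its ecr echoes that
segment's TS val.

```python
import re
from datetime import datetime

# One tcpdump text line, already joined onto a single line.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\],"
    r"(?: seq (?P<seq>\d+):(?P<end>\d+),)?"
    r" ack (?P<ack>\d+),.*?"
    r"TS val (?P<tsval>\d+) ecr (?P<ecr>\d+)\], length (?P<len>\d+)$"
)

def ack_delays(lines, sender):
    """Yield (ack, delay_seconds, ecr_matches) for each pure ACK that
    exactly covers the end of a data segment sent by `sender`.

    `sender` is the address as tcpdump prints it, port included.
    Timestamps carry no date, so a capture crossing midnight is not handled.
    """
    sent = {}  # end sequence number -> (send time, TS val of that segment)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        t = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        if m["src"] == sender and m["end"] is not None:
            # Keep the first transmission only; retransmits are ignored.
            sent.setdefault(int(m["end"]), (t, int(m["tsval"])))
        elif m["dst"] == sender and int(m["len"]) == 0:
            hit = sent.get(int(m["ack"]))
            if hit is not None:
                sent_at, tsval = hit
                delay = (t - sent_at).total_seconds()
                yield int(m["ack"]), delay, int(m["ecr"]) == tsval

# First four packets of the trace quoted above, one packet per line.
SAMPLE = """\
09:18:31.449903 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.], seq 76745:79641, ack 38, win 227, options [nop,nop,TS val 2327043753 ecr 3190508870], length 2896
09:18:31.449916 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.], ack 79641, win 1011, options [nop,nop,TS val 3190508870 ecr 2327043753], length 0
09:18:31.449925 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.], seq 79641:83985, ack 38, win 227, options [nop,nop,TS val 2327043753 ecr 3190508870], length 4344
09:18:31.449936 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.], ack 83985, win 987, options [nop,nop,TS val 3190508870 ecr 2327043753], length 0
"""

SUSPECT_SECONDS = 0.025  # assumed threshold, chosen only for illustration

results = list(ack_delays(SAMPLE.splitlines(), "172.29.28.1.5201"))
suspects = [r for r in results if r[1] >= SUSPECT_SECONDS]

for ack, delay, ecr_ok in results:
    print(f"ack {ack}: {delay * 1e6:.0f} us, ecr matches: {ecr_ok}")
print(f"{len(suspects)} suspect sample(s)")
```

Run over the whole capture instead of SAMPLE, ACKs that were only released
by the delayed ACK timer should stand out as samples in the tens of
milliseconds. Cumulative ACKs that do not land on a segment boundary are
skipped by this sketch, so it undercounts compared to tcptrace.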