netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lawrence Brakmo <brakmo@fb.com>
To: netdev <netdev@vger.kernel.org>
Cc: Kernel Team <kernel-team@fb.com>, Blake Matheny <bmatheny@fb.com>,
	Alexei Starovoitov <ast@fb.com>,
	Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	Steve Ibanez <sibanez@stanford.edu>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: [PATCH net-next 0/2] tcp: fix high tail latencies in DCTCP
Date: Fri, 29 Jun 2018 18:48:13 -0700	[thread overview]
Message-ID: <20180630014815.2881895-1-brakmo@fb.com> (raw)

When have observed high tail latencies when using DCTCP for RPCs as
compared to using Cubic. For example, in one setup there are 2 hosts
sending to a 3rd one, with each sender having 3 flows (1 stream,
1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
table shows the 99% and 99.9% latencies for both Cubic and dctcp:

           Cubic 99%  Cubic 99.9%   dctcp 99%    dctcp 99.9%
  1MB RPCs    2.6ms       5.5ms         43ms          208ms
 10KB RPCs    1.1ms       1.3ms         53ms          212ms

Looking at tcpdump traces showed that there are two causes for the
latency.  

  1) RTOs caused by the receiver sending a dup ACK and not ACKing
     the last (and only) packet sent.
  2) Delaying ACKs when the sender has a cwnd of 1, so everything
     pauses for the duration of the delayed ACK.

The first patch fixes the cause of the dup ACKs, not updating DCTCP
state when an ACK that was initially delayed has been sent with a
data packet.

The second patch insures that an ACK is sent immediately when a
CWR marked packet arrives.

With the patches the latencies for DCTCP now look like:

           dctcp 99%  dctcp 99.9% 
  1MB RPCs    4.8ms       6.5ms
 10KB RPCs    143us       184us

Note that while the 1MB RPCs tail latencies are higher than Cubic's,
the 10KB latencies are much smaller than Cubic's. These patches fix
issues on the receiver, but tcpdump traces indicate there is an
opportunity to also fix an issue at the sender that adds about 3ms
to the tail latencies.

The following trace shows the issue that tiggers an RTO (fixed by these patches):

   Host A sends the last packets of the request
   Host B receives them, and the last packet is marked with congestion (CE)
   Host B sends ACKs for packets not marked with congestion
   Host B sends data packet with reply and ACK for packet marked with
          congestion (TCP flag ECE)
   Host A receives ACKs with no ECE flag
   Host A receives data packet with ACK for the last packet of request
          and which has TCP ECE bit set
   Host A sends 1st data packet of the next request with TCP flag CWR
   Host B receives the packet (as seen in tcpdump at B), no CE flag
   Host B sends a dup ACK that also has the TCP ECE flag
   Host A RTO timer fires!
   Host A to send the next packet
   Host A receives an ACK for everything it has sent (i.e. Host B
          did receive 1st packet of request)
   Host A send more packets…

[PATCH net-next 1/2] tcp: notify when a delayed ack is sent
[PATCH net-next 2/2] tcp: ack immediately when a cwr packet arrives

 net/ipv4/tcp_input.c  | 25 +++++++++++++++++--------
 net/ipv4/tcp_output.c |  2 ++
 2 files changed, 19 insertions(+), 8 deletions(-)

             reply	other threads:[~2018-06-30  1:48 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-30  1:48 Lawrence Brakmo [this message]
2018-06-30  1:48 ` [PATCH net-next 1/2] tcp: notify when a delayed ack is sent Lawrence Brakmo
2018-07-02 15:17   ` Neal Cardwell
2018-07-02 21:24     ` Lawrence Brakmo
2018-06-30  1:48 ` [PATCH net-next 2/2] tcp: ack immediately when a cwr packet arrives Lawrence Brakmo
2018-06-30 18:23   ` Neal Cardwell
2018-07-01  1:46     ` Lawrence Brakmo
2018-07-02 14:57       ` Neal Cardwell
2018-07-02 21:24         ` Lawrence Brakmo
2018-07-01  0:26 ` [PATCH net-next 0/2] tcp: fix high tail latencies in DCTCP Neal Cardwell
2018-07-01  4:09   ` Lawrence Brakmo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180630014815.2881895-1-brakmo@fb.com \
    --to=brakmo@fb.com \
    --cc=ast@fb.com \
    --cc=bmatheny@fb.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=sibanez@stanford.edu \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).