From: Florian Westphal <fw@strlen.de>
To: Steve Ibanez <sibanez@stanford.edu>
Cc: ncardwell@google.com, daniel@iogearbox.net,
Mohammad Alizadeh <alizadeh@csail.mit.edu>,
Nick McKeown <nickm@stanford.edu>,
Lavanya Jose <lavanyaj@cs.stanford.edu>,
netdev@vger.kernel.org
Subject: Re: Linux ECN Handling
Date: Thu, 19 Oct 2017 14:43:12 +0200
Message-ID: <20171019124312.GE16796@breakpoint.cc>
In-Reply-To: <CACJspmLFdy9i8K=TkzXHnofyFMNoJf9HkYE7On8uG+PREc2Dqw@mail.gmail.com>
[ full-quoting due to Cc fixups, adding netdev ]
Steve Ibanez <sibanez@stanford.edu> wrote:
> Hi Florian, Neal, and Daniel,
>
> I hope this email finds you well. My name is Stephen Ibanez and I'm a PhD
> student at Stanford currently working on a project with Mohammad Alizadeh,
> Nick McKeown, and Lavanya Jose. We have been running experiments with
> the Linux DCTCP implementation and are trying to understand some strange
> behavior that we are encountering. I'm contacting you three because I have
> seen your names on some of the source files and recent commits in the Linux
> source tree. Hopefully you can help us out or put us in contact with the
> right people.
>
> Here are some details about our servers:
>
> - Distribution: Ubuntu 14.04 LTS
> - Kernel release: 4.4.0-75-generic
Can you re-test with a more recent kernel such as 4.13.8?
> *The experiment:*
>
> We use iperf3 to generate two DCTCP flows from different servers to a
> common server, as shown in the diagram below. We measure the sending rate
> of each flow, record the tcp_probe output, as well as run tcpdump on the
> source host interfaces.
>
> [image: Inline image 6]
>
> *The problem:*
>
> Our rate measurements typically look like the plot shown below; the flows
> often enter timeouts. In this case, both flows hit a timeout at t=0.3.
> [image: Inline image 2]
>
> When looking at the sequence of packets seen at the source host interfaces
> around this timeout event this is what we see:
>
> *10.0.0.1 timeout event:*
> [image: Inline image 3]
>
> *10.0.0.3 timeout event:*
> [image: Inline image 4]
>
> In both cases, the source:
> (1) receives an ACK for byte XYZ with the ECN flag set
> (2) stops sending anything for RTO_min=300ms
> (3) sends a retransmission for byte XYZ
>
> I have verified that this behavior is consistent across multiple experiment
> runs. Here are the CWND samples for the 10.0.0.1 flow provided by tcp_probe
> at the time of the timeout event:
>
> [image: Inline image 5]
>
> From what I can tell, tcp_probe logs a sample whenever a packet is
> received. If this is true, then when the source receives the final
> ECN-marked ACK just before the timeout, the CWND is 1 MSS.
>
> *The conclusion:*
>
> We believe that there may be an issue with how the Linux kernel is handling
> the ECN echoes. For DCTCP, if the CWND is 1 MSS and the end host is still
> receiving ECN marks, then the CWND should remain at 1 MSS and the flow should
> *not* enter a timeout. This is because the switch can perform ECN marking very
> aggressively, causing the source end host to receive many redundant ECN
> echoes over a short period of time.
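For illustration only, a minimal Python sketch of the DCTCP window reaction that the paragraph above relies on, following the scheme described in RFC 8257: alpha is an exponentially weighted moving average of the fraction of ECN-marked bytes per window (gain g = 1/16 here), and on congestion the window is cut to cwnd * (1 - alpha/2). The 1 MSS floor is an assumption taken from the claim above, not from the kernel source; update_alpha and reduce_cwnd are hypothetical names, and windows are counted in segments.

```python
MSS = 1  # assumption: windows are measured in segments, so 1 MSS == 1


def update_alpha(alpha: float, marked: int, acked: int, g: float = 1 / 16) -> float:
    """EWMA of the fraction of ECN-marked bytes seen over one window of ACKs."""
    frac = marked / acked if acked else 0.0
    return (1 - g) * alpha + g * frac


def reduce_cwnd(cwnd: float, alpha: float, floor: float = MSS) -> float:
    """Cut cwnd in proportion to alpha, never going below `floor`.

    The floor of 1 MSS is an assumption mirroring the expectation stated
    in the email, not a value taken from the Linux DCTCP module.
    """
    return max(cwnd * (1 - alpha / 2), floor)
```

With every packet marked (alpha = 1.0), reduce_cwnd(10, 1.0) halves the window to 5, while reduce_cwnd(1, 1.0) leaves a 1 MSS window at 1 MSS, however many redundant echoes arrive.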
>
> Another potential issue is that, judging from the CWND plot above, the
> end host may be reacting to congestion signals more than once per window,
> which should not happen (Section 5 of RFC 3168
> <https://tools.ietf.org/html/rfc3168>). tcp_probe reports SRTT measurements
> of about 400-500 us, and in the plot above the CWND is reduced 6 times
> within this amount of time.
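A minimal sketch of the once-per-window rule cited above, assuming Reno-style halving for simplicity (DCTCP would apply its alpha-based cut instead). After a reduction, the sender records the highest sequence number sent so far and ignores further ECN echoes until an ACK covers data beyond that point. EcnSender and its recover field are hypothetical names chosen for illustration, not the Linux implementation.

```python
from dataclasses import dataclass


@dataclass
class EcnSender:
    cwnd: int          # congestion window, in segments
    snd_nxt: int = 0   # next sequence number to be sent
    snd_una: int = 0   # oldest unacknowledged sequence number
    recover: int = -1  # snd_nxt at the last reduction; -1 means none yet

    def on_ack(self, ack_seq: int, ece: bool) -> None:
        """Process a cumulative ACK, reacting to ECE at most once per window."""
        self.snd_una = max(self.snd_una, ack_seq)
        if not ece:
            return
        # React only if this ACK covers data sent after the previous
        # reduction, i.e. at most one cut per window of data (RFC 3168).
        if self.snd_una > self.recover:
            self.cwnd = max(self.cwnd // 2, 1)  # Reno-style halving (assumption)
            self.recover = self.snd_nxt
```

Under this rule, several ECN-echoing ACKs for the same window produce a single cut; only an ACK beyond `recover` can trigger the next one.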
>
> We have not yet tracked down the code path in the kernel that is
> causing the behavior described above. Perhaps this is something that you
> can help us with? We would love to hear your thoughts on this matter and
> are happy to try any other experiments that you suggest.
>
> Here is a link
> <https://drive.google.com/file/d/0Bw-GEX7h5ufiYmpCV2VpOGEtQWs/view?usp=sharing>
> to download the packet traces if you would like to take a look.
> han-1_host.pcap is the trace from 10.0.0.1 and han-3_host.pcap is the trace
> from 10.0.0.3.
>
> Looking forward to hearing from you!
>
> Best,
> -Steve
Thread overview: 32+ messages
[not found] <CACJspmLFdy9i8K=TkzXHnofyFMNoJf9HkYE7On8uG+PREc2Dqw@mail.gmail.com>
2017-10-19 12:43 ` Florian Westphal [this message]
2017-10-23 22:15 ` Linux ECN Handling Steve Ibanez
2017-10-24 1:11 ` Neal Cardwell
2017-11-06 14:08 ` Daniel Borkmann
2017-11-06 23:31 ` Steve Ibanez
2017-11-20 7:31 ` Steve Ibanez
2017-11-20 15:05 ` Neal Cardwell
[not found] ` <CADVnQy=q4qBpe0Ymo8dtFTYU_0x0q_XKE+ZvazLQE-ULwu7pQA@mail.gmail.com>
2017-11-20 15:40 ` Eric Dumazet
2017-11-21 5:58 ` Steve Ibanez
2017-11-21 15:01 ` Neal Cardwell
2017-11-21 15:51 ` Yuchung Cheng
2017-11-21 16:20 ` Neal Cardwell
2017-11-21 16:52 ` Eric Dumazet
2017-11-22 3:02 ` Steve Ibanez
2017-11-22 3:46 ` Neal Cardwell
2017-11-27 18:49 ` Steve Ibanez
2017-12-01 16:35 ` Neal Cardwell
2017-12-05 5:22 ` Steve Ibanez
2017-12-05 15:23 ` Neal Cardwell
2017-12-05 19:36 ` Steve Ibanez
2017-12-05 20:04 ` Neal Cardwell
2017-12-19 5:16 ` Steve Ibanez
2017-12-19 15:28 ` Neal Cardwell
2017-12-19 22:00 ` Steve Ibanez
2017-12-20 0:08 ` Neal Cardwell
2017-12-20 19:20 ` Steve Ibanez
2017-12-20 20:17 ` Neal Cardwell
2018-01-02 7:43 ` Steve Ibanez
2018-01-02 16:27 ` Neal Cardwell
2018-01-02 23:57 ` Steve Ibanez
2018-01-03 19:39 ` Neal Cardwell
2018-01-03 22:21 ` Steve Ibanez