From: Florian Westphal <fw@strlen.de>
To: Steve Ibanez <sibanez@stanford.edu>
Cc: ncardwell@google.com, daniel@iogearbox.net,
Mohammad Alizadeh <alizadeh@csail.mit.edu>,
Nick McKeown <nickm@stanford.edu>,
Lavanya Jose <lavanyaj@cs.stanford.edu>,
netdev@vger.kernel.org
Subject: Re: Linux ECN Handling
Date: Thu, 19 Oct 2017 14:43:12 +0200
Message-ID: <20171019124312.GE16796@breakpoint.cc>
In-Reply-To: <CACJspmLFdy9i8K=TkzXHnofyFMNoJf9HkYE7On8uG+PREc2Dqw@mail.gmail.com>
[ full-quoting due to Cc fixups, adding netdev ]
Steve Ibanez <sibanez@stanford.edu> wrote:
> Hi Florian, Neal, and Daniel,
>
> I hope this email finds you well. My name is Stephen Ibanez and I'm a PhD
> student at Stanford currently working on a project with Mohammad Alizadeh,
> Nick McKeown, and Lavanya Jose. We have been doing some experiments using
> the Linux DCTCP implementation and are trying to understand some strange
> behavior that we are encountering. I'm contacting you three because I have
> seen your names on some of the source files and recent commits in the Linux
> source tree. Hopefully you can help us out or put us in contact with the
> right people?
>
> Here are some details about our servers:
>
> - Distribution: Ubuntu 14.04 LTS
> - Kernel release: 4.4.0-75-generic
Can you re-test with a more recent kernel such as 4.13.8?
> *The experiment:*
>
> We use iperf3 to generate two DCTCP flows from different servers to a
> common server, as shown in the diagram below. We measure the sending rate
> of each flow, record the tcp_probe output, and run tcpdump on the
> source host interfaces.
>
> [image: Inline image 6]
>
> *The problem:*
>
> Our rate measurements look like the one shown below; the flows often enter
> timeouts. In this case, both flows hit a timeout at t=0.3.
> [image: Inline image 2]
>
> When looking at the sequence of packets seen at the source host interfaces
> around this timeout event, this is what we see:
>
> *10.0.0.1 timeout event:*
> [image: Inline image 3]
>
> *10.0.0.3 timeout event:*
> [image: Inline image 4]
>
> In both cases, the source:
> (1) receives an ACK for byte XYZ with the ECN flag set
> (2) stops sending anything for RTO_min=300ms
> (3) sends a retransmission for byte XYZ
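The three-step pattern above can be checked mechanically once the send timestamps are extracted from the pcap. A minimal sketch, assuming the timestamps are already available as a sorted list of seconds; find_stalls and min_gap are hypothetical names, not part of tcpdump or tcp_probe:

```python
def find_stalls(send_times, min_gap=0.3):
    """Return (last_send, next_send) pairs separated by at least min_gap seconds.

    send_times: sorted timestamps (seconds) of packets sent by one source.
    min_gap:    idle threshold; 0.3 matches the RTO_min=300ms quoted above.
    """
    stalls = []
    for prev, cur in zip(send_times, send_times[1:]):
        if cur - prev >= min_gap:
            stalls.append((prev, cur))
    return stalls
```

Each returned pair brackets one idle period, so the packet at the second timestamp is the candidate retransmission to inspect in the trace.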
>
> I have verified that this behavior is consistent across multiple experiment
> runs. Here are the CWND samples for the 10.0.0.1 flow provided by tcp_probe
> at the time of the timeout event:
>
> [image: Inline image 5]
>
> From what I can tell, tcp_probe logs a sample whenever a packet is
> received. If this is true, then the CWND is 1 MSS when the source receives
> the final ECN-marked ACK just before the timeout.
>
> *The conclusion:*
>
> We believe that there may be an issue with how the Linux kernel is handling
> the ECN echoes. For DCTCP, if the CWND is 1 MSS and the end host is still
> receiving ECN marks then the CWND should remain at 1 MSS and should *not*
> enter a timeout. This is because the switch can perform ECN marking very
> aggressively, causing the source end host to receive many redundant ECN
> echoes over a short period of time.
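For reference, the expectation stated above can be written down as a small model of the DCTCP window update. This is only a sketch of the textbook rule (alpha as a moving average of the marked fraction, window scaled by 1 - alpha/2), not the kernel code path. The function names are hypothetical, g=1/16 is an assumed gain, and the min_cwnd=1 floor mirrors the behavior the report expects rather than what any given kernel does:

```python
def dctcp_update_alpha(alpha, marked_fraction, g=1 / 16):
    """Moving average of the fraction of ECN-marked bytes in the last window."""
    return (1 - g) * alpha + g * marked_fraction


def dctcp_reduce_cwnd(cwnd, alpha, min_cwnd=1):
    """Scale cwnd (in MSS units) by (1 - alpha/2), never dropping below min_cwnd.

    min_cwnd=1 is an assumption taken from the report's expectation.
    """
    return max(int(cwnd * (1 - alpha / 2)), min_cwnd)
```

Under this model a flow already at 1 MSS stays at 1 MSS no matter how many marks arrive, so it keeps one segment in flight instead of going idle until the RTO fires.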
>
> Another potential issue is that from the CWND plot above it looks like the
> end host may be reacting to congestion signals more than once per window,
> which should not happen (section 5 of RFC 3168
> <https://tools.ietf.org/html/rfc3168>). tcp_probe reports SRTT measurements
> of about 400-500 us and in the plot above the CWND is reduced 6 times
> within this amount of time.
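The "at most once per window" rule referred to above can be sketched as follows. This is an illustrative model, not the Linux implementation: EcnReactor, recover_seq and the halving step are assumptions chosen for brevity (DCTCP would scale by alpha instead of halving). The idea is that after one reduction, further ECN echoes are ignored until the ACKs pass the data that was outstanding when the reduction happened:

```python
class EcnReactor:
    """React to ECN echoes at most once per window of data (sketch)."""

    def __init__(self, cwnd, min_cwnd=1):
        self.cwnd = cwnd          # congestion window in MSS units
        self.min_cwnd = min_cwnd  # assumed floor, in MSS units
        self.recover_seq = 0      # ignore ECE until ACKs pass this sequence

    def on_ack(self, ack_seq, ece, snd_nxt):
        """Process one ACK; return True if cwnd was reduced."""
        if not ece or ack_seq <= self.recover_seq:
            return False
        self.cwnd = max(self.cwnd // 2, self.min_cwnd)
        # Everything sent so far belongs to the window already penalized.
        self.recover_seq = snd_nxt
        return True
```

With cwnd=8, an ECE-marked ACK for sequence 1000 while snd_nxt is 9000 halves the window to 4. A second marked ACK for 2000 is ignored, and only a marked ACK beyond 9000 triggers the next reduction.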
>
> We have not yet tracked down the code path in the kernel code that is
> causing the behavior described above. Perhaps this is something that you
> can help us with? We would love to hear your thoughts on this matter and
> are happy to try other experiments that you suggest.
>
> Here is a link
> <https://drive.google.com/file/d/0Bw-GEX7h5ufiYmpCV2VpOGEtQWs/view?usp=sharing>
> to download the packet traces if you would like to take a look.
> han-1_host.pcap is the trace from 10.0.0.1 and han-3_host.pcap is the trace
> from 10.0.0.3.
>
> Looking forward to hearing from you!
>
> Best,
> -Steve