From: Steve Ibanez <sibanez@stanford.edu>
To: netdev@vger.kernel.org
Cc: Florian Westphal <fw@strlen.de>,
Mohammad Alizadeh <alizadeh@csail.mit.edu>
Subject: Re: Linux ECN Handling
Date: Mon, 23 Oct 2017 15:15:36 -0700
Message-ID: <CACJspmKjAr+q9cFVssXVxWQMCUWe3TNYO77m0nQwzQK4hTCOzA@mail.gmail.com>
In-Reply-To: <20171019124312.GE16796@breakpoint.cc>
Hi All,
I upgraded the kernel on all of our machines to Linux
4.13.8-041308-lowlatency. However, I'm still observing the same
behavior: a source enters a timeout when its CWND is 1 MSS and it
receives ECN marks.
Here are the measured flow rates:
<https://drive.google.com/file/d/0B-bt9QS-C3ONT0VXMUt6WHhKREE/view?usp=sharing>
Here are snapshots of the packet traces at the two sources, both of
which enter a timeout at t = 1.6 s:
10.0.0.1 timeout event:
<https://drive.google.com/file/d/0B-bt9QS-C3ONcl9WRnRPazg2ems/view?usp=sharing>
10.0.0.3 timeout event:
<https://drive.google.com/file/d/0B-bt9QS-C3ONeDlxRjNXa0VzWm8/view?usp=sharing>
Both still essentially follow the same sequence of events that I
mentioned earlier:
(1) receives an ACK for byte XYZ with the ECN flag set
(2) stops sending for RTO_min=300ms
(3) sends a retransmission for byte XYZ
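The three-step pattern above can be checked mechanically across many runs instead of by eye. Below is a minimal Python sketch, not part of our actual tooling. It assumes the pcap has already been reduced to a list of hypothetical `Segment` records (time, direction, sequence number, ACK number, ECE flag). The record type, the `find_ece_stalls` helper and the 10 ms slack are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    """Hypothetical, simplified view of one packet seen at the source host."""
    time: float     # capture timestamp in seconds
    outgoing: bool  # True if sent by the source, False if received by it
    seq: int        # first sequence number carried (outgoing data segments)
    ack: int        # cumulative ACK number (incoming ACKs)
    ece: bool       # ECN-Echo flag (incoming ACKs)

def find_ece_stalls(trace: List[Segment], rto_min: float = 0.300,
                    slack: float = 0.010) -> List[Tuple[float, int]]:
    """Return (time, XYZ) for every ECE-marked ACK of byte XYZ that is followed
    by roughly rto_min of silence and then a retransmission of byte XYZ."""
    hits = []
    for i, pkt in enumerate(trace):
        if pkt.outgoing or not pkt.ece:
            continue
        # First segment the source sends after receiving this ACK.
        nxt = next((p for p in trace[i + 1:] if p.outgoing), None)
        if nxt is None or nxt.seq != pkt.ack:
            continue
        # A retransmission: byte XYZ had already been sent before the ACK.
        resent = any(p.outgoing and p.seq == pkt.ack for p in trace[:i])
        if resent and nxt.time - pkt.time >= rto_min - slack:
            hits.append((pkt.time, pkt.ack))
    return hits
```

Only the first segment sent after each ECE-marked ACK is inspected. This matches the reported pattern, where the source sends nothing between the ACK and the retransmission.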
The cwnd samples reported by tcp_probe still indicate that the sources
are reacting to the ECN marks more than once per window. Here are the
cwnd samples at the same timeout event mentioned above:
<https://drive.google.com/file/d/0B-bt9QS-C3ONdEZQdktpaW5JUm8/view?usp=sharing>
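One rough way to quantify "more than once per window" from the tcp_probe data is to count CWND decreases inside any interval of one SRTT. This sketch makes three assumptions. First, the tcp_probe log has already been parsed into hypothetical `(time_seconds, cwnd)` pairs; the parsing is not shown. Second, a single SRTT value is used as the window length, which is only an approximation. Third, the helper name `max_reductions_per_srtt` is illustrative.

```python
from typing import List, Tuple

def max_reductions_per_srtt(samples: List[Tuple[float, int]], srtt: float) -> int:
    """Return the largest number of CWND decreases inside any interval of
    length srtt. A once-per-window response should give at most 1."""
    # Timestamps at which CWND dropped relative to the previous sample.
    drops = [t for (_, prev), (t, cur) in zip(samples, samples[1:]) if cur < prev]
    best = 0
    start = 0
    for end, t in enumerate(drops):
        # Slide the left edge until the interval spans at most srtt seconds.
        while t - drops[start] > srtt:
            start += 1
        best = max(best, end - start + 1)
    return best
```

For example, six decreases packed into 250 us with an SRTT of 450 us yield 6. The same decreases spread 1 ms apart yield 1.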
Let me know if there is anything else you think I should try.
Thanks,
-Steve
On Thu, Oct 19, 2017 at 5:43 AM, Florian Westphal <fw@strlen.de> wrote:
>
> [ full-quoting due to Cc fixups, adding netdev ]
>
> Steve Ibanez <sibanez@stanford.edu> wrote:
> > Hi Florian, Neal, and Daniel,
> >
> > I hope this email finds you well. My name is Stephen Ibanez and I'm a PhD
> > student at Stanford currently working on a project with Mohammad Alizadeh,
> > Nick McKeown, and Lavanya Jose. We have been doing some experiments using
> > the Linux DCTCP implementation and are trying to understand some strange
> > behavior that we are encountering. I'm contacting you three because I have
> > seen your names on some of the source files and recent commits in the Linux
> > source tree. Hopefully you can help us out or put us in contact with the
> > right people?
> >
> > Here are some details about our servers:
> >
> > - Distribution: Ubuntu 14.04 LTS
> > - Kernel release: 4.4.0-75-generic
>
> Can you re-test with a more recent kernel such as 4.13.8?
>
> > *The experiment:*
> >
> > We use iperf3 to generate two DCTCP flows from different servers to a
> > common server, as shown in the diagram below. We measure the sending rate
> > of each flow, record the tcp_probe output, and run tcpdump on the
> > source host interfaces.
> >
> > [image: Inline image 6]
> >
> > *The problem:*
> >
> > Our rate measurements look like the one shown below: the flows often enter
> > timeouts. In this case, both flows hit a timeout at t = 0.3 s.
> > [image: Inline image 2]
> >
> > Around this timeout event, we see the following sequence of packets at
> > the source host interfaces:
> >
> > *10.0.0.1 timeout event:*
> > [image: Inline image 3]
> >
> > *10.0.0.3 timeout event:*
> > [image: Inline image 4]
> >
> > In both cases, the source:
> > (1) receives an ACK for byte XYZ with the ECN flag set
> > (2) stops sending anything for RTO_min=300ms
> > (3) sends a retransmission for byte XYZ
> >
> > I have verified that this behavior is consistent across multiple experiment
> > runs. Here are the CWND samples for the 10.0.0.1 flow provided by tcp_probe
> > at the time of the timeout event:
> >
> > [image: Inline image 5]
> >
> > From what I can tell, tcp_probe logs a sample whenever a packet is
> > received. If this is true, then CWND = 1 MSS when the source receives the
> > final ECN-marked ACK just before the timeout.
> >
> > *The conclusion:*
> >
> > We believe that there may be an issue with how the Linux kernel is handling
> > ECN echoes. For DCTCP, if the CWND is 1 MSS and the end host is still
> > receiving ECN marks, then the CWND should remain at 1 MSS and the flow
> > should *not* enter a timeout. This matters because the switch can perform
> > ECN marking very aggressively, causing the source end host to receive many
> > redundant ECN echoes over a short period of time.
> >
> > Another potential issue is that, judging from the CWND plot above, the
> > end host may be reacting to congestion signals more than once per window,
> > which should not happen (section 5 of RFC 3168
> > <https://tools.ietf.org/html/rfc3168>). tcp_probe reports SRTT measurements
> > of about 400-500 us, and in the plot above the CWND is reduced 6 times
> > within this amount of time.
> >
> > We have not yet tracked down the code path in the kernel code that is
> > causing the behavior described above. Perhaps this is something that you
> > can help us with? We would love to hear your thoughts on this matter and
> > are happy to try other experiments that you suggest.
> >
> > Here is a link to download the packet traces if you would like to take a
> > look:
> > <https://drive.google.com/file/d/0Bw-GEX7h5ufiYmpCV2VpOGEtQWs/view?usp=sharing>
> > han-1_host.pcap is the trace from 10.0.0.1 and han-3_host.pcap is the trace
> > from 10.0.0.3.
> >
> > Looking forward to hearing from you!
> >
> > Best,
> > -Steve
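For reference, the DCTCP sender update described in RFC 8257 can be sketched as follows. Once per observation window (roughly one RTT), alpha is updated as an EWMA of the fraction of ECN-marked bytes, with gain g = 1/16. If any marks were seen, the window is cut to cwnd * (1 - alpha/2). This is a simplified model, not the Linux code:

- CWND is in MSS units.
- Slow start is ignored, and unmarked windows simply add 1 MSS.
- The 1 MSS floor reflects the expectation stated in the quoted message, not a documented kernel constant.
- `dctcp_window_update` is a hypothetical helper.

```python
from typing import Tuple

def dctcp_window_update(cwnd: float, alpha: float, marked_fraction: float,
                        g: float = 1.0 / 16,
                        min_cwnd: float = 1.0) -> Tuple[float, float]:
    """Apply one observation window of a simplified DCTCP sender.

    cwnd and min_cwnd are in MSS units; marked_fraction is the fraction of
    bytes acknowledged in this window that carried ECN-Echo.
    Returns (new_cwnd, new_alpha)."""
    # EWMA estimate of the extent of congestion.
    alpha = (1 - g) * alpha + g * marked_fraction
    if marked_fraction > 0:
        # At most one reduction per window, scaled by alpha, clamped at the
        # assumed 1 MSS floor instead of stalling the flow.
        cwnd = max(cwnd * (1 - alpha / 2), min_cwnd)
    else:
        # Congestion avoidance: grow by one MSS per window.
        cwnd += 1
    return cwnd, alpha
```

With every byte marked, alpha stays at 1 and the window halves each RTT until it reaches the floor. After that, the sender keeps transmitting one segment per RTT rather than going silent for RTO_min.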