From: Steve Ibanez <sibanez@stanford.edu>
To: netdev@vger.kernel.org
Cc: Florian Westphal <fw@strlen.de>,
	Mohammad Alizadeh <alizadeh@csail.mit.edu>
Subject: Re: Linux ECN Handling
Date: Mon, 23 Oct 2017 15:15:36 -0700
Message-ID: <CACJspmKjAr+q9cFVssXVxWQMCUWe3TNYO77m0nQwzQK4hTCOzA@mail.gmail.com>
In-Reply-To: <20171019124312.GE16796@breakpoint.cc>

Hi All,

I upgraded the kernel on all of our machines to Linux
4.13.8-041308-lowlatency. However, I'm still observing the same
behavior: the source enters a timeout when its CWND is 1 MSS and it
receives ECN marks.

Here are the measured flow rates:
<https://drive.google.com/file/d/0B-bt9QS-C3ONT0VXMUt6WHhKREE/view?usp=sharing>

Here are snapshots of the packet traces at the sources when they both
enter a timeout at t=1.6sec:

10.0.0.1 timeout event:
<https://drive.google.com/file/d/0B-bt9QS-C3ONcl9WRnRPazg2ems/view?usp=sharing>

10.0.0.3 timeout event:
<https://drive.google.com/file/d/0B-bt9QS-C3ONeDlxRjNXa0VzWm8/view?usp=sharing>

Both still essentially follow the same sequence of events that I
mentioned earlier:
(1) receives an ACK for byte XYZ with the ECN flag set
(2) stops sending for RTO_min=300ms
(3) sends a retransmission for byte XYZ
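
For anyone who wants to look for this pattern in their own traces, here
is a minimal sketch in Python. It assumes the capture has already been
reduced to a time-ordered list of simple records. The Pkt record, its
field names, and find_ecn_stalls are hypothetical names chosen for
illustration; they are not part of tcpdump, tcp_probe, or the kernel.

```python
from dataclasses import dataclass
from typing import List, Tuple

RTO_MIN = 0.300  # seconds; the RTO_min value reported in step (2)


@dataclass
class Pkt:
    """Hypothetical, simplified view of one packet seen at the source."""
    t: float            # capture timestamp in seconds
    outgoing: bool      # True: data sent by the source, False: ACK received
    seq: int            # first byte sent (outgoing) or byte being ACKed (incoming)
    ece: bool = False   # ECN-Echo flag on a received ACK


def find_ecn_stalls(pkts: List[Pkt],
                    rto_min: float = RTO_MIN) -> List[Tuple[float, float, int]]:
    """Return (ack_time, retransmit_time, byte) for each stall matching (1)-(3)."""
    stalls = []
    for cur, nxt in zip(pkts, pkts[1:]):
        if (not cur.outgoing and cur.ece        # (1) ECE-marked ACK for byte cur.seq
                and nxt.outgoing                # next packet in the trace is sent data
                and nxt.t - cur.t >= rto_min    # (2) nothing sent for at least RTO_min
                and nxt.seq == cur.seq):        # (3) that data starts at the ACKed byte
            stalls.append((cur.t, nxt.t, cur.seq))
    return stalls
```

Because only adjacent records are compared, any packet seen during the
gap means the event is not reported, which matches step (2).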

The cwnd samples reported by tcp_probe still indicate that the sources
are reacting to the ECN marks more than once per window. Here are the
cwnd samples at the same timeout event mentioned above:
<https://drive.google.com/file/d/0B-bt9QS-C3ONdEZQdktpaW5JUm8/view?usp=sharing>
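
A rough way to check the "more than once per window" observation against
the cwnd samples is sketched below in Python. It is only an
approximation: it treats one SRTT as one window and assumes the
tcp_probe output has been parsed into (time, cwnd) pairs. The function
name and the input format are hypothetical.

```python
from typing import List, Tuple


def extra_reductions(samples: List[Tuple[int, int]], srtt_us: int) -> List[int]:
    """Return times of cwnd decreases that fall inside an already-cut window.

    samples: time-ordered (time_us, cwnd) pairs.
    srtt_us: smoothed RTT in microseconds, used as a stand-in for one window.
    """
    extra = []
    window_start = None   # time of the first reduction in the current window
    prev_cwnd = None
    for t, cwnd in samples:
        if prev_cwnd is not None and cwnd < prev_cwnd:
            if window_start is not None and t - window_start < srtt_us:
                extra.append(t)       # second or later cut within one SRTT
            else:
                window_start = t      # first cut in a new window
        prev_cwnd = cwnd
    return extra
```

A non-empty result suggests that cwnd was cut more than once within
roughly one round-trip time.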

Let me know if there is anything else you think I should try.

Thanks,
-Steve

On Thu, Oct 19, 2017 at 5:43 AM, Florian Westphal <fw@strlen.de> wrote:
>
> [ full-quoting due to Cc fixups, adding netdev ]
>
> Steve Ibanez <sibanez@stanford.edu> wrote:
> > Hi Florian, Neal, and Daniel,
> >
> > I hope this email finds you well. My name is Stephen Ibanez and I'm a PhD
> > Student at Stanford currently working on a project with Mohammad Alizadeh,
> > Nick McKeown, and Lavanya Jose. We have been doing some experiments using
> > the linux DCTCP implementation and are trying to understand some strange
> > behavior that we are encountering. I'm contacting you three because I have
> > seen your names on some of the source files and recent commits in the linux
> > source tree. Hopefully you can help us out or put us in contact with the
> > right people?
> >
> > Here are some details about our servers:
> >
> >    - Distribution: Ubuntu 14.04 LTS
> >    - Kernel release: 4.4.0-75-generic
>
> Can you re-test with a more recent kernel such as 4.13.8?
>
> > *The experiment:*
> >
> > We use iperf3 to generate two DCTCP flows from different servers to a
> > common server, as shown in the diagram below. We measure the sending rate
> > of each flow, record the tcp_probe output, as well as run tcpdump on the
> > source host interfaces.
> >
> > [image: Inline image 6]
> >
> > *The problem:*
> >
> > Our rate measurements look like the one shown below; the flows often enter
> > timeouts. In this case, both flows hit a timeout at t=0.3.
> > [image: Inline image 2]
> >
> > When looking at the sequence of packets seen at the source host interfaces
> > around this timeout event this is what we see:
> >
> > *10.0.0.1 timeout event:*
> > [image: Inline image 3]
> >
> > *10.0.0.3 timeout event:*
> > [image: Inline image 4]
> >
> > In both cases, the source:
> > (1) receives an ACK for byte XYZ with the ECN flag set
> > (2) stops sending anything for RTO_min=300ms
> > (3) sends a retransmission for byte XYZ
> >
> > I have verified that this behavior is consistent across multiple experiment
> > runs. Here are the CWND samples for the 10.0.0.1 flow provided by tcp_probe
> > at the time of the timeout event:
> >
> > [image: Inline image 5]
> >
> > From what I can tell, tcp_probe logs a sample whenever a packet is
> > received. If this is true, then the CWND is 1 MSS when the source
> > receives the final ECN-marked ACK just before the timeout.
> >
> > *The conclusion:*
> >
> > We believe that there may be an issue with how the linux kernel is handling
> > the ECN echoes. For DCTCP, if the CWND is 1 MSS and the end host is still
> > receiving ECN marks then the CWND should remain at 1 MSS and should *not*
> > enter a timeout. This is because the switch can perform ECN marking very
> > aggressively causing the source end host to receive many redundant ECN
> > echoes over a short period of time.
> >
> > Another potential issue is that from the CWND plot above it looks like the
> > end host may be reacting to congestion signals more than once per window,
> > which should not happen (section 5 of RFC 3168
> > <https://tools.ietf.org/html/rfc3168>). tcp_probe reports SRTT measurements
> > of about 400-500 us and in the plot above the CWND is reduced 6 times
> > within this amount of time.
> >
> > We have not yet tracked down the code path in the kernel code that is
> > causing the behavior described above. Perhaps this is something that you
> > can help us with? We would love to hear your thoughts on this matter and
> > are happy to try other experiments that you suggest.
> >
> > Here is a link
> > <https://drive.google.com/file/d/0Bw-GEX7h5ufiYmpCV2VpOGEtQWs/view?usp=sharing>
> > to download the packet traces if you would like to take a look.
> > han-1_host.pcap is the trace from 10.0.0.1 and han-3_host.pcap is the trace
> > from 10.0.0.3.
> >
> > Looking forward to hearing from you!
> >
> > Best,
> > -Steve
