From: Stephen Hemminger <shemminger@vyatta.com>
To: John Heffner <johnwheffner@gmail.com>
Cc: "Marian Ďurkovič" <md@bts.sk>, netdev@vger.kernel.org
Subject: Re: TCP rx window autotuning harmful at LAN context
Date: Mon, 9 Mar 2009 13:33:24 -0700
Message-ID: <20090309133324.0dd56f82@nehalam>
In-Reply-To: <1e41a3230903091323j541d1895j2eb69b9f9c11f2f3@mail.gmail.com>
On Mon, 9 Mar 2009 13:23:15 -0700
John Heffner <johnwheffner@gmail.com> wrote:
> On Mon, Mar 9, 2009 at 1:02 PM, Marian Ďurkovič <md@bts.sk> wrote:
> > On Mon, 9 Mar 2009 11:01:52 -0700, John Heffner wrote
> >> On Mon, Mar 9, 2009 at 4:25 AM, Marian Ďurkovič <md@bts.sk> wrote:
> >> > As rx window autotuning is enabled in all recent kernels, and with 1 GB
> >> > of RAM the maximum tcp_rmem becomes 4 MB, this problem is spreading rapidly
> >> > and we believe it needs urgent attention. As demonstrated above, such a huge
> >> > rx window (at least 100*BDP in the example above) does not deliver
> >> > any performance gain; instead, it seriously harms other hosts and/or
> >> > applications. It should also be noted that a host with autotuning enabled
> >> > steals an unfair share of the total available bandwidth, which might look
> >> > like a "better" performing TCP stack at first sight; however, such behaviour
> >> > is not appropriate (RFC 2914, section 3.2).
> >>
> >> It's well known that "standard" TCP fills all available drop-tail
> >> buffers, and that this behavior is not desirable.
> >
> > Well, in practice that was always limited by the receive window size, which
> > by default was 64 kB on most operating systems. So this undesirable behavior
> > was limited to hosts where the receive window was manually increased to huge values.
> >
> > Today, the real effect of autotuning is the same as changing the receive window
> > size to 4 MB on *all* hosts, since there's no mechanism to prevent it from
> > growing the window to the maximum even on low-RTT paths.
> >
> >> The situation you describe is exactly what congestion control (the
> >> topic of RFC 2914) should fix. It is not the role of the receive window
> >> (flow control). It is really the sender's job to detect and react to
> >> this, not the receiver's. (We have had this discussion before on
> >> netdev.)
> >
> > It's not of high importance whose job it is according to pure theory.
> > What matters is that autotuning introduced a serious problem in the LAN
> > context by removing any possibility of properly reacting to increasing RTT.
> > Again, it's not important whether this functionality was there by design
> > or by coincidence; it kept the system well-balanced for many years.
>
> This is not a theoretical exercise, but one in good system design.
> This "well-balanced" system was really broken all along, and
> autotuning has exposed this.
>
> A drop-tail queue size of 1000 packets on a local interface is
> questionable, and I think this is the real source of your problem.
> This change was introduced a few years ago on most drivers; the
> default generally used to be 100. This was partly because TCP
> slow-start has problems when a drop-tail queue is smaller than the
> BDP. (Limited slow-start is meant to address this problem, but
> requires tuning to the right value.) Again, using AQM is likely the
> best solution.
By default, the sky2 transmit queue holds 511 packets, which is about
6.2 ms at 1 Gbit/s. It should probably be half that by default. There
is also the software transmit queue, which could be set to 0 unless
some form of AQM is being done.
>
> > Now that autotuning is enabled by default in the stock kernel, this problem is
> > spreading into LANs without users even knowing what's going on. Therefore
> > I'd like to suggest looking for a decent fix that could be implemented
> > in a relatively short time frame. My proposal is this:
> >
> > - measure RTT during the initial phase of the TCP connection (first X segments)
> > - compute the maximum receive window size from the measured RTT, using a
> > configurable constant representing the bandwidth part of the BDP
> > - let autotuning do its work up to that limit.
>
> Let's take this proposal, and try it instead at the sender side, as
> part of congestion control. Would this proposal make sense in that
> position? Would you seriously consider it there?
>
> (As a side note, this is in fact what happens if you disable
> timestamps, since TCP cannot get an updated measurement of RTT without
> timestamps, only a lower bound. However, I consider this a limitation,
> not a feature.)
>
> -John