From: David Miller <davem@davemloft.net>
To: ilpo.jarvinen@helsinki.fi
Cc: eric.dumazet@gmail.com, leandroal@gmail.com,
netdev@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: TCP packet size and delivery packet decisions
Date: Tue, 07 Sep 2010 20:18:43 -0700 (PDT)
Message-ID: <20100907.201843.179933180.davem@davemloft.net>
In-Reply-To: <alpine.DEB.2.00.1009071443510.26447@wel-95.cs.helsinki.fi>
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Tue, 7 Sep 2010 15:15:25 +0300 (EEST)
[ Alexey, problem is that when the receiver's maximum window is minuscule
(f.e. equal to MSS :-), we never send full MSS sized frames due to
our sender side SWS implementation. ]
> On Tue, 7 Sep 2010, Eric Dumazet wrote:
>
>> On Monday, 06 September 2010 at 22:30 -0700, David Miller wrote:
>>
>>
>> > The small 78 byte window is why the sending system is splitting up the
>> > writes into smaller pieces.
>> >
>> > I presume that the system advertises exactly a 78 byte window because
>> > this is how large the commands are. But this is an extremely foolish
>> > and baroque thing to do, and it's why you are having problems.
>>
>> I am not sure why TSO added a "Bound mss with half of window"
>> requirement for tcp_sync_mss()
>
> I've thought it is more related to window behavior in general and is
> much, much older than TSO (it certainly seems to be an old one).
>
> I guess we might run into some SWS issue if MSS < rwin < 2*MSS with your
> patch that is avoided by the current approach?
Right, this clamping is part of RFC1122 silly window syndrome avoidance.
In ancient times we used to do this straight in sendmsg(), which had
the comment:
/* We also need to worry about the window. If
* window < 1/2 the maximum window we've seen
* from this host, don't use it. This is
* sender side silly window prevention, as
* specified in RFC1122. (Note that this is
* different than earlier versions of SWS
* prevention, e.g. RFC813.). What we
* actually do is use the whole MSS. Since
* this results in the right edge of the packet
* being outside the window, it will be queued
* for later rather than sent.
*/
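In rough pseudo-code, that old logic amounted to something like the
following (a paraphrased sketch, not the actual 2.2/2.3-era source;
the identifiers are made up for illustration):

/* Paraphrased sketch of the old sendmsg()-era logic described in the
 * comment above; not the historical code, identifiers are made up.
 */
static unsigned int old_sendmsg_copy_limit(unsigned int mss,
					   unsigned int window,
					   unsigned int max_window)
{
	/* If the usable window is smaller than half the largest window
	 * the peer has ever advertised, ignore it and packetize a full
	 * MSS anyway.  The segment's right edge then falls outside the
	 * window, so it is queued for later instead of going out as a
	 * tiny frame.
	 */
	if (window < max_window / 2)
		return mss;

	/* Otherwise honour the advertised window. */
	return window < mss ? window : mss;
}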
But in January 2000, Alexey Kuznetsov moved this logic into tcp_sync_mss().
The netdev-vger-2.6 commit is:
--------------------
commit 214d457eb454a70f0f373371de044403834d8042
Author: davem <davem>
Date: Tue Jan 18 08:24:09 2000 +0000
Merge in bug fixes and small enhancements
from Alexey for the TCP/UDP softnet mega-merge
from the other day.
--------------------
Well, what is the SWS sender rule? RFC1122 states that we should send
data if (where U == usable window, D == data queued up but not yet sent):
1) if a maximum-sized segment can be sent, i.e., if:
min(D,U) >= Eff.snd.MSS;
2) or if the data is pushed and all queued data can
be sent now, i.e., if:
[SND.NXT = SND.UNA and] PUSHED and D <= U
(the bracketed condition is imposed by the Nagle
algorithm);
3) or if at least a fraction Fs of the maximum window
can be sent, i.e., if:
[SND.NXT = SND.UNA and]
min(D,U) >= Fs * Max(SND.WND);
4) or if data is PUSHed and the override timeout
occurs.
The recommended value for the "Fs" fraction is 1/2, so that's where
the "one-half" logic comes from.
The current code implements this by pre-chopping the MSS to
half the largest window we've seen advertised by the peer, f.e.
when doing sendmsg() packetization. Packets are packetized
to this 1/2 MAX_WINDOW MSS, represented by mss_now.
Then the send path SWS logic will send any packet that is at least
"mss_now".
Effectively we try to implement tests #1 and #3 above at the same
time, by applying only test #1 with Eff.snd.MSS chopped down to half
the largest receive window we've seen advertised by the peer.
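A simplified sketch of that combined scheme (not the actual
tcp_sync_mss() or send-path code; the function names are made up for
illustration):

/* Simplified sketch of the combined scheme; not the real code. */
static unsigned int sketch_sync_mss(unsigned int mss,
				    unsigned int max_window)
{
	/* "Bound mss with half of window": pre-chop the MSS we will
	 * packetize to (mss_now) to half the largest window the peer
	 * has ever advertised.
	 */
	if (max_window && mss > max_window / 2)
		mss = max_window / 2;
	return mss;
}

static int sketch_snd_test(unsigned int skb_len, unsigned int mss_now)
{
	/* The send-path SWS check then lets out any segment that is
	 * at least mss_now bytes long.
	 */
	return skb_len >= mss_now;
}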
This strategy seems to break down when the peer's MSS and the maximum
receive window we'll ever see from the peer are the same order of
magnitude.
It seems that the conditions above really need to be checked in the
right order; because we try to combine a later test (case #3) with
an earlier test (case #1), we don't send a full-sized frame in this
special scenario.
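Concretely, using the 78-byte window from earlier in this thread
(a hypothetical walk-through, assuming the peer never advertises more
than 78 bytes and the application writes 78-byte commands):

	Max(SND.WND)            = 78 bytes (largest window ever seen)
	application write, D    = 78 bytes
	mss_now after pre-chop  = min(MSS, 78 / 2) = 39 bytes

Checked in order, RFC1122 test #3 would let a single 78-byte segment
out, since min(D,U) = 78 >= Fs * Max(SND.WND) = 39.  But because we
only ever apply test #1 against the pre-chopped mss_now, sendmsg()
packetizes 39-byte segments and a full-sized frame never hits the
wire.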