From: David Miller
Subject: Re: TCP packet size and delivery packet decisions
Date: Tue, 07 Sep 2010 20:18:43 -0700 (PDT)
Message-ID: <20100907.201843.179933180.davem@davemloft.net>
References: <20100906.223010.173858342.davem@davemloft.net> <1283859552.2338.402.camel@edumazet-laptop>
To: ilpo.jarvinen@helsinki.fi
Cc: eric.dumazet@gmail.com, leandroal@gmail.com, netdev@vger.kernel.org, kuznet@ms2.inr.ac.ru

From: "Ilpo Järvinen"
Date: Tue, 7 Sep 2010 15:15:25 +0300 (EEST)

[ Alexey, problem is that when receiver's maximum window is
  minuscule (e.g. equal to MSS :-), we never send full MSS
  sized frames due to our sender side SWS implementation. ]

> On Tue, 7 Sep 2010, Eric Dumazet wrote:
>
>> On Monday, 06 September 2010 at 22:30 -0700, David Miller wrote:
>>
>> > The small 78 byte window is why the sending system is splitting up the
>> > writes into smaller pieces.
>> >
>> > I presume that the system advertises exactly a 78 byte window because
>> > this is how large the commands are.  But this is an extremely foolish
>> > and baroque thing to do, and it's why you are having problems.
>>
>> I am not sure why TSO added a "Bound mss with half of window"
>> requirement for tcp_sync_mss()
>
> I've thought it is more related on window behavior in general and
> much much older than TSO (it certainly seems to be old one).
>
> I guess we might run to some SWS issue if MSS < rwin < 2*MSS with your
> patch that are avoided by the current approach?

Right, this clamping is part of RFC1122 silly window syndrome avoidance.
In ancient times we used to do this straight in sendmsg(), which had
the comment:

	/* We also need to worry about the window.  If
	 * window < 1/2 the maximum window we've seen
	 * from this host, don't use it.  This is
	 * sender side silly window prevention, as
	 * specified in RFC1122.  (Note that this is
	 * different than earlier versions of SWS
	 * prevention, e.g. RFC813.).  What we
	 * actually do is use the whole MSS.  Since
	 * the results in the right edge of the packet
	 * being outside the window, it will be queued
	 * for later rather than sent.
	 */

But in January 2000, Alexey Kuznetsov moved this logic into tcp_sync_mss().
netdev-vger-2.6 commit is:

--------------------
commit 214d457eb454a70f0f373371de044403834d8042
Author: davem
Date:   Tue Jan 18 08:24:09 2000 +0000

    Merge in bug fixes and small enhancements
    from Alexey for the TCP/UDP softnet mega-merge
    from the other day.
--------------------

Well, what is the SWS sender rule?  RFC1122 states that we should send
data if (where U == usable window, D == data queued up but not yet sent):

1) if a maximum-sized segment can be sent, i.e., if:

	min(D,U) >= Eff.snd.MSS;

2) or if the data is pushed and all queued data can be sent now, i.e., if:

	[SND.NXT = SND.UNA and] PUSHED and D <= U

   (the bracketed condition is imposed by the Nagle algorithm);

3) or if at least a fraction Fs of the maximum window can be sent, i.e., if:

	[SND.NXT = SND.UNA and] min(D,U) >= Fs * Max(SND.WND);

4) or if data is PUSHed and the override timeout occurs.

The recommended value for the "Fs" fraction is 1/2, so that's where
the "one-half" logic comes from.

The current code implements this by pre-chopping the MSS to half the
largest window we've seen advertised by the peer, e.g. when doing
sendmsg() packetization.  Packets are packetized to this 1/2
MAX_WINDOW MSS, represented by mss_now.

Then the send path SWS logic will send any packet that is at least
"mss_now".

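For reference, the four RFC1122 tests quoted above can be written as an
ordered predicate.  This is only a sketch to make the ordering explicit,
not the kernel's code: struct sws_state, its field names and
sws_may_send() are all made up for illustration and do not correspond
to anything in struct tcp_sock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of sender state; names are illustrative only. */
struct sws_state {
	uint32_t queued;        /* D: data queued up but not yet sent */
	uint32_t usable;        /* U: usable send window */
	uint32_t eff_snd_mss;   /* Eff.snd.MSS */
	uint32_t max_snd_wnd;   /* Max(SND.WND): largest window seen from peer */
	bool pushed;            /* queued data is PUSHed */
	bool nothing_in_flight; /* SND.NXT == SND.UNA (Nagle condition) */
	bool override_timeout;  /* the override timer has fired */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Returns the number (1..4) of the first RFC1122 test that permits
 * sending, or 0 if the sender should keep waiting.
 */
static int sws_may_send(const struct sws_state *s)
{
	uint32_t sendable = min_u32(s->queued, s->usable);

	if (s->queued == 0)
		return 0;	/* nothing to send at all */

	/* 1) a maximum-sized segment can be sent */
	if (sendable >= s->eff_snd_mss)
		return 1;

	/* 2) data is pushed and all queued data fits in the window */
	if (s->nothing_in_flight && s->pushed && s->queued <= s->usable)
		return 2;

	/* 3) at least Fs = 1/2 of the maximum window can be sent;
	 *    compared as 2 * sendable to avoid rounding.
	 */
	if (s->nothing_in_flight &&
	    (uint64_t)2 * sendable >= s->max_snd_wnd)
		return 3;

	/* 4) data is PUSHed and the override timeout occurred */
	if (s->pushed && s->override_timeout)
		return 4;

	return 0;
}
```

The point of returning the test number is that test #1 is evaluated
against the unmodified Eff.snd.MSS before the one-half fraction of
test #3 is ever consulted.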
Effectively we try to implement test #1 and #3 above at the same time
by just making test #1 and chopping the Eff.snd.MSS to half the largest
receive window we've seen advertised by the peer.

This strategy seems to break down when the peer's MSS and the maximum
receive window we'll ever see from the peer are the same order of
magnitude.

It seems that the conditions above really need to be checked in the
right order, and because we try to combine a later test (case #3) with
an earlier test (case #1) we don't send a full-sized frame in this
special scenario.
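
To put numbers on the failure case, here is a small model of the
pre-chopping.  It is a simplification under stated assumptions, not
tcp_sync_mss() itself: clamped_mss() and segments_for() are
hypothetical helpers, any lower bound the real code applies to the
clamped value is ignored, and the peer's MSS is assumed to equal the
78 byte window from the report.

```c
#include <stdint.h>

/* Simplified model of the combined test: packetize to at most half of
 * the largest window the peer has advertised.  Hypothetical helper;
 * the real code may apply additional bounds.
 */
static uint32_t clamped_mss(uint32_t eff_snd_mss, uint32_t max_snd_wnd)
{
	uint32_t half = max_snd_wnd / 2;

	if (half && half < eff_snd_mss)
		return half;
	return eff_snd_mss;
}

/* Number of segments a write of 'len' bytes is cut into when it is
 * packetized at 'seg_size' bytes per segment.
 */
static uint32_t segments_for(uint32_t len, uint32_t seg_size)
{
	if (seg_size == 0)
		return 0;
	return (len + seg_size - 1) / seg_size;
}
```

With Eff.snd.MSS == Max(SND.WND) == 78, clamped_mss(78, 78) gives 39,
so segments_for(78, 39) cuts a 78 byte command into two frames, while
test #1 applied to the unclamped MSS would allow segments_for(78, 78),
i.e. a single full-sized frame.  With an ordinary window, say
clamped_mss(1460, 65535), the clamp has no effect.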