netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Heffner <jheffner@psc.edu>
To: Alex Sidorenko <alexandre.sidorenko@hp.com>
Cc: David Miller <davem@davemloft.net>, netdev@vger.kernel.org
Subject: Re: SWS for rcvbuf < MTU
Date: Tue, 13 Mar 2007 15:01:50 -0400	[thread overview]
Message-ID: <45F6F51E.6090905@psc.edu> (raw)
In-Reply-To: <200703051152.27780.alexandre.sidorenko@hp.com>

[-- Attachment #1: Type: text/plain, Size: 2349 bytes --]

Alex Sidorenko wrote:
> Here are the values from live kernel (obtained with 'crash') when the host was 
> in SWS state:
> 
> full_space=708		full_space/2=354
> free_space=393
> window=76
> 
> In this case the test from my original fix, (window < full_space/2),  
> succeeds. But John's test
> 
> free_space > window + full_space/2
> 393          430
> 
> does not. So I suspect that the new fix will not always work. From tcpdump 
> traces we can see that both hosts exchange with 76-byte packets for a long 
> time. From customer's application log we see that it continues to read 
> 76-byte chunks per each read() call - even though more than that is available 
> in the receive buffer. Technically it's OK for read() to return even after 
> reading one byte, so if sk->receive_queue contains multiple 76-byte skbuffs 
> we may return after processing just one skbuff (but we we don't understand 
> the details of why this happens on customer's system).
> 
> Are there any particular reasons why you want to postpone window update until 
> free_space becomes > window + full_space/2 and not as soon as 
> free_space > full_space/2? As the only real-life occurance of SWS shows 
> free_space oscillating slightly above full_space/2, I created the fix 
> specifically to match this phenomena as seen on customer's host. We reach the 
> modified section only when (free_space > full_space/2) so it should be OK to 
> update the window at this point if mss==full_space. 
> 
> So yes, we can test John's fix on customer's host but I doubt it will work for 
> the reasons mentioned above, in brief:
> 
> 'window = free_space' instead of 'window=full_space/2' is OK,
> but the test 'free_space > window + full_space/2' is not for the specific 
> pattern customer sees on his hosts.


Sorry for the long delay in response, I've been on vacation.  I'm okay 
with your patch, and I can't think of any real problem with it, except 
that the behavior is non-standard.  Then again, Linux acking in general 
is non-standard, which has created the bug in the first place. :)  The 
only thing I can think where it might still ack too often is if 
free_space frequently drops just below full_space/2 for a bit then rises 
above full_space/2.

I've also attached a corrected version of my earlier patch that I think 
solves the problem you noted.

Thanks,
   -John

[-- Attachment #2: rcv-sws.patch --]
[-- Type: text/plain, Size: 1180 bytes --]

Do full receiver-side SWS avoidance when rcvbuf < mss.

Signed-off-by: John Heffner <jheffner@psc.edu>

---
commit f4333661026621e15549fb75b37be785e4a1c443
tree 30d46b64ea19634875fdd4656d33f76db526a313
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner <jheffner@psc.edu> Tue, 13 Mar 2007 14:17:03 -0400
committer John Heffner <jheffner@psc.edu> Tue, 13 Mar 2007 14:17:03 -0400

 net/ipv4/tcp_output.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..e621a63 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1605,8 +1605,15 @@ u32 __tcp_select_window(struct sock *sk)
 		 * We also don't do any window rounding when the free space
 		 * is too small.
 		 */
-		if (window <= free_space - mss || window > free_space)
+		if (window <= free_space - mss || window > free_space) {
 			window = (free_space/mss)*mss;
+		} else if (mss == full_space) {
+			/* Do full receive-side SWS avoidance
+			 * when rcvbuf <= mss */
+			window = tcp_receive_window(tp);
+			if (free_space > window + full_space/2)
+				window = free_space;
+		}
 	}
 
 	return window;

  reply	other threads:[~2007-03-13 19:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-02 16:28 SWS for rcvbuf < MTU Alex Sidorenko
2007-03-02 18:54 ` John Heffner
2007-03-02 20:29   ` Alex Sidorenko
2007-03-02 19:25 ` David Miller
2007-03-02 20:21   ` Alex Sidorenko
2007-03-02 20:33     ` David Miller
2007-03-02 21:16       ` John Heffner
2007-03-02 21:38         ` David Miller
2007-03-03 23:40           ` John Heffner
2007-03-05 16:52             ` Alex Sidorenko
2007-03-13 19:01               ` John Heffner [this message]
2007-03-14 16:18                 ` Alex Sidorenko
2007-04-02 20:01                   ` Alex Sidorenko
2007-04-02 20:21                     ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45F6F51E.6090905@psc.edu \
    --to=jheffner@psc.edu \
    --cc=alexandre.sidorenko@hp.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).