From: John Heffner <jheffner@psc.edu>
To: Alex Sidorenko <alexandre.sidorenko@hp.com>
Cc: David Miller <davem@davemloft.net>, netdev@vger.kernel.org
Subject: Re: SWS for rcvbuf < MTU
Date: Tue, 13 Mar 2007 15:01:50 -0400 [thread overview]
Message-ID: <45F6F51E.6090905@psc.edu> (raw)
In-Reply-To: <200703051152.27780.alexandre.sidorenko@hp.com>
[-- Attachment #1: Type: text/plain, Size: 2349 bytes --]
Alex Sidorenko wrote:
> Here are the values from live kernel (obtained with 'crash') when the host was
> in SWS state:
>
> full_space=708 full_space/2=354
> free_space=393
> window=76
>
> In this case the test from my original fix, (window < full_space/2),
> succeeds. But John's test
>
> free_space > window + full_space/2
> 393 430
>
> does not. So I suspect that the new fix will not always work. From tcpdump
> traces we can see that both hosts exchange with 76-byte packets for a long
> time. From customer's application log we see that it continues to read
> 76-byte chunks per each read() call - even though more than that is available
> in the receive buffer. Technically it's OK for read() to return even after
> reading one byte, so if sk->receive_queue contains multiple 76-byte skbuffs
> we may return after processing just one skbuff (but we we don't understand
> the details of why this happens on customer's system).
>
> Are there any particular reasons why you want to postpone window update until
> free_space becomes > window + full_space/2 and not as soon as
> free_space > full_space/2? As the only real-life occurance of SWS shows
> free_space oscillating slightly above full_space/2, I created the fix
> specifically to match this phenomena as seen on customer's host. We reach the
> modified section only when (free_space > full_space/2) so it should be OK to
> update the window at this point if mss==full_space.
>
> So yes, we can test John's fix on customer's host but I doubt it will work for
> the reasons mentioned above, in brief:
>
> 'window = free_space' instead of 'window=full_space/2' is OK,
> but the test 'free_space > window + full_space/2' is not for the specific
> pattern customer sees on his hosts.
Sorry for the long delay in response, I've been on vacation. I'm okay
with your patch, and I can't think of any real problem with it, except
that the behavior is non-standard. Then again, Linux acking in general
is non-standard, which has created the bug in the first place. :) The
only thing I can think where it might still ack too often is if
free_space frequently drops just below full_space/2 for a bit then rises
above full_space/2.
I've also attached a corrected version of my earlier patch that I think
solves the problem you noted.
Thanks,
-John
[-- Attachment #2: rcv-sws.patch --]
[-- Type: text/plain, Size: 1180 bytes --]
Do full receiver-side SWS avoidance when rcvbuf < mss.
Signed-off-by: John Heffner <jheffner@psc.edu>
---
commit f4333661026621e15549fb75b37be785e4a1c443
tree 30d46b64ea19634875fdd4656d33f76db526a313
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner <jheffner@psc.edu> Tue, 13 Mar 2007 14:17:03 -0400
committer John Heffner <jheffner@psc.edu> Tue, 13 Mar 2007 14:17:03 -0400
net/ipv4/tcp_output.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..e621a63 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1605,8 +1605,15 @@ u32 __tcp_select_window(struct sock *sk)
* We also don't do any window rounding when the free space
* is too small.
*/
- if (window <= free_space - mss || window > free_space)
+ if (window <= free_space - mss || window > free_space) {
window = (free_space/mss)*mss;
+ } else if (mss == full_space) {
+ /* Do full receive-side SWS avoidance
+ * when rcvbuf <= mss */
+ window = tcp_receive_window(tp);
+ if (free_space > window + full_space/2)
+ window = free_space;
+ }
}
return window;
next prev parent reply other threads:[~2007-03-13 19:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-02 16:28 SWS for rcvbuf < MTU Alex Sidorenko
2007-03-02 18:54 ` John Heffner
2007-03-02 20:29 ` Alex Sidorenko
2007-03-02 19:25 ` David Miller
2007-03-02 20:21 ` Alex Sidorenko
2007-03-02 20:33 ` David Miller
2007-03-02 21:16 ` John Heffner
2007-03-02 21:38 ` David Miller
2007-03-03 23:40 ` John Heffner
2007-03-05 16:52 ` Alex Sidorenko
2007-03-13 19:01 ` John Heffner [this message]
2007-03-14 16:18 ` Alex Sidorenko
2007-04-02 20:01 ` Alex Sidorenko
2007-04-02 20:21 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45F6F51E.6090905@psc.edu \
--to=jheffner@psc.edu \
--cc=alexandre.sidorenko@hp.com \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).