netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* SWS for rcvbuf < MTU
@ 2007-03-02 16:28 Alex Sidorenko
  2007-03-02 18:54 ` John Heffner
  2007-03-02 19:25 ` David Miller
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Sidorenko @ 2007-03-02 16:28 UTC (permalink / raw)
  To: netdev

Hello,

this is a rare corner case met by one of HP partners on 2.4.20 on IA64. 
Inspecting the sources of the latest 2.6.20.1 (net/ipv4/tcp_output.c) we can 
see that the bug is still there.

Here is a description of the bug and the suggested fix.

The problem occurs when the remote host (not necessarily Linux - in our case 
it was Solaris) does not implement SWS avoidance on sender side. If Linux 
connection socket has rcvbuf<mtu, we can potentially advertise small rcv_wnd 
for a long time (SWS).

The problem is due to SWS avoidance as implemented in __tcp_select_window(). 
Everything works fine when rcvbuf > mtu. But if we use small rcvbuf (set by 
SO_RCVBUF), we can go into SWS mode. Let us for simplicity look only at the 
case when we don't have WS enabled. If we have free_space above full_space/2, 
we reach the following section:


        /* Don't do rounding if we are using window scaling, since the
         * scaled window will not line up with the MSS boundary anyway.
         */
        window = tp->rcv_wnd;
        if (tp->rx_opt.rcv_wscale) {
	    <snip>
        } else {
                /* Get the largest window that is a nice multiple of mss.
                 * Window clamp already applied above.
                 * If our current window offering is within 1 mss of the
                 * free space we just keep it. This prevents the divide
                 * and multiply from happening most of the time.
                 * We also don't do any window rounding when the free space
                 * is too small.
                 */
(1)              if (window <= free_space - mss || window > free_space)
                        window = (free_space/mss)*mss;
        }

        return window;

What happens if we have a small tp->rcv_wnd and rcvbuf <= mss? In this case 
condition (1) is almost always false and as a result we'll return 
unmodified 'window' set to tp->rcv_wnd.  If tp->rcv_wnd is small, it can be 
reused over and over again.

For the case rcvbuf <= mss  __tcp_select_window() returns:

  0		if we have free_space < full_space/2	OK
  mss		if rcvbuf is empty			OK
  tp->rcv_wnd	in other case				Bad


If there is no SWS avoidance on sender side, we can see Linux advertising the 
same small rcv_wnd over and over again. The problem here is that we never 
advertise one-half the receiver's buffer space as described e.g. in

"TCP/IP Illustrated" by Stevens (v.1, Chapter 22.3):

"The normal algorithm is for the receiver not to advertise a larger window 
than it is currently advertising (which can be 0) until the window can be 
increased by either one full-sized segment (i.e. the MSS being received) or by 
one-half the receiver's buffer space, whichever is smaller"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The fix.
--------

We have not been able to reproduce the problem inside HP as it is unclear what 
conditions are needed to bring system into SWS mode (this needs very special 
event timing). HP customer was seeing it every 2-3 days while running a 
custom application (Solaris<->Linux) that was running with low priority on a 
busy host running other custom applications with SCHED_RR. After going into 
SWS mode, his application stayed in it until restarted.

We provided to customer a fix for 2.4.20 only (used by customer in production) 
by adding another test and returning rcvbuf/2 when needed:

--- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
+++ net/ipv4/tcp_output.c       Tue Jan 30 14:24:56 2007
@@ -641,6 +641,7 @@
  * Note, we don't "adjust" for TIMESTAMP or SACK option bytes.
  * Regular options like TIMESTAMP are taken into account.
  */
+static const char *SWS_id_string="@#SWS-fix-2";
 u32 __tcp_select_window(struct sock *sk)
 {
        struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
@@ -682,6 +683,9 @@
        window = tp->rcv_wnd;
        if (window <= free_space - mss || window > free_space)
                window = (free_space/mss)*mss;
+        /* A fix for small rcvbuf asid@hp.com */
+       else if (mss == full_space && window < full_space/2)
+               window = full_space/2;

        return window;
 }


Customer has confirmed that this resolves the problem and decreases CPU usage 
by  his custom application - even when there is no SWS.


This is a rare corner case and most users will never meet it. But as the fix 
is trivial, I think it makes sense to include it in upstream sources. 

Regards,
Alex

-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 16:28 SWS for rcvbuf < MTU Alex Sidorenko
@ 2007-03-02 18:54 ` John Heffner
  2007-03-02 20:29   ` Alex Sidorenko
  2007-03-02 19:25 ` David Miller
  1 sibling, 1 reply; 14+ messages in thread
From: John Heffner @ 2007-03-02 18:54 UTC (permalink / raw)
  To: Alex Sidorenko; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]

Alex Sidorenko wrote:
[snip]
> --- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
> +++ net/ipv4/tcp_output.c       Tue Jan 30 14:24:56 2007
> @@ -641,6 +641,7 @@
>   * Note, we don't "adjust" for TIMESTAMP or SACK option bytes.
>   * Regular options like TIMESTAMP are taken into account.
>   */
> +static const char *SWS_id_string="@#SWS-fix-2";
>  u32 __tcp_select_window(struct sock *sk)
>  {
>         struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
> @@ -682,6 +683,9 @@
>         window = tp->rcv_wnd;
>         if (window <= free_space - mss || window > free_space)
>                 window = (free_space/mss)*mss;
> +        /* A fix for small rcvbuf asid@hp.com */
> +       else if (mss == full_space && window < full_space/2)
> +               window = full_space/2;
> 
>         return window;
>  }

Good analysis of the problem, but the patch does not look quite right. 
In particular, you can't ever announce a zero window. :)

I think this attached patch does the correct SWS avoidance.

Thanks,
   -John


[-- Attachment #2: r-sws.patch --]
[-- Type: text/plain, Size: 912 bytes --]

Do receiver-side SWS avoidance for rcvbuf < MSS.

Signed-off-by: John Heffner <jheffner@psc.edu>

---
commit 38d33181c93a28cf7fb2f9f3377305a04636c054
tree 503f8a9de6e78694bae9fc2eb1c9dd5d26a0b5ed
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner <jheffner@psc.edu> Fri, 02 Mar 2007 13:47:44 -0500
committer John Heffner <jheffner@psc.edu> Fri, 02 Mar 2007 13:47:44 -0500

 net/ipv4/tcp_output.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..688b955 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1607,6 +1607,9 @@ u32 __tcp_select_window(struct sock *sk)
 		 */
 		if (window <= free_space - mss || window > free_space)
 			window = (free_space/mss)*mss;
+		else if (mss == full_space &&
+		         free_space > window + full_space/2)
+			window = free_space;
 	}
 
 	return window;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 16:28 SWS for rcvbuf < MTU Alex Sidorenko
  2007-03-02 18:54 ` John Heffner
@ 2007-03-02 19:25 ` David Miller
  2007-03-02 20:21   ` Alex Sidorenko
  1 sibling, 1 reply; 14+ messages in thread
From: David Miller @ 2007-03-02 19:25 UTC (permalink / raw)
  To: alexandre.sidorenko; +Cc: netdev

From: Alex Sidorenko <alexandre.sidorenko@hp.com>
Date: Fri, 2 Mar 2007 11:28:28 -0500

> Customer has confirmed that this resolves the problem and decreases
> CPU usage by his custom application - even when there is no SWS.

There is rarely ever a reason to set explicit socket receive
buffer sizes, since the kernel dynamically sizes them based
upon how the connection is used.

Why do they set it so low?

It is just as easy to fix their performance bug by simply removing
SO_RCVBUF setting in the application.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 19:25 ` David Miller
@ 2007-03-02 20:21   ` Alex Sidorenko
  2007-03-02 20:33     ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Sidorenko @ 2007-03-02 20:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On March 2, 2007 02:25:42 pm David Miller wrote:
> From: Alex Sidorenko <alexandre.sidorenko@hp.com>
> Date: Fri, 2 Mar 2007 11:28:28 -0500
>
> > Customer has confirmed that this resolves the problem and decreases
> > CPU usage by his custom application - even when there is no SWS.
>
> There is rarely ever a reason to set explicit socket receive
> buffer sizes, since the kernel dynamically sizes them based
> upon how the connection is used.
>
> Why do they set it so low?
>
> It is just as easy to fix their performance bug by simply removing
> SO_RCVBUF setting in the application.

Hi David,

they told us that they use small rcvbuf to throttle bandwidth for this 
application. I explained it would be better to use TC for this purpose. They 
agreed and will probably redesign their application in the future, but they 
cannot do it right now. For the same reason they have to use the old 2.4.20 
for a while - in big companies the important production software cannot be 
changed quickly. 

The fix I suggested is trivial and should have no impact the case of 
rcvfbuf>mtu, so I think it makes sense to include it in upstream kernel.

Regards,
Alex


-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 18:54 ` John Heffner
@ 2007-03-02 20:29   ` Alex Sidorenko
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Sidorenko @ 2007-03-02 20:29 UTC (permalink / raw)
  To: John Heffner; +Cc: netdev

On March 2, 2007 01:54:45 pm John Heffner wrote:
> Alex Sidorenko wrote:
> [snip]
>
> > --- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
> > +++ net/ipv4/tcp_output.c       Tue Jan 30 14:24:56 2007
> > @@ -641,6 +641,7 @@
> >   * Note, we don't "adjust" for TIMESTAMP or SACK option bytes.
> >   * Regular options like TIMESTAMP are taken into account.
> >   */
> > +static const char *SWS_id_string="@#SWS-fix-2";
> >  u32 __tcp_select_window(struct sock *sk)
> >  {
> >         struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
> > @@ -682,6 +683,9 @@
> >         window = tp->rcv_wnd;
> >         if (window <= free_space - mss || window > free_space)
> >                 window = (free_space/mss)*mss;
> > +        /* A fix for small rcvbuf asid@hp.com */
> > +       else if (mss == full_space && window < full_space/2)
> > +               window = full_space/2;
> >
> >         return window;
> >  }
>
> Good analysis of the problem, but the patch does not look quite right.
> In particular, you can't ever announce a zero window. :)

Hi John,

in case when (free_space < full_space/2) we do not reach the modified code and
we will return zero:

    if (free_space < full_space/2) {
            icsk->icsk_ack.quick = 0;
             if (tcp_memory_pressure)
                    tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U*tp->advmss);
             if (free_space < mss)
                    return 0;
    }

Here is how windows look with the fixed kernel (from customer's test):

20:59:45.320758 Node1.logical.40171 > 11.0.0.1.39909: win = 708
20:59:45.322758 Node1.logical.40171 > 11.0.0.1.39909: win = 288
20:59:45.714567 Node1.logical.40171 > 11.0.0.1.39909: win = 354
20:59:45.717110 Node1.logical.40171 > 11.0.0.1.39909: win = 0
20:59:45.719110 Node1.logical.40171 > 11.0.0.1.39909: win = 708
...

Regards,
Alex

> I think this attached patch does the correct SWS avoidance.
>
> Thanks,
>    -John



-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 20:21   ` Alex Sidorenko
@ 2007-03-02 20:33     ` David Miller
  2007-03-02 21:16       ` John Heffner
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-03-02 20:33 UTC (permalink / raw)
  To: alexandre.sidorenko; +Cc: netdev

From: Alex Sidorenko <alexandre.sidorenko@hp.com>
Date: Fri, 2 Mar 2007 15:21:58 -0500

> they told us that they use small rcvbuf to throttle bandwidth for this 
> application. I explained it would be better to use TC for this purpose. They 
> agreed and will probably redesign their application in the future, but they 
> cannot do it right now. For the same reason they have to use the old 2.4.20 
> for a while - in big companies the important production software cannot be 
> changed quickly. 
> 
> The fix I suggested is trivial and should have no impact the case of 
> rcvfbuf>mtu, so I think it makes sense to include it in upstream kernel.

I have no objection to the fix, especially John's version.

I was just curious about the app, thanks for the info :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 20:33     ` David Miller
@ 2007-03-02 21:16       ` John Heffner
  2007-03-02 21:38         ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: John Heffner @ 2007-03-02 21:16 UTC (permalink / raw)
  To: David Miller; +Cc: alexandre.sidorenko, netdev

David Miller wrote:
> From: Alex Sidorenko <alexandre.sidorenko@hp.com>
> Date: Fri, 2 Mar 2007 15:21:58 -0500
> 
>> they told us that they use small rcvbuf to throttle bandwidth for this 
>> application. I explained it would be better to use TC for this purpose. They 
>> agreed and will probably redesign their application in the future, but they 
>> cannot do it right now. For the same reason they have to use the old 2.4.20 
>> for a while - in big companies the important production software cannot be 
>> changed quickly. 
>>
>> The fix I suggested is trivial and should have no impact the case of 
>> rcvfbuf>mtu, so I think it makes sense to include it in upstream kernel.
> 
> I have no objection to the fix, especially John's version.
> 
> I was just curious about the app, thanks for the info :)

Please don't apply the patch I sent.  I've been thinking about this a 
bit harder, and it may not fix this particular problem.  (Hard to say 
without knowing exactly what it is.)  As the comment above 
__tcp_select_window() states, we do not do full receive-side SWS 
avoidance because of header prediction.

Alex, you're right I missed that special zero-window case.  I'm still 
not quite sure I'm completely happy with this patch.  I'd like to think 
about this a little bit harder...

Thanks,
   -John

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 21:16       ` John Heffner
@ 2007-03-02 21:38         ` David Miller
  2007-03-03 23:40           ` John Heffner
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-03-02 21:38 UTC (permalink / raw)
  To: jheffner; +Cc: alexandre.sidorenko, netdev

From: John Heffner <jheffner@psc.edu>
Date: Fri, 02 Mar 2007 16:16:39 -0500

> Please don't apply the patch I sent.  I've been thinking about this a 
> bit harder, and it may not fix this particular problem.  (Hard to say 
> without knowing exactly what it is.)  As the comment above 
> __tcp_select_window() states, we do not do full receive-side SWS 
> avoidance because of header prediction.
> 
> Alex, you're right I missed that special zero-window case.  I'm still 
> not quite sure I'm completely happy with this patch.  I'd like to think 
> about this a little bit harder...

Ok

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-02 21:38         ` David Miller
@ 2007-03-03 23:40           ` John Heffner
  2007-03-05 16:52             ` Alex Sidorenko
  0 siblings, 1 reply; 14+ messages in thread
From: John Heffner @ 2007-03-03 23:40 UTC (permalink / raw)
  To: David Miller; +Cc: alexandre.sidorenko, netdev

David Miller wrote:
> From: John Heffner <jheffner@psc.edu>
> Date: Fri, 02 Mar 2007 16:16:39 -0500
> 
>> Please don't apply the patch I sent.  I've been thinking about this a 
>> bit harder, and it may not fix this particular problem.  (Hard to say 
>> without knowing exactly what it is.)  As the comment above 
>> __tcp_select_window() states, we do not do full receive-side SWS 
>> avoidance because of header prediction.
>>
>> Alex, you're right I missed that special zero-window case.  I'm still 
>> not quite sure I'm completely happy with this patch.  I'd like to think 
>> about this a little bit harder...
> 
> Ok

Alright, I've thought about it a bit more, and I think the patch I sent 
should work.  Alex, any opinion?  Any way you can test this out?

Thanks,
   -John

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-03 23:40           ` John Heffner
@ 2007-03-05 16:52             ` Alex Sidorenko
  2007-03-13 19:01               ` John Heffner
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Sidorenko @ 2007-03-05 16:52 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev

On March 3, 2007 06:40:12 pm John Heffner wrote:
> David Miller wrote:
> > From: John Heffner <jheffner@psc.edu>
> > Date: Fri, 02 Mar 2007 16:16:39 -0500
> >
> >> Please don't apply the patch I sent.  I've been thinking about this a
> >> bit harder, and it may not fix this particular problem.  (Hard to say
> >> without knowing exactly what it is.)  As the comment above
> >> __tcp_select_window() states, we do not do full receive-side SWS
> >> avoidance because of header prediction.
> >>
> >> Alex, you're right I missed that special zero-window case.  I'm still
> >> not quite sure I'm completely happy with this patch.  I'd like to think
> >> about this a little bit harder...
> >
> > Ok
>
> Alright, I've thought about it a bit more, and I think the patch I sent
> should work.  Alex, any opinion?  Any way you can test this out?

Here are the values from live kernel (obtained with 'crash') when the host was 
in SWS state:

full_space=708		full_space/2=354
free_space=393
window=76

In this case the test from my original fix, (window < full_space/2),  
succeeds. But John's test

free_space > window + full_space/2
393          430

does not. So I suspect that the new fix will not always work. From tcpdump 
traces we can see that both hosts exchange with 76-byte packets for a long 
time. From customer's application log we see that it continues to read 
76-byte chunks per each read() call - even though more than that is available 
in the receive buffer. Technically it's OK for read() to return even after 
reading one byte, so if sk->receive_queue contains multiple 76-byte skbuffs 
we may return after processing just one skbuff (but we we don't understand 
the details of why this happens on customer's system).

Are there any particular reasons why you want to postpone window update until 
free_space becomes > window + full_space/2 and not as soon as 
free_space > full_space/2? As the only real-life occurance of SWS shows 
free_space oscillating slightly above full_space/2, I created the fix 
specifically to match this phenomena as seen on customer's host. We reach the 
modified section only when (free_space > full_space/2) so it should be OK to 
update the window at this point if mss==full_space. 

So yes, we can test John's fix on customer's host but I doubt it will work for 
the reasons mentioned above, in brief:

'window = free_space' instead of 'window=full_space/2' is OK,
but the test 'free_space > window + full_space/2' is not for the specific 
pattern customer sees on his hosts.

Thanks,
Alex


-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-05 16:52             ` Alex Sidorenko
@ 2007-03-13 19:01               ` John Heffner
  2007-03-14 16:18                 ` Alex Sidorenko
  0 siblings, 1 reply; 14+ messages in thread
From: John Heffner @ 2007-03-13 19:01 UTC (permalink / raw)
  To: Alex Sidorenko; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 2349 bytes --]

Alex Sidorenko wrote:
> Here are the values from live kernel (obtained with 'crash') when the host was 
> in SWS state:
> 
> full_space=708		full_space/2=354
> free_space=393
> window=76
> 
> In this case the test from my original fix, (window < full_space/2),  
> succeeds. But John's test
> 
> free_space > window + full_space/2
> 393          430
> 
> does not. So I suspect that the new fix will not always work. From tcpdump 
> traces we can see that both hosts exchange with 76-byte packets for a long 
> time. From customer's application log we see that it continues to read 
> 76-byte chunks per each read() call - even though more than that is available 
> in the receive buffer. Technically it's OK for read() to return even after 
> reading one byte, so if sk->receive_queue contains multiple 76-byte skbuffs 
> we may return after processing just one skbuff (but we we don't understand 
> the details of why this happens on customer's system).
> 
> Are there any particular reasons why you want to postpone window update until 
> free_space becomes > window + full_space/2 and not as soon as 
> free_space > full_space/2? As the only real-life occurance of SWS shows 
> free_space oscillating slightly above full_space/2, I created the fix 
> specifically to match this phenomena as seen on customer's host. We reach the 
> modified section only when (free_space > full_space/2) so it should be OK to 
> update the window at this point if mss==full_space. 
> 
> So yes, we can test John's fix on customer's host but I doubt it will work for 
> the reasons mentioned above, in brief:
> 
> 'window = free_space' instead of 'window=full_space/2' is OK,
> but the test 'free_space > window + full_space/2' is not for the specific 
> pattern customer sees on his hosts.


Sorry for the long delay in response, I've been on vacation.  I'm okay 
with your patch, and I can't think of any real problem with it, except 
that the behavior is non-standard.  Then again, Linux acking in general 
is non-standard, which has created the bug in the first place. :)  The 
only thing I can think where it might still ack too often is if 
free_space frequently drops just below full_space/2 for a bit then rises 
above full_space/2.

I've also attached a corrected version of my earlier patch that I think 
solves the problem you noted.

Thanks,
   -John

[-- Attachment #2: rcv-sws.patch --]
[-- Type: text/plain, Size: 1180 bytes --]

Do full receiver-side SWS avoidance when rcvbuf < mss.

Signed-off-by: John Heffner <jheffner@psc.edu>

---
commit f4333661026621e15549fb75b37be785e4a1c443
tree 30d46b64ea19634875fdd4656d33f76db526a313
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner <jheffner@psc.edu> Tue, 13 Mar 2007 14:17:03 -0400
committer John Heffner <jheffner@psc.edu> Tue, 13 Mar 2007 14:17:03 -0400

 net/ipv4/tcp_output.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..e621a63 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1605,8 +1605,15 @@ u32 __tcp_select_window(struct sock *sk)
 		 * We also don't do any window rounding when the free space
 		 * is too small.
 		 */
-		if (window <= free_space - mss || window > free_space)
+		if (window <= free_space - mss || window > free_space) {
 			window = (free_space/mss)*mss;
+		} else if (mss == full_space) {
+			/* Do full receive-side SWS avoidance
+			 * when rcvbuf <= mss */
+			window = tcp_receive_window(tp);
+			if (free_space > window + full_space/2)
+				window = free_space;
+		}
 	}
 
 	return window;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-13 19:01               ` John Heffner
@ 2007-03-14 16:18                 ` Alex Sidorenko
  2007-04-02 20:01                   ` Alex Sidorenko
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Sidorenko @ 2007-03-14 16:18 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev

On March 13, 2007 03:01:50 pm John Heffner wrote:
> Sorry for the long delay in response, I've been on vacation.  I'm okay
> with your patch, and I can't think of any real problem with it, except
> that the behavior is non-standard.  Then again, Linux acking in general
> is non-standard, which has created the bug in the first place. :)  The
> only thing I can think where it might still ack too often is if
> free_space frequently drops just below full_space/2 for a bit then rises
> above full_space/2.
>
> I've also attached a corrected version of my earlier patch that I think
> solves the problem you noted.

Hello John,

I think your corrected version should avoid the problem I mentioned, but I 
will still ask the customer to test the new version. It will be interesting 
to compare traffic patterns with those observed with stock kernel and my 
earlier patch.

It might take some time to do proper testing, I'll inform you as soon as there 
are any results.

Thanks,
Alex


-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-03-14 16:18                 ` Alex Sidorenko
@ 2007-04-02 20:01                   ` Alex Sidorenko
  2007-04-02 20:21                     ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Sidorenko @ 2007-04-02 20:01 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev

On March 14, 2007 12:18:56 pm Alex Sidorenko wrote:
> On March 13, 2007 03:01:50 pm John Heffner wrote:
> > Sorry for the long delay in response, I've been on vacation.  I'm okay
> > with your patch, and I can't think of any real problem with it, except
> > that the behavior is non-standard.  Then again, Linux acking in general
> > is non-standard, which has created the bug in the first place. :)  The
> > only thing I can think where it might still ack too often is if
> > free_space frequently drops just below full_space/2 for a bit then rises
> > above full_space/2.
> >
> > I've also attached a corrected version of my earlier patch that I think
> > solves the problem you noted.
>
> Hello John,
>
> I think your corrected version should avoid the problem I mentioned, but I
> will still ask the customer to test the new version. It will be interesting
> to compare traffic patterns with those observed with stock kernel and my
> earlier patch.
>
> It might take some time to do proper testing, I'll inform you as soon as
> there are any results.

Hello John,

our customer has tested your patch thoroughly and everything's working fine. 
Please apply the patch if this is not done yet.

Thanks,
Alex

-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs@hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SWS for rcvbuf < MTU
  2007-04-02 20:01                   ` Alex Sidorenko
@ 2007-04-02 20:21                     ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2007-04-02 20:21 UTC (permalink / raw)
  To: alexandre.sidorenko; +Cc: jheffner, netdev

From: Alex Sidorenko <alexandre.sidorenko@hp.com>
Date: Mon, 2 Apr 2007 16:01:13 -0400

> our customer has tested your patch thoroughly and everything's working fine. 
> Please apply the patch if this is not done yet.

Thanks for following through on this Alex, I've been waiting patiently
for your test results :-))

I'll apply and push the patch now.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-04-02 20:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-02 16:28 SWS for rcvbuf < MTU Alex Sidorenko
2007-03-02 18:54 ` John Heffner
2007-03-02 20:29   ` Alex Sidorenko
2007-03-02 19:25 ` David Miller
2007-03-02 20:21   ` Alex Sidorenko
2007-03-02 20:33     ` David Miller
2007-03-02 21:16       ` John Heffner
2007-03-02 21:38         ` David Miller
2007-03-03 23:40           ` John Heffner
2007-03-05 16:52             ` Alex Sidorenko
2007-03-13 19:01               ` John Heffner
2007-03-14 16:18                 ` Alex Sidorenko
2007-04-02 20:01                   ` Alex Sidorenko
2007-04-02 20:21                     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).