netdev.vger.kernel.org archive mirror
* Re: Raise initial congestion window size / speedup slow start?
       [not found] ` <4C3DD5EB.9070908@tmr.com>
@ 2010-07-14 18:15   ` David Miller
  2010-07-14 18:48     ` Ed W
                       ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: David Miller @ 2010-07-14 18:15 UTC (permalink / raw)
  To: davidsen; +Cc: lists, linux-kernel, netdev

From: Bill Davidsen <davidsen@tmr.com>
Date: Wed, 14 Jul 2010 11:21:15 -0400

> You may have to go into /proc/sys/net/core and crank up the
> rmem_* settings, depending on your distribution.

You should never, ever, have to touch the various networking sysctl
values to get good performance in any normal setup.  If you do, it's a
bug, report it so we can fix it.

I cringe every time someone says to do this, so please do me a favor
and don't spread this further. :-)

For one thing, TCP dynamically adjusts the socket buffer sizes based
upon the behavior of traffic on the connection.

And the TCP memory limit sysctls (not the core socket ones) are sized
based upon available memory.  They are there to protect you from
situations such as having so much memory dedicated to socket buffers
that there is none left to do other things effectively.  It's a
protective limit, rather than a setting meant to increase or improve
performance.  So like the others, leave these alone too.
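
Purely for illustration, these are the knobs in question - the values
below are made up for one box and vary with kernel and installed memory,
which is exactly why you should leave them alone:

    $ sysctl net.core.rmem_max net.core.wmem_max
    net.core.rmem_max = 131071
    net.core.wmem_max = 131071
    $ sysctl net.ipv4.tcp_mem
    net.ipv4.tcp_mem = 196608   262144   393216

The tcp_mem triple is in pages and is computed at boot from system RAM;
that is the protective limit described above.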

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 18:15   ` Raise initial congestion window size / speedup slow start? David Miller
@ 2010-07-14 18:48     ` Ed W
  2010-07-14 19:10       ` Stephen Hemminger
  2010-07-14 20:17       ` Rick Jones
  2010-07-15  2:52     ` Bill Fink
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 37+ messages in thread
From: Ed W @ 2010-07-14 18:48 UTC (permalink / raw)
  To: David Miller; +Cc: davidsen, linux-kernel, netdev

On 14/07/2010 19:15, David Miller wrote:
> From: Bill Davidsen<davidsen@tmr.com>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
>    
>> You may have to go into /proc/sys/net/core and crank up the
>> rmem_* settings, depending on your distribution.
>>      
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup.  If you do, it's a
> bug, report it so we can fix it.
>    

Just checking the basics here because I don't think this is a bug so 
much as a less common installation that differs from the "normal" case.

- When we create a tcp connection we always start with tcp slow start
- This sets the congestion window to effectively 4 packets?
- This applies in both directions?
- Remote sender responds to my hypothetical http request with the first 
4 packets of data
- We need to wait one RTT for the ack to come back and now we can send 
the next 8 packets,
- Wait for the next ack and at 16 packets we are now moving at a 
sensible fraction of the bandwidth delay product?
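
To put rough numbers on the above (a sketch, assuming a 1448-byte MSS, 
a 500ms RTT of the kind a satellite path sees, and ignoring the 
handshake):

   cwnd per RTT:       4      8     16     32  segments
   cumulative data:  ~5.7   ~17    ~40    ~85  KB

so even a ~60KB response costs 4 RTTs = 2 seconds in slow start alone.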

So just to be clear:
- We don't seem to have any user-space tuning knobs to influence this 
right now?
- In this age of short attention spans, a couple of extra seconds 
between clicking something and it responding is worth optimising (IMHO)
- I think I need to take this to netdev, but anyone else with any ideas 
happy to hear them?

Thanks

Ed W

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 18:48     ` Ed W
@ 2010-07-14 19:10       ` Stephen Hemminger
  2010-07-14 21:47         ` Mitchell Erblich
  2010-07-14 20:17       ` Rick Jones
  1 sibling, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2010-07-14 19:10 UTC (permalink / raw)
  To: Ed W; +Cc: David Miller, davidsen, linux-kernel, netdev

On Wed, 14 Jul 2010 19:48:36 +0100
Ed W <lists@wildgooses.com> wrote:

> On 14/07/2010 19:15, David Miller wrote:
> > From: Bill Davidsen<davidsen@tmr.com>
> > Date: Wed, 14 Jul 2010 11:21:15 -0400
> >
> >    
> >> You may have to go into /proc/sys/net/core and crank up the
> >> rmem_* settings, depending on your distribution.
> >>      
> > You should never, ever, have to touch the various networking sysctl
> > values to get good performance in any normal setup.  If you do, it's a
> > bug, report it so we can fix it.
> >    
> 
> Just checking the basics here because I don't think this is a bug so 
> much as a less common installation that differs from the "normal" case.
> 
> - When we create a tcp connection we always start with tcp slow start
> - This sets the congestion window to effectively 4 packets?
> - This applies in both directions?
> - Remote sender responds to my hypothetical http request with the first 
> 4 packets of data
> - We need to wait one RTT for the ack to come back and now we can send 
> the next 8 packets,
> - Wait for the next ack and at 16 packets we are now moving at a 
> sensible fraction of the bandwidth delay product?
> 
> So just to be clear:
> - We don't seem to have any user-space tuning knobs to influence this 
> right now?
> - In this age of short attention spans, a couple of extra seconds 
> between clicking something and it responding is worth optimising (IMHO)
> - I think I need to take this to netdev, but anyone else with any ideas 
> happy to hear them?
> 
> Thanks
> 
> Ed W

TCP slow start is required by the RFC. It is there to prevent a TCP congestion
collapse. The HTTP problem is exacerbated by things beyond the user's control:
  1. stupid server software that dribbles out data and doesn't use the full
    payload of the packets
  2. web pages with data from multiple sources (ads especially), each of which
    requires a new connection
  3. pages with huge graphics.

Most of this is because of sites that haven't figured out that somebody on a phone
across the globe might not have the same RTT and bandwidth as the developer on a
local network who created them.  Changing the initial cwnd isn't going to fix it.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 18:48     ` Ed W
  2010-07-14 19:10       ` Stephen Hemminger
@ 2010-07-14 20:17       ` Rick Jones
  2010-07-14 20:39         ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 37+ messages in thread
From: Rick Jones @ 2010-07-14 20:17 UTC (permalink / raw)
  To: Ed W; +Cc: David Miller, davidsen, linux-kernel, netdev

Ed W wrote:

> 
> Just checking the basics here because I don't think this is a bug so 
> much as a less common installation that differs from the "normal" case.
> 
> - When we create a tcp connection we always start with tcp slow start
> - This sets the congestion window to effectively 4 packets?
> - This applies in both directions?

Any TCP sender in some degree of compliance with the RFCs on the topic will 
employ slow-start.

Linux adds auto-tuning of the receiver's advertised window.  It will start 
at a small size, and then grow as it sees fit.
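
(If curious, the auto-tuning bounds show up as the three-value sysctls - 
illustrative numbers only, they differ by kernel and memory:

    $ sysctl net.ipv4.tcp_rmem
    net.ipv4.tcp_rmem = 4096   87380   4194304

that is min, default and max in bytes; the advertised window starts near 
the default and is grown within those bounds.)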

> - Remote sender responds to my hypothetical http request with the first 
> 4 packets of data
> - We need to wait one RTT for the ack to come back and now we can send 
> the next 8 packets,
> - Wait for the next ack and at 16 packets we are now moving at a 
> sensible fraction of the bandwidth delay product?

There may be some wrinkles depending on how many ACKs the receiver generates 
(LRO being enabled and such) and how the ACKs get counted.

> So just to be clear:
> - We don't seem to have any user-space tuning knobs to influence this 
> right now?
> - In this age of short attention spans, a couple of extra seconds 
> between clicking something and it responding is worth optimising (IMHO)

There is an effort under way, led by some folks at Google and including some 
others, to get the RFCs enhanced in support of the concept of larger initial 
congestion windows.  Some of the discussion may be in the "tcpm" mailing list 
(assuming I've not gotten my mailing lists confused).  There may be some 
previous discussion of that work in the netdev archives as well.

rick jones

> - I think I need to take this to netdev, but anyone else with any ideas 
> happy to hear them?
> 
> Thanks
> 
> Ed W


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 20:17       ` Rick Jones
@ 2010-07-14 20:39         ` Hagen Paul Pfeifer
  2010-07-14 21:55           ` David Miller
                             ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-14 20:39 UTC (permalink / raw)
  To: Rick Jones; +Cc: Ed W, David Miller, davidsen, linux-kernel, netdev

* Rick Jones | 2010-07-14 13:17:24 [-0700]:

>There is an effort under way, led by some folks at Google and
>including some others, to get the RFCs enhanced in support of the
>concept of larger initial congestion windows.  Some of the discussion
>may be in the "tcpm" mailing list (assuming I've not gotten my
>mailing lists confused).  There may be some previous discussion of
>that work in the netdev archives as well.

tcpm is the right mailing list but there is currently no effort to develop
this topic. Why? Because it is not a standardization issue, rather it is a
technical issue. You cannot raise the initial CWND and expect fair behavior.
This has been discussed several times and is documented in several documents
and RFCs.

RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
pops up every two months in netdev and until now I have _never_ read a
consolidated contribution.

Partial local issues can already be "fixed" via route specific ip options -
see initcwnd.
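
A sketch (the gateway and device are placeholders, the value is in
segments):

    ip route change default via 192.0.2.1 dev eth0 initcwnd 10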

HGN

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 19:10       ` Stephen Hemminger
@ 2010-07-14 21:47         ` Mitchell Erblich
  0 siblings, 0 replies; 37+ messages in thread
From: Mitchell Erblich @ 2010-07-14 21:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ed W, David Miller, davidsen, linux-kernel, netdev


On Jul 14, 2010, at 12:10 PM, Stephen Hemminger wrote:

> On Wed, 14 Jul 2010 19:48:36 +0100
> Ed W <lists@wildgooses.com> wrote:
> 
>> On 14/07/2010 19:15, David Miller wrote:
>>> From: Bill Davidsen<davidsen@tmr.com>
>>> Date: Wed, 14 Jul 2010 11:21:15 -0400
>>> 
>>> 
>>>> You may have to go into /proc/sys/net/core and crank up the
>>>> rmem_* settings, depending on your distribution.
>>>> 
>>> You should never, ever, have to touch the various networking sysctl
>>> values to get good performance in any normal setup.  If you do, it's a
>>> bug, report it so we can fix it.
>>> 
>> 
>> Just checking the basics here because I don't think this is a bug so 
>> much as a less common installation that differs from the "normal" case.
>> 
>> - When we create a tcp connection we always start with tcp slow start
>> - This sets the congestion window to effectively 4 packets?
>> - This applies in both directions?
>> - Remote sender responds to my hypothetical http request with the first 
>> 4 packets of data
>> - We need to wait one RTT for the ack to come back and now we can send 
>> the next 8 packets,
>> - Wait for the next ack and at 16 packets we are now moving at a 
>> sensible fraction of the bandwidth delay product?
>> 
>> So just to be clear:
>> - We don't seem to have any user-space tuning knobs to influence this 
>> right now?
>> - In this age of short attention spans, a couple of extra seconds 
>> between clicking something and it responding is worth optimising (IMHO)
>> - I think I need to take this to netdev, but anyone else with any ideas 
>> happy to hear them?
>> 
>> Thanks
>> 
>> Ed W
> 
> TCP slow start is required by the RFC. It is there to prevent a TCP congestion
> collapse. The HTTP problem is exacerbated by things beyond the user's control:
>  1. stupid server software that dribbles out data and doesn't use the full
>    payload of the packets
>  2. web pages with data from multiple sources (ads especially), each of which
>    requires a new connection
>  3. pages with huge graphics.
> 
> Most of this is because of sites that haven't figured out that somebody on a phone
> across the globe might not have the same RTT and bandwidth as the developer on a
> local network who created them.  Changing the initial cwnd isn't going to fix it.

IMO, in theory one of the RFCs states a window of 4 ETH MTU sized
packets/segments (~6k window) to allow a fast retransmit if a pkt is dropped.

I thought there is a fast-rexmit knob of 2 or 3 DUPACKs, for faster loss
recovery. Theoretically it could be set to 1 DUPACK for lossy environments.

Now, the orig slow-start doubles the number of pkts per RTT assuming no loss,
which is a faster ramp up vs the orig congestion avoidance.

Now, with IPv4 with a default of 576-byte segments, 12 pkts could be sent
without changing the amount of data. This would be helpful if your app only
generates smaller buffers: more ACKs come back in return, which sets the ACK
clocking at a faster rate. To compensate for the smaller pkts, the ABC
Experimental RFC does byte counting to suggest fairness.

Over a few round trips, the pkt size could be increased to the 1.5k ETH MTU
and hopefully even to a 9k Jumbo, probing with one increasing-sized pkt.
(?to prevent rexmit of the too-large pkt, overlap the increasing pkt with the
next one?)

Mitchell Erblich


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 20:39         ` Hagen Paul Pfeifer
@ 2010-07-14 21:55           ` David Miller
  2010-07-14 22:13             ` Hagen Paul Pfeifer
  2010-07-14 22:05           ` Ed W
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-07-14 21:55 UTC (permalink / raw)
  To: hagen; +Cc: rick.jones2, lists, davidsen, linux-kernel, netdev

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Wed, 14 Jul 2010 22:39:19 +0200

> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
> 
>>There is an effort under way, led by some folks at Google and
>>including some others, to get the RFCs enhanced in support of the
>>concept of larger initial congestion windows.  Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused).  There may be some previous discussion of
>>that work in the netdev archives as well.
> 
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because it is not a standardization issue, rather it is a
> technical issue. You cannot raise the initial CWND and expect fair behavior.
> This has been discussed several times and is documented in several documents
> and RFCs.
> 
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pops up every two months in netdev and until now I have _never_ read a
> consolidated contribution.
> 
> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.

Although section 3 of RFC 5681 is a great text, it does not say at all
that increasing the initial CWND would lead to fairness issues.

To be honest, I think google's proposal holds a lot of weight.  If
over time link sizes and speeds are increasing (they are) then nudging
the initial CWND every so often is a legitimate proposal.  Were
someone to claim that utilization is lower than it could be because of
the currently specified initial CWND, I would have no problem
believing them.

And I'm happy to make Linux use an increased value once it has
traction in the standardization community.

But for all we know this side discussion about initial CWND settings
could have nothing to do with the issue being reported at the start of
this thread. :-)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 20:39         ` Hagen Paul Pfeifer
  2010-07-14 21:55           ` David Miller
@ 2010-07-14 22:05           ` Ed W
  2010-07-14 22:36             ` Hagen Paul Pfeifer
  2010-07-15  4:12           ` Tom Herbert
  2010-07-15  5:09           ` H.K. Jerry Chu
  3 siblings, 1 reply; 37+ messages in thread
From: Ed W @ 2010-07-14 22:05 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Rick Jones, David Miller, davidsen, linux-kernel, netdev

On 14/07/2010 21:39, Hagen Paul Pfeifer wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>    
>> There is an effort under way, led by some folks at Google and
>> including some others, to get the RFCs enhanced in support of the
>> concept of larger initial congestion windows.  Some of the discussion
>> may be in the "tcpm" mailing list (assuming I've not gotten my
>> mailing lists confused).  There may be some previous discussion of
>> that work in the netdev archives as well.
>>      
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because it is not a standardization issue, rather it is a
> technical issue. You cannot raise the initial CWND and expect fair behavior.
> This has been discussed several times and is documented in several documents
> and RFCs.
>    

I'm sure you have covered this to the point you are fed up, but my 
searches turn up only a smattering of posts covering this - could you 
summarise why "you cannot raise the initial cwnd and expect fair 
behaviour"?

Initial cwnd was changed (increased) in the past (rfc3390), and that RFC 
claims that studies at the time suggested the benefits were all positive. 
Some reasonably smart people have suggested that it might be time to 
review the status quo again, so it doesn't seem completely obvious that 
the current number is optimal?

> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pops up every two months in netdev and until now I have _never_ read a
> consolidated contribution.
>    

Sorry, what do you mean by a "consolidated contribution"?

That RFC is a subtle read - it appears to give more specific guidance on 
what to do in certain situations, but I'm not sure I see that it 
improves slow start convergence speed for my situation (large RTT)?  
Would you mind highlighting the new bits for those of us a bit newer to 
the subject?

> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>    

Oh, excellent.  This seems like exactly what I'm after.  (Thanks Stephen 
Hemminger!)

Many thanks

Ed W

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 21:55           ` David Miller
@ 2010-07-14 22:13             ` Hagen Paul Pfeifer
  2010-07-14 22:19               ` Rick Jones
                                 ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-14 22:13 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, lists, davidsen, linux-kernel, netdev

* David Miller | 2010-07-14 14:55:47 [-0700]:

>Although section 3 of RFC 5681 is a great text, it does not say at all
>that increasing the initial CWND would lead to fairness issues.

Because that is only one side of the coin; conservatively probing the available
link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
instances is the other.

>To be honest, I think google's proposal holds a lot of weight.  If
>over time link sizes and speeds are increasing (they are) then nudging
>the initial CWND every so often is a legitimate proposal.  Were
>someone to claim that utilization is lower than it could be because of
>the currently specified initial CWND, I would have no problem
>believing them.
>
>And I'm happy to make Linux use an increased value once it has
>traction in the standardization community.

Currently I know of no working link capacity probing approach that, without
active network feedback, can conservatively probe the available link capacity
with a high CWND. I am curious about any future trends.

>But for all we know this side discussion about initial CWND settings
>could have nothing to do with the issue being reported at the start of
>this thread. :-)

;-) sure, but it is often wise to thwart these kinds of discussions. It seems
these CWND discussions turn up once every other month. ;-)

Hagen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:13             ` Hagen Paul Pfeifer
@ 2010-07-14 22:19               ` Rick Jones
  2010-07-14 22:40                 ` Hagen Paul Pfeifer
  2010-07-14 22:52               ` Ed W
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Rick Jones @ 2010-07-14 22:19 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: David Miller, lists, davidsen, linux-kernel, netdev

Hagen Paul Pfeifer wrote:
> * David Miller | 2010-07-14 14:55:47 [-0700]:
>>But for all we know this side discussion about initial CWND settings
>>could have nothing to do with the issue being reported at the start of
>>this thread. :-)
> 
> 
> ;-) sure, but it is often wise to thwart these kinds of discussions. It seems
> these CWND discussions turn up once every other month. ;-)

Which suggests there is a constant "force" out there yet to be reckoned with. :)

rick jones

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:05           ` Ed W
@ 2010-07-14 22:36             ` Hagen Paul Pfeifer
  2010-07-14 23:01               ` Ed W
  0 siblings, 1 reply; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-14 22:36 UTC (permalink / raw)
  To: Ed W; +Cc: Rick Jones, David Miller, davidsen, linux-kernel, netdev

* Ed W | 2010-07-14 23:05:31 [+0100]:

>Initial cwnd was changed (increased) in the past (rfc3390) and the
>RFC claims that studies then suggested that the benefits were all
>positive. Some reasonably smart people have suggested that it might
>be time to review the status quo again so it doesn't seem completely
>obvious that the current number is optimal?

Do you cite "An Argument for Increasing TCP's Initial Congestion Window"?
People at google stated that a CWND of 10 seems to be fair in their
measurements. 10 because the test setup was equipped with a reasonably large
link capacity? Do they analyse their modification in environments with a small
BDP (e.g. multihop MANET setup, ...)? I am curious, but we will see what
happens if TCPM adopts this.

>That RFC is a subtle read - it appears to give more specific guidance
>on what to do in certain situations, but I'm not sure I see that it
>improves slow start convergence speed for my situation (large RTT)?
>Would you mind highlighting the new bits for those of us a bit newer
>to the subject?

The objection/hint was more of a general nature - not specific to larger RTTs.
Environments with larger RTTs are disadvantaged because TCP is ACK clocked.
That is only a half-truth on my part, because RTT fairness was and is an issue
in the development of new congestion control algorithms: BIC, CUBIC and friends.

>>Partial local issues can already be "fixed" via route specific ip options -
>>see initcwnd.
>
>Oh, excellent.  This seems like exactly what I'm after.  (Thanks
>Stephen Hemminger!)

Great, you are welcome! ;-)


Hagen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:19               ` Rick Jones
@ 2010-07-14 22:40                 ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-14 22:40 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, lists, davidsen, linux-kernel, netdev

* Rick Jones | 2010-07-14 15:19:35 [-0700]:

>>;-) sure, but it is often wise to thwart these kinds of discussions. It seems
>>these CWND discussions turn up once every other month. ;-)
>
>Which suggests there is a constant "force" out there yet to be reckoned with. :)

;-) I am _not_ oblivious to this, but the better venue for this kind of
discussion is still tcpm.

Hagen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:13             ` Hagen Paul Pfeifer
  2010-07-14 22:19               ` Rick Jones
@ 2010-07-14 22:52               ` Ed W
  2010-07-14 23:01                 ` Hagen Paul Pfeifer
  2010-07-15  3:49               ` Bill Fink
  2010-07-15 10:33               ` Alan Cox
  3 siblings, 1 reply; 37+ messages in thread
From: Ed W @ 2010-07-14 22:52 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: David Miller, rick.jones2, davidsen, linux-kernel, netdev


>> Although section 3 of RFC 5681 is a great text, it does not say at all
>> that increasing the initial CWND would lead to fairness issues.
>>      
> Because that is only one side of the coin; conservatively probing the available
> link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
> instances is the other.
>    

So let's define the problem more succinctly:
- New TCP connections are assumed to have no knowledge of current 
network conditions (bah)
- We desire the connection to consume the maximum amount of bandwidth 
possible, while staying ever so fractionally under the maximum link bandwidth

> Currently I know of no working link capacity probing approach that, without
> active network feedback, can conservatively probe the available link capacity
> with a high CWND. I am curious about any future trends.
>    

Sounds like smarter people than I have played this game, but just to 
chuck out one idea: how about attacking the assumption that we have no 
knowledge of network conditions?  After all, we have a bunch of 
information about:

1) very good information about the size of the link to the first hop (eg 
the modem/network card reported rate)
2) often a reasonably good idea about the bandwidth to the first 
"restrictive" router along our default path (ie usually there is a pool 
of high speed network locally, then more limited connectivity between 
our network and other networks; we can look at the maximum flows through 
our network device to outside our subnet and infer an approximate link 
speed from that)
3) often moderate quality information about the size of the link between 
us and a specific destination IP

So here goes: the heuristic could be to examine current flows through 
our interface, use this to offer hints to the remote end during SYN 
handshake as to a recommended starting size, and additionally the client 
side can examine the implied RTT of the SYN/ACK to further fine tune the 
initial cwnd?

In practice this could be implemented in other ways such as examining 
recent TCP congestion windows and using some heuristic to start "near" 
those.  Or remembering congestion windows recently used for popular 
destinations?  Also we can benefit the receiver of our data - if we see 
some app open up 16 http connections to some poor server then some of 
those connections will NOT be given large initial cwnd.

Essentially perhaps we can refine our initial cwnd heuristic somewhat if 
we assume better than zero knowledge about the network link?


Out of curiosity, why has it taken so long for active feedback to 
appear?  If every router simply added a hint to the packet as to the max 
bandwidth it can offer then we would appear to be able to make massively 
better decisions on window sizes.  Furthermore routers have the ability 
to put backpressure on classes of traffic as appropriate.  I guess the 
speed at which ECN has been adopted answers the question of why nothing 
more exotic has appeared?

>> But for all we know this side discussion about initial CWND settings
>> could have nothing to do with the issue being reported at the start of
>> this thread. :-)
>>      

Actually the original question was mine and it was literally - can I 
adjust the initial cwnd for users of my very specific satellite network 
which has a high RTT.  I believe Stephen Hemminger has been kind enough 
to recently add the facility to experiment with this to the ip utility 
and so I am now in a position to go do some testing - thanks Stephen


Cheers

Ed W

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:52               ` Ed W
@ 2010-07-14 23:01                 ` Hagen Paul Pfeifer
  2010-07-14 23:05                   ` Ed W
  0 siblings, 1 reply; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-14 23:01 UTC (permalink / raw)
  To: Ed W; +Cc: David Miller, rick.jones2, davidsen, linux-kernel, netdev

* Ed W | 2010-07-14 23:52:02 [+0100]:

>Out of curiosity, why has it taken so long for active feedback to
>appear?  If every router simply added a hint to the packet as to the
>max bandwidth it can offer then we would appear to be able to make
>massively better decisions on window sizes.  Furthermore routers have
>the ability to put backpressure on classes of traffic as appropriate.
>I guess the speed at which ECN has been adopted answers the question
>of why nothing more exotic has appeared?

It is quite late here so I will quickly write two sentences about ECN: one
month ago Lars Eggert posted a link on the tcpm mailing list where google (not
really sure if it was google) analysed the deployment of ECN - the usage was
really low. Search for the PDF, it is quite an interesting one.

Hagen



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:36             ` Hagen Paul Pfeifer
@ 2010-07-14 23:01               ` Ed W
  0 siblings, 0 replies; 37+ messages in thread
From: Ed W @ 2010-07-14 23:01 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Rick Jones, David Miller, davidsen, linux-kernel, netdev


> Do you cite "An Argument for Increasing TCP's Initial Congestion Window"?
> People at google stated that a CWND of 10 seems to be fair in their
> measurements. 10 because the test setup was equipped with a reasonably large
> link capacity? Do they analyse their modification in environments with a small
> BDP (e.g. multihop MANET setup, ...)? I am curious, but we will see what
> happens if TCPM adopts this.
>    

Well, I personally would shoot for starting from the position of 
assuming better than zero knowledge about our link and incorporating 
that into the initial cwnd estimate...

We know something about the RTT from the syn/ack times and the speed of 
the local link, quickly we will learn about median window sizes to other 
destinations, and additionally the kernel has some knowledge of other 
connections currently in progress.  With all that information perhaps we 
can make a more informed choice than just a hard-coded magic number? (Oh 
and let's make the option pluggable so that we can soon have 10 different 
kernel options...)

Seems like there is evidence that networks are starting to cluster into 
groups that would benefit from a range of cwnd options (higher/lower) - 
perhaps there is some way to find a reasonable heuristic to cluster these 
and choose a better starting option?

Cheers

Ed W



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 23:01                 ` Hagen Paul Pfeifer
@ 2010-07-14 23:05                   ` Ed W
  0 siblings, 0 replies; 37+ messages in thread
From: Ed W @ 2010-07-14 23:05 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: David Miller, rick.jones2, davidsen, linux-kernel, netdev

On 15/07/2010 00:01, Hagen Paul Pfeifer wrote:
> It is quite late here so I will quickly write two sentences about ECN: one
> month ago Lars Eggert posted a link on the tcpm mailing list where google (not
> really sure if it was google) analysed the deployment of ECN - the usage was
> really low. Search for the PDF, it is quite an interesting one.
>    

I would speculate that this is because there is a big warning on ECN 
saying that it may cause you to lose customers who can't connect to 
you... Businesses are driven by needing to support the most common case, 
not the most optimal (witness the pain of html development and needing 
to consider IE6...)

What would be more useful is for google to survey how many devices are 
unable to interoperate with ECN; if that number turned out to be 
extremely low, and this fact were advertised, then I suspect we might 
see a mass increase in its deployment?  I know I have it turned off on 
all my servers because I worry more about losing one customer than 
improving the experience for all customers...

Cheers

Ed W

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 18:15   ` Raise initial congestion window size / speedup slow start? David Miller
  2010-07-14 18:48     ` Ed W
@ 2010-07-15  2:52     ` Bill Fink
  2010-07-15  4:51     ` H.K. Jerry Chu
  2010-07-15 23:14     ` Bill Davidsen
  3 siblings, 0 replies; 37+ messages in thread
From: Bill Fink @ 2010-07-15  2:52 UTC (permalink / raw)
  To: David Miller; +Cc: davidsen, lists, linux-kernel, netdev

On Wed, 14 Jul 2010, David Miller wrote:

> From: Bill Davidsen <davidsen@tmr.com>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
> 
> > You may have to go into /proc/sys/net/core and crank up the
> > rmem_* settings, depending on your distribution.
> 
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup.  If you do, it's a
> bug, report it so we can fix it.
> 
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
> 
> For one thing, TCP dynamically adjusts the socket buffer sizes based
> upon the behavior of traffic on the connection.
> 
> And the TCP memory limit sysctls (not the core socket ones) are sized
> based upon available memory.  They are there to protect you from
> situations such as having so much memory dedicated to socket buffers
> that there is none left to do other things effectively.  It's a
> protective limit, rather than a setting meant to increase or improve
> performance.  So like the others, leave these alone too.

What's normal?  :-)

netem1% cat /proc/version 
Linux version 2.6.30.10-105.2.23.fc11.x86_64 (mockbuild@x86-01.phx2.fedoraproject.org) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #1 SMP Thu Feb 11 07:06:34 UTC 2010

Linux TCP autotuning across an 80 ms RTT cross country network path:

netem1% nuttcp -T10 -i1 192.168.1.18
   14.1875 MB /   1.00 sec =  119.0115 Mbps     0 retrans
  558.0000 MB /   1.00 sec = 4680.7169 Mbps     0 retrans
  872.8750 MB /   1.00 sec = 7322.3527 Mbps     0 retrans
  869.6875 MB /   1.00 sec = 7295.5478 Mbps     0 retrans
  858.4375 MB /   1.00 sec = 7201.0165 Mbps     0 retrans
  857.3750 MB /   1.00 sec = 7192.2116 Mbps     0 retrans
  865.5625 MB /   1.00 sec = 7260.7193 Mbps     0 retrans
  872.3750 MB /   1.00 sec = 7318.2095 Mbps     0 retrans
  862.7500 MB /   1.00 sec = 7237.2571 Mbps     0 retrans
  857.6250 MB /   1.00 sec = 7194.1864 Mbps     0 retrans

 7504.2771 MB /  10.09 sec = 6236.5068 Mbps 11 %TX 25 %RX 0 retrans 80.59 msRTT

Manually specified 100 MB TCP socket buffer on the same path:

netem1% nuttcp -T10 -i1 -w100m 192.168.1.18
  106.8125 MB /   1.00 sec =  895.9598 Mbps     0 retrans
 1092.0625 MB /   1.00 sec = 9160.3254 Mbps     0 retrans
 1111.2500 MB /   1.00 sec = 9322.6424 Mbps     0 retrans
 1115.4375 MB /   1.00 sec = 9356.2569 Mbps     0 retrans
 1116.4375 MB /   1.00 sec = 9365.6937 Mbps     0 retrans
 1115.3125 MB /   1.00 sec = 9356.2749 Mbps     0 retrans
 1121.2500 MB /   1.00 sec = 9405.6233 Mbps     0 retrans
 1125.5625 MB /   1.00 sec = 9441.6949 Mbps     0 retrans
 1130.0000 MB /   1.00 sec = 9478.7479 Mbps     0 retrans
 1139.0625 MB /   1.00 sec = 9555.8559 Mbps     0 retrans

10258.5120 MB /  10.20 sec = 8440.3558 Mbps 15 %TX 40 %RX 0 retrans 80.59 msRTT

The manually selected TCP socket buffer size both ramps up
quicker and achieves a much higher steady state rate.
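
The 100 MB figure isn't arbitrary, by the way - it's roughly the
bandwidth*delay product of this path:

    10 Gbps * 0.08059 sec ~= 806 Mbit ~= 100 MB

i.e. just enough to keep a full RTT's worth of data in flight.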

					-Bill

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:13             ` Hagen Paul Pfeifer
  2010-07-14 22:19               ` Rick Jones
  2010-07-14 22:52               ` Ed W
@ 2010-07-15  3:49               ` Bill Fink
  2010-07-15  5:29                 ` H.K. Jerry Chu
  2010-07-16  9:03                 ` Hagen Paul Pfeifer
  2010-07-15 10:33               ` Alan Cox
  3 siblings, 2 replies; 37+ messages in thread
From: Bill Fink @ 2010-07-15  3:49 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: David Miller, rick.jones2, lists, davidsen, linux-kernel, netdev

On Thu, 15 Jul 2010, Hagen Paul Pfeifer wrote:

> * David Miller | 2010-07-14 14:55:47 [-0700]:
> 
> >Although section 3 of RFC 5681 is a great text, it does not say at all
> >that increasing the initial CWND would lead to fairness issues.
> 
> Because that is only one side of the coin; conservatively probing the available
> link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
> instances is the other.
> 
> >To be honest, I think google's proposal holds a lot of weight.  If
> >over time link sizes and speeds are increasing (they are) then nudging
> >the initial CWND every so often is a legitimate proposal.  Were
> >someone to claim that utilization is lower than it could be because of
> >the currently specified initial CWND, I would have no problem
> >believing them.
> >
> >And I'm happy to make Linux use an increased value once it has
> >traction in the standardization community.
> 
> Currently I know of no working link capacity probing approach that, without
> active network feedback, can conservatively probe the available link capacity
> with a high CWND. I am curious about any future trends.

A long, long time ago, I suggested a Path BW Discovery mechanism
to the IETF, analogous to the Path MTU Discovery mechanism, but
it didn't get any traction.  Such information could be extremely
useful to TCP endpoints, to determine a maximum window size to
use, and to effectively keep a much stronger sender from
overpowering a much weaker receiver (for example 10-GigE -> GigE),
which results in abominable performance across large RTT paths
(as low as 12 Mbps), even in the absence of any real network
contention.

						-Bill

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 20:39         ` Hagen Paul Pfeifer
  2010-07-14 21:55           ` David Miller
  2010-07-14 22:05           ` Ed W
@ 2010-07-15  4:12           ` Tom Herbert
  2010-07-15  7:48             ` Ed W
  2010-07-15  5:09           ` H.K. Jerry Chu
  3 siblings, 1 reply; 37+ messages in thread
From: Tom Herbert @ 2010-07-15  4:12 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Rick Jones, Ed W, David Miller, davidsen, linux-kernel, netdev,
	Jerry Chu, Nandita Dukkipati

On Wed, Jul 14, 2010 at 1:39 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>>There is an effort under way, led by some folks at Google and
>>including some others, to get the RFCs enhanced in support of the
>>concept of larger initial congestion windows.  Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused).  There may be some previous discussion of
>>that work in the netdev archives as well.
>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because it is not a standardization issue, rather it is a
> technical issue. You cannot raise the initial CWND and expect fair behavior.
> This has been discussed several times and is documented in several documents
> and RFCs.
>
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pops up every two months in netdev and until now I have _never_ read a
> consolidated contribution.
>

There is an Internet draft
(http://datatracker.ietf.org/doc/draft-hkchu-tcpm-initcwnd/) on
raising the default Initial Congestion window to 10 segments, as well
as a SIGCOMM paper (http://ccr.sigcomm.org/online/?q=node/621).  We
presented this proposal and data supporting it at Anaheim IETF, and
will be following up in the Netherlands with more data, some of
which should further address fairness questions.

In terms of Linux implementation, setting ICW via ip route is
sufficient support on the server side.  There is also a proposed patch
which could allow applications to set ICW themselves (in hopes that
applications can reduce the number of simultaneous connections).  On the
client side we can now adjust the receive window to advertise larger
initial windows.  Among current implementations, Linux advertises the
smallest default receive window of major OSes, so it turns out Linux
clients won't get lower latency benefits currently (so we'll probably
ask to raise the default some day :-)).
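
As a sketch of the two halves (the route is a placeholder, both options
take a segment count, and initrwnd needs a recent iproute2):

    # server side: larger initial congestion window
    ip route change default via 192.0.2.1 initcwnd 10
    # client side: advertise a larger initial receive window
    ip route change default via 192.0.2.1 initrwnd 10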

Tom

> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>
> HGN

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 18:15   ` Raise initial congestion window size / speedup slow start? David Miller
  2010-07-14 18:48     ` Ed W
  2010-07-15  2:52     ` Bill Fink
@ 2010-07-15  4:51     ` H.K. Jerry Chu
  2010-07-16 17:01       ` Patrick McManus
  2010-07-15 23:14     ` Bill Davidsen
  3 siblings, 1 reply; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-15  4:51 UTC (permalink / raw)
  To: David Miller; +Cc: davidsen, lists, linux-kernel, netdev

On Wed, Jul 14, 2010 at 11:15 AM, David Miller <davem@davemloft.net> wrote:
> From: Bill Davidsen <davidsen@tmr.com>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
>> You may have to go into /proc/sys/net/core and crank up the
>> rmem_* settings, depending on your distribution.
>
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup.  If you do, it's a
> bug, report it so we can fix it.

Agreed, except there are indeed bugs in the code today, in that the
code in various places assumes an initcwnd as per RFC3390. So when
initcwnd is raised, the actual value may be limited unnecessarily by
the initial wmem/sk_sndbuf.
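
(For reference, RFC 3390 computes the initial window as

    min(4*MSS, max(2*MSS, 4380 bytes))

i.e. 3 segments at a typical 1460-byte MSS, so buffer sizing derived from
that formula quietly caps a larger initcwnd.)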

Will try to find time to submit a patch.

Jerry

>
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
>
> For one thing, TCP dynamically adjusts the socket buffer sizes based
> upon the behavior of traffic on the connection.
>
> And the TCP memory limit sysctls (not the core socket ones) are sized
> based upon available memory.  They are there to protect you from
> situations such as having so much memory dedicated to socket buffers
> that there is none left to do other things effectively.  It's a
> protective limit, rather than a setting meant to increase or improve
> performance.  So like the others, leave these alone too.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 20:39         ` Hagen Paul Pfeifer
                             ` (2 preceding siblings ...)
  2010-07-15  4:12           ` Tom Herbert
@ 2010-07-15  5:09           ` H.K. Jerry Chu
  3 siblings, 0 replies; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-15  5:09 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Rick Jones, Ed W, David Miller, davidsen, linux-kernel, netdev

On Wed, Jul 14, 2010 at 1:39 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>>There is an effort under way, led by some folks at Google and
>>including some others, to get the RFCs enhanced in support of the
>>concept of larger initial congestion windows.  Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused).  There may be some previous discussion of
>>that work in the netdev archives as well.
>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because it is not a standardization issue, rather it is a

Please don't mislead. Raising the initcwnd is actively being pursued at IETF
right now. If not here, where else? It is following the same path by which
initcwnd was first raised in the late '90s through rfc2414/rfc3390.

IETF is not a standards organization just for protocol lawyers to play word
games. It is responsible for solving real technical issues as well.

Jerry

> technical issue. You cannot raise the initial CWND and expect fair behavior.
> This has been discussed several times and is documented in several documents
> and RFCs.
>
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pops up every two months in netdev and until now I have _never_ read a
> consolidated contribution.
>
> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>
> HGN

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15  3:49               ` Bill Fink
@ 2010-07-15  5:29                 ` H.K. Jerry Chu
  2010-07-15 19:51                   ` Rick Jones
  2010-07-16  9:03                 ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-15  5:29 UTC (permalink / raw)
  To: Bill Fink
  Cc: Hagen Paul Pfeifer, David Miller, rick.jones2, lists, davidsen,
	linux-kernel, netdev

On Wed, Jul 14, 2010 at 8:49 PM, Bill Fink <billfink@mindspring.com> wrote:
> On Thu, 15 Jul 2010, Hagen Paul Pfeifer wrote:
>
>> * David Miller | 2010-07-14 14:55:47 [-0700]:
>>
>> >Although section 3 of RFC 5681 is a great text, it does not say at all
>> >that increasing the initial CWND would lead to fairness issues.
>>
>> Because that is only one side of the coin; conservatively probing the available
>> link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
>> instances is the other.
>>
>> >To be honest, I think google's proposal holds a lot of weight.  If
>> >over time link sizes and speeds are increasing (they are) then nudging
>> >the initial CWND every so often is a legitimate proposal.  Were
>> >someone to claim that utilization is lower than it could be because of
>> >the currently specified initial CWND, I would have no problem
>> >believing them.
>> >
>> >And I'm happy to make Linux use an increased value once it has
>> >traction in the standardization community.
>>
>> Currently I know of no working link capacity probing approach that, without
>> active network feedback, can conservatively probe the available link capacity
>> with a high CWND. I am curious about any future trends.
>
> A long, long time ago, I suggested a Path BW Discovery mechanism
> to the IETF, analogous to the Path MTU Discovery mechanism, but
> it didn't get any traction.  Such information could be extremely
> useful to TCP endpoints, to determine a maximum window size to
> use, and to effectively keep a much stronger sender from
> overpowering a much weaker receiver (for example 10-GigE -> GigE),
> which results in abominable performance across large RTT paths
> (as low as 12 Mbps), even in the absence of any real network
> contention.

Unfortunately that is not going to help initcwnd (unless one can invent a
PBWD protocol from just the 3WHS), and the web is dominated by short-lived
connections, so the small initcwnd becomes a choke point.

Jerry

>
>                                                -Bill

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15  4:12           ` Tom Herbert
@ 2010-07-15  7:48             ` Ed W
  2010-07-15 17:36               ` Jerry Chu
  0 siblings, 1 reply; 37+ messages in thread
From: Ed W @ 2010-07-15  7:48 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Hagen Paul Pfeifer, Rick Jones, David Miller, davidsen,
	linux-kernel, netdev, Jerry Chu, Nandita Dukkipati

On 15/07/2010 05:12, Tom Herbert wrote:
> There is an Internet draft
> (http://datatracker.ietf.org/doc/draft-hkchu-tcpm-initcwnd/) on
> raising the default Initial Congestion window to 10 segments, as well
> as a SIGCOMM paper (http://ccr.sigcomm.org/online/?q=node/621).
>    

You guys have obviously done a lot of work on this; however, it seems 
that there is a case for introducing some heuristics into the choice of 
init cwnd as well as offering the option to go larger?  An initial size 
of 10 packets is just another magic number that obviously works with the 
median bandwidth delay product on today's networks - can we not do 
better still?

Seems like a bunch of clever folks have already suggested tweaks to the 
steady-state congestion avoidance, but so far everyone is afraid to 
touch the early-stage heuristics?

Also, would you guys not benefit from wider deployment of ECN?  Can you 
not help find some ways that deployment could be increased?  At present 
there are big warnings all over the option that it causes some problems, 
but there is no quantification of how much, or of whether this warning 
is still appropriate?

Ed W

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 22:13             ` Hagen Paul Pfeifer
                                 ` (2 preceding siblings ...)
  2010-07-15  3:49               ` Bill Fink
@ 2010-07-15 10:33               ` Alan Cox
  3 siblings, 0 replies; 37+ messages in thread
From: Alan Cox @ 2010-07-15 10:33 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: David Miller, rick.jones2, lists, davidsen, linux-kernel, netdev

On Thu, 15 Jul 2010 00:13:01 +0200
Hagen Paul Pfeifer <hagen@jauu.net> wrote:

> * David Miller | 2010-07-14 14:55:47 [-0700]:
> 
> >Although section 3 of RFC 5681 is a great text, it does not say at all
> >that increasing the initial CWND would lead to fairness issues.
> 
> Because that is only one side of the coin; conservatively probing the available
> link capacity in conjunction with n simultaneously probing TCP/SCTP/DCCP
> instances is the other.
> 
> >To be honest, I think google's proposal holds a lot of weight.  If
> >over time link sizes and speeds are increasing (they are) then nudging
> >the initial CWND every so often is a legitimate proposal.  Were
> >someone to claim that utilization is lower than it could be because of
> >the currently specified initial CWND, I would have no problem
> >believing them.
> >
> >And I'm happy to make Linux use an increased value once it has
> >traction in the standardization community.
> 
> Currently I know of no working link capacity probing approach that, without
> active network feedback, can conservatively probe the available link capacity
> with a high CWND. I am curious about any future trends.

Given perfect information from the network nodes, you still need to
traverse the network in each direction and then return an answer, which
means that with a 0.5 sec end-to-end time as in the original posting,
causality itself demands 1.5 seconds to get an answer which is itself
incomplete and obsolete.

Causality isn't showing any signs of going away soon.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15  7:48             ` Ed W
@ 2010-07-15 17:36               ` Jerry Chu
  0 siblings, 0 replies; 37+ messages in thread
From: Jerry Chu @ 2010-07-15 17:36 UTC (permalink / raw)
  To: Ed W
  Cc: Tom Herbert, Hagen Paul Pfeifer, Rick Jones, David Miller,
	davidsen, linux-kernel, netdev, Nandita Dukkipati

On Thu, Jul 15, 2010 at 12:48 AM, Ed W <lists@wildgooses.com> wrote:
>
> On 15/07/2010 05:12, Tom Herbert wrote:
>>
>> There is an Internet draft
>> (http://datatracker.ietf.org/doc/draft-hkchu-tcpm-initcwnd/) on
>> raising the default Initial Congestion window to 10 segments, as well
>> as a SIGCOMM paper (http://ccr.sigcomm.org/online/?q=node/621).
>>
>
> You guys have obviously done a lot of work on this; however, it seems that
> there is a case for introducing some heuristics into the choice of init cwnd
> as well as offering the option to go larger?  An initial size of 10 packets
> is just another magic number that obviously works with the median bandwidth
> delay product on today's networks - can we not do better still?
>
> Seems like a bunch of clever folks have already suggested tweaks to the
> steady-state congestion avoidance, but so far everyone is afraid to touch
> the early-stage heuristics?

This is because there is not enough info for deriving any heuristic. For
initcwnd one is constrained to only the info from the 3WHS: a rough estimate
of the RTT plus all the bits in the SYN/SYN-ACK headers. I'm assuming a
stateless approach here. We've tried a stateful solution (i.e., seeding
initcwnd from past history) but found its complexity outweighs the gain.
(See http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf)

>
> Also, would you guys not benefit from wider deployment of ECN?  Can you not
> help find some ways that deployment could be increased?  At present there
> are big warnings all over the option that it causes some problems, but
> there is no quantification of how much, or of whether this warning is
> still appropriate?

That will add yet another hoop for us to jump over. Also I'm not sure a
couple of bits are sufficient for a guesstimate of what initcwnd ought to be.

Our reasoning is simple - there has been tremendous b/w growth since rfc2414
was published. Even the lowest common denominator (i.e., dialup links) has
moved from 9.6Kbps to 56Kbps. That's a sixfold increase. If you believe
initcwnd should grow proportionally to the buffer sizes in access links, and
that buffer sizes grow proportionally to b/w, then the initcwnd ought to be
3*6 = 18 today.

We chose a modest increase (10) with the hope of expediting the
standardization process (and would certainly appreciate help from folks on
this list). 10 is very conservative considering many deployments have gone
beyond 3, including the Linux stack, which allows one additional pkt if it's
the last data pkt.

Longer term it will be nice to find a way to get rid of this fixed, somewhat
arbitrary initcwnd. Mark Allman's JumpStart is one idea, but it'd be a much
longer route.

Jerry

>
> Ed W
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15  5:29                 ` H.K. Jerry Chu
@ 2010-07-15 19:51                   ` Rick Jones
  2010-07-15 20:48                     ` Stephen Hemminger
  0 siblings, 1 reply; 37+ messages in thread
From: Rick Jones @ 2010-07-15 19:51 UTC (permalink / raw)
  To: H.K. Jerry Chu
  Cc: Bill Fink, Hagen Paul Pfeifer, David Miller, lists, davidsen,
	linux-kernel, netdev

I have to wonder if the only heuristic one could employ for divining the initial 
congestion window is to be either pessimistic/conservative or 
optimistic/liberal.  Or, for that matter, whether that is the only one we really need here?

That's what it comes down to, doesn't it?  At any one point in time, we don't 
*really* know the state of the network and whether it can handle the load we 
might wish to put upon it.  We are always reacting to it. Up until now, it has 
been felt necessary to be pessimistic/conservative at time of connection 
establishment and not rely as much on the robustness of the "control" part of 
avoidance and control.

Now, the folks at Google have lots of data to suggest we don't need to be so 
pessimistic/conservative and so we have to decide if we are willing to be more 
optimistic/liberal.  Broadly handwaving, the "netdev we" seems to be willing to 
be more optimistic/liberal in at least a few cases, and the question comes down 
to whether or not the "IETF we" will be similarly willing.

rick jones

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15 19:51                   ` Rick Jones
@ 2010-07-15 20:48                     ` Stephen Hemminger
  2010-07-16  0:23                       ` H.K. Jerry Chu
  0 siblings, 1 reply; 37+ messages in thread
From: Stephen Hemminger @ 2010-07-15 20:48 UTC (permalink / raw)
  To: Rick Jones
  Cc: H.K. Jerry Chu, Bill Fink, Hagen Paul Pfeifer, David Miller,
	lists, davidsen, linux-kernel, netdev

On Thu, 15 Jul 2010 12:51:22 -0700
Rick Jones <rick.jones2@hp.com> wrote:

> I have to wonder if the only heuristic one could employ for divining the initial 
> congestion window is to be either pessimistic/conservative or 
> optimistic/liberal.  Or for that matter the only one one really needs here?
> 
> That's what it comes down to doesn't it?  At any one point in time, we don't 
> *really* know the state of the network and whether it can handle the load we 
> might wish to put upon it.  We are always reacting to it. Up until now, it has 
> been felt necessary to be pessimistic/conservative at time of connection 
> establishment and not rely as much on the robustness of the "control" part of 
> avoidance and control.
> 
> Now, the folks at Google have lots of data to suggest we don't need to be so 
> pessimistic/conservative and so we have to decide if we are willing to be more 
> optimistic/liberal.  Broadly handwaving, the "netdev we" seems to be willing to 
> be more optimistic/liberal in at least a few cases, and the question comes down 
> to whether or not the "IETF we" will be similarly willing.

I am not convinced that a host being aggressive with initial cwnd (Linux) would
not end up unfairly monopolizing available bandwidth compared to older more conservative
implementations (Windows). Whether fairness is important or not is another debate.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-14 18:15   ` Raise initial congestion window size / speedup slow start? David Miller
                       ` (2 preceding siblings ...)
  2010-07-15  4:51     ` H.K. Jerry Chu
@ 2010-07-15 23:14     ` Bill Davidsen
  3 siblings, 0 replies; 37+ messages in thread
From: Bill Davidsen @ 2010-07-15 23:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, linux-kernel

David Miller wrote:
> From: Bill Davidsen <davidsen@tmr.com>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
> 
>> You may have to go into /proc/sys/net/core and crank up the
>> rmem_* settings, depending on your distribution.
> 
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup.  If you do, it's a
> bug, report it so we can fix it.
> 
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
> 
I think transit times measured in tenths of a second would disqualify
this as a "normal setup."

High bandwidth and high latency don't work well together because you
get "send until the window is full, then wait for the ack" and poor
performance. I saw this with a satellite feed to Wyoming from GE's
Research Center in upstate NY in the late 80's or early 90's (I think
this was NYSERNet at that time). I did feeds from the NYC area to
California and Hawaii with SBC in the early-to-mid 2000s. In every
case SunOS, Solaris, AIX and Linux all failed to hit anything like
reasonable transfer speeds without manual tweaking, and I got the
advice on increasing window size from network engineers at ISPs and
backbone providers.

The O.P. may have other issues, and may benefit from doing other
things as well, but raising the window size is a reasonable thing to
do on links with RTTs in the hundreds of ms, and it's easy to try
without changing config files.
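
To put rough numbers on this, the window needed to keep a path busy is
the bandwidth-delay product. A quick sketch (the link figures are
illustrative, not from the deployments described above):

    #include <stdio.h>

    int main(void)
    {
            double bw_bps = 45e6; /* hypothetical 45 Mbit/s satellite link */
            double rtt_s  = 0.55; /* ~550 ms geostationary round trip      */
            double bdp    = bw_bps * rtt_s / 8.0;   /* bytes in flight     */

            printf("window needed: %.0f bytes (~%.1f MB)\n",
                   bdp, bdp / (1024 * 1024));
            return 0;
    }

That comes to roughly 3 MB in flight - far beyond a 16KB default send
buffer or a 64KB unscaled window, which is why manual tuning made such
a difference on those links.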

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15 20:48                     ` Stephen Hemminger
@ 2010-07-16  0:23                       ` H.K. Jerry Chu
  0 siblings, 0 replies; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-16  0:23 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Rick Jones, Bill Fink, Hagen Paul Pfeifer, David Miller, lists,
	davidsen, linux-kernel, netdev

I don't even consider a modest IW increase to 10 aggressive. The
scaling of IW is only adequate IMO given the huge b/w growth in the
past decade. Remember there could be plenty of flows sending large
cwnd bursts at twice the bottleneck link rate at any point in time in
the network anyway, so the "fairness" question may already be
ill-defined. In any case we're trying to conduct some experiments in a
private testbed, hopefully to get some insights from real data.

Jerry

On Thu, Jul 15, 2010 at 1:48 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Thu, 15 Jul 2010 12:51:22 -0700
> Rick Jones <rick.jones2@hp.com> wrote:
>
>> I have to wonder if the only heuristic one could employ for divining the initial
>> congestion window is to be either pessimistic/conservative or
>> optimistic/liberal.  Or for that matter the only one one really needs here?
>>
>> That's what it comes down to doesn't it?  At any one point in time, we don't
>> *really* know the state of the network and whether it can handle the load we
>> might wish to put upon it.  We are always reacting to it. Up until now, it has
>> been felt necessary to be pessimistic/conservative at time of connection
>> establishment and not rely as much on the robustness of the "control" part of
>> avoidance and control.
>>
>> Now, the folks at Google have lots of data to suggest we don't need to be so
>> pessimistic/conservative and so we have to decide if we are willing to be more
>> optimistic/liberal.  Broadly handwaving, the "netdev we" seems to be willing to
>> be more optimistic/liberal in at least a few cases, and the question comes down
>> to whether or not the "IETF we" will be similarly willing.
>
> I am not convinced that a host being aggressive with initial cwnd (Linux) would
> not end up unfairly monopolizing available bandwidth compared to older more conservative
> implementations (Windows). Whether fairness is important or not is another debate.
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15  3:49               ` Bill Fink
  2010-07-15  5:29                 ` H.K. Jerry Chu
@ 2010-07-16  9:03                 ` Hagen Paul Pfeifer
  1 sibling, 0 replies; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-16  9:03 UTC (permalink / raw)
  To: Bill Fink
  Cc: David Miller, rick.jones2, lists, davidsen, linux-kernel, netdev


On Wed, 14 Jul 2010 23:49:17 -0400, Bill Fink wrote:

> A long, long time ago, I suggested a Path BW Discovery mechanism
> to the IETF, analogous to the Path MTU Discovery mechanism, but
> it didn't get any traction.  Such information could be extremely
> useful to TCP endpoints, to determine a maximum window size to
> use, to effectively rate limit a much stronger sender from
> overpowering a much weaker receiver (for example 10-GigE -> GigE),
> resulting in abominable performance across large RTT paths
> (as low as 12 Mbps), even in the absence of any real network
> contention.

Much weaker middlebox? The windowing mechanism should be sufficient to
keep endpoints from over-committing.

Anyway, your proposed draft (I didn't search for it) sounds like a
mechanism similar to RFC 4782: Quick-Start for TCP and IP.

   This document specifies an optional Quick-Start mechanism for
   transport protocols, in cooperation with routers, to determine an
   allowed sending rate at the start and, at times, in the middle of a
   data transfer (e.g., after an idle period).  While Quick-Start is
   designed to be used by a range of transport protocols, in this
   document we only specify its use with TCP.  Quick-Start is designed
   to allow connections to use higher sending rates when there is
   significant unused bandwidth along the path, and the sender and all
   of the routers along the path approve the Quick-Start Request.

Cheers, Hagen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-15  4:51     ` H.K. Jerry Chu
@ 2010-07-16 17:01       ` Patrick McManus
  2010-07-16 17:41         ` Ed W
  2010-07-17  0:36         ` H.K. Jerry Chu
  0 siblings, 2 replies; 37+ messages in thread
From: Patrick McManus @ 2010-07-16 17:01 UTC (permalink / raw)
  To: H.K. Jerry Chu; +Cc: David Miller, davidsen, lists, linux-kernel, netdev

On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
>  except there are indeed bugs in the code today in that the
> code in various places assumes initcwnd as per RFC3390. So when
> initcwnd is raised, that actual value may be limited unnecessarily by
> the initial wmem/sk_sndbuf.

Thanks for the discussion!

can you tell us more about the impl concerns of initcwnd stored on the
route?

and while I'm asking for info, can you expand on the conclusion
regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
only read the slides.. maybe the paper has more info?)

Article and slides much appreciated and very interesting. I've long
been of the opinion that the downsides of being too aggressive once
in a while aren't all that serious anymore... as someone else said,
in a non-reservation world you are always trying to predict the
future anyhow, and therefore overflowing a queue is always possible
no matter how conservative you are.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-16 17:01       ` Patrick McManus
@ 2010-07-16 17:41         ` Ed W
  2010-07-17  1:23           ` H.K. Jerry Chu
  2010-07-17  0:36         ` H.K. Jerry Chu
  1 sibling, 1 reply; 37+ messages in thread
From: Ed W @ 2010-07-16 17:41 UTC (permalink / raw)
  To: Patrick McManus
  Cc: H.K. Jerry Chu, David Miller, davidsen, linux-kernel, netdev


> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)
>    

My guess is that this result is specific to google and their servers?

I guess we can probably stereotype the world into two pools of devices:

1) Devices in a pool of fast networking, but connected to the rest of 
the world through a relatively slow router
2) Devices connected via a high-speed network, where the bottleneck
device is usually many hops down the line and well away from us

I'm thinking here of 1) client users behind broadband routers,
wireless, 3G, dialup, etc., and 2) public servers that have obviously
been deliberately placed in locations with high levels of
interconnectivity.

I think history information could be more useful for clients in
category 1) because there is a much higher probability that their
most restrictive device is one hop away and hence affects all
connections; only relatively occasionally is the bottleneck multiple
hops away.  For devices in category 2) it's much harder, because the
restriction will usually be many hops away and effectively you are
trying to figure out and cache the speed of every ADSL router out
there...  For sure you can probably figure out how to cluster this
stuff and say that pool there is 56K dialup, that pool there is
"broadband", that pool is cell phone, etc, but it's probably hard to
do better than that?

So my guess is this is why google have had poor results investigating 
cwnd caching?

However, I would suggest that whilst it's of little value for the
server side, it still remains a very interesting idea for the client
side, where the cache hit ratio would seem to be dramatically higher?


I haven't studied the code, but given there is a userspace ability to
change init cwnd through the ip utility, it would seem likely that
relatively little coding would now be required to implement some kind
of limited cwnd caching and experiment with whether this is a valuable
addition.  I would have thought that if you are only fiddling with
devices behind a broadband router then there is little chance of you
"crashing the internet" with these kinds of experiments?

Good luck

Ed W

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-16 17:01       ` Patrick McManus
  2010-07-16 17:41         ` Ed W
@ 2010-07-17  0:36         ` H.K. Jerry Chu
  2010-07-19 17:08           ` Rick Jones
  1 sibling, 1 reply; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-17  0:36 UTC (permalink / raw)
  To: Patrick McManus; +Cc: David Miller, davidsen, lists, linux-kernel, netdev

On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus@ducksong.com> wrote:
> On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
>>  except there are indeed bugs in the code today in that the
>> code in various places assumes initcwnd as per RFC3390. So when
>> initcwnd is raised, that actual value may be limited unnecessarily by
>> the initial wmem/sk_sndbuf.
>
> Thanks for the discussion!
>
> can you tell us more about the impl concerns of initcwnd stored on the
> route?

We have found two issues when altering initcwnd through the ip route cmd:

1. initcwnd is actually capped by the sndbuf (i.e., tcp_wmem[1], which
defaults to a small value of 16KB). This problem has been obscured by
the TSO code, which fudges the flow-control limit (and could be a bug
by itself); a rough illustration of how the cap bites follows after
point 2.

2. the congestion backoff code is supposed to use inflight, rather
than cwnd, but initcwnd presents a special case. I don't understand
the code well enough yet to propose a fix.
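
On the first issue, a back-of-envelope sketch of why the default
sndbuf caps an initcwnd of 10 (the ~2KB per-skb charge is a rough
assumption for illustration, not a measured kernel value):

    #include <stdio.h>

    int main(void)
    {
            unsigned int sndbuf   = 16 * 1024; /* tcp_wmem[1] default */
            unsigned int truesize = 2048;      /* assumed bytes charged
                                                  per ~1448-byte skb  */

            /* full-sized segments that fit in the send buffer at once */
            printf("queueable segments: %u (< an initcwnd of 10)\n",
                   sndbuf / truesize);
            return 0;
    }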

>
> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)

This is partly due to our load balancer policy resulting in poor cache
hit rates, partly due to the sheer volume of remote clients. Some of
my colleagues tried to change the host cache to a /24 subnet cache but
the result wasn't that good either (sorry, I don't remember all the
details).

>
> Article and slides much appreciated and very interesting. I've long
> been of the opinion that the downsides of being too aggressive once
> in a while aren't all that serious anymore... as someone else said,
> in a non-reservation world you are always trying to predict the
> future anyhow, and therefore overflowing a queue is always possible
> no matter how conservative you are.

Please voice your support to TCPM then :)

Jerry

>
>
>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-16 17:41         ` Ed W
@ 2010-07-17  1:23           ` H.K. Jerry Chu
  0 siblings, 0 replies; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-17  1:23 UTC (permalink / raw)
  To: Ed W; +Cc: Patrick McManus, David Miller, davidsen, linux-kernel, netdev

On Fri, Jul 16, 2010 at 10:41 AM, Ed W <lists@wildgooses.com> wrote:
>
>> and while I'm asking for info, can you expand on the conclusion
>> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
>> only read the slides.. maybe the paper has more info?)
>>
>
> My guess is that this result is specific to google and their servers?
>
> I guess we can probably stereotype the world into two pools of devices:
>
> 1) Devices in a pool of fast networking, but connected to the rest of the
> world through a relatively slow router
> 2) Devices connected via a high-speed network, where the bottleneck
> device is usually many hops down the line and well away from us
>
> I'm thinking here of 1) client users behind broadband routers,
> wireless, 3G, dialup, etc., and 2) public servers that have obviously
> been deliberately placed in locations with high levels of
> interconnectivity.
>
> I think history information could be more useful for clients in
> category 1) because there is a much higher probability that their
> most restrictive device is one hop away and hence affects all
> connections; only relatively occasionally is the bottleneck multiple
> hops away.  For devices in category 2) it's much harder, because the
> restriction will usually be many hops away and effectively you are
> trying to figure out and cache the speed of every ADSL router out
> there...  For sure you can probably figure out how to cluster this
> stuff and say that pool there is 56K dialup, that pool there is
> "broadband", that pool is cell phone, etc, but it's probably hard to
> do better than that?
>
> So my guess is this is why google have had poor results investigating cwnd
> caching?

Actually we have investigated two types of caches: a short-history,
limited-size internal cache that is subject to some LRU replacement
policy, which greatly limits the cache hit rate, and a long-history
external cache, which provides much more accurate per-subnet cwnd
history but comes with high complexity and deployment headaches.

Also, we have set out for a much more ambitious goal: to not just
speed up our own services, but also provide a solution that could
benefit the whole web (see http://code.google.com/speed/index.html).
The latter pretty much precludes the complex external cache scheme
mentioned above.
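
To make the internal flavor concrete, here is a toy illustration of a
per-/24 cwnd cache (direct-mapped rather than true LRU for brevity;
all names and sizes are hypothetical, not Google's implementation):

    #include <stdint.h>

    #define CWND_CACHE_SLOTS 4096

    struct cwnd_entry {
            uint32_t subnet;  /* IPv4 address >> 8, i.e. the /24 */
            uint32_t cwnd;    /* last cwnd observed, in segments */
    };

    static struct cwnd_entry cwnd_cache[CWND_CACHE_SLOTS];

    static void cwnd_cache_store(uint32_t addr, uint32_t cwnd)
    {
            struct cwnd_entry *e =
                    &cwnd_cache[(addr >> 8) % CWND_CACHE_SLOTS];

            /* a colliding /24 simply evicts the previous entry */
            e->subnet = addr >> 8;
            e->cwnd   = cwnd;
    }

    static uint32_t cwnd_cache_initcwnd(uint32_t addr, uint32_t fallback)
    {
            struct cwnd_entry *e =
                    &cwnd_cache[(addr >> 8) % CWND_CACHE_SLOTS];

            return (e->subnet == (addr >> 8) && e->cwnd) ? e->cwnd
                                                         : fallback;
    }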

Jerry

>
> However, I would suggest that whilst it's of little value for the
> server side, it still remains a very interesting idea for the client
> side, where the cache hit ratio would seem to be dramatically higher?
>
>
> I haven't studied the code, but given there is a userspace ability to
> change init cwnd through the ip utility, it would seem likely that
> relatively little coding would now be required to implement some kind
> of limited cwnd caching and experiment with whether this is a
> valuable addition.  I would have thought that if you are only
> fiddling with devices behind a broadband router then there is little
> chance of you "crashing the internet" with these kinds of
> experiments?
>
> Good luck
>
> Ed W
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-17  0:36         ` H.K. Jerry Chu
@ 2010-07-19 17:08           ` Rick Jones
  2010-07-19 22:51             ` H.K. Jerry Chu
  0 siblings, 1 reply; 37+ messages in thread
From: Rick Jones @ 2010-07-19 17:08 UTC (permalink / raw)
  To: H.K. Jerry Chu
  Cc: Patrick McManus, David Miller, davidsen, lists, linux-kernel,
	netdev

H.K. Jerry Chu wrote:
> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus@ducksong.com> wrote:
>>can you tell us more about the impl concerns of initcwnd stored on the
>>route?
> 
> 
> We have found two issues when altering initcwnd through the ip route cmd:
> 1. initcwnd is actually capped by the sndbuf (i.e., tcp_wmem[1], which
> defaults to a small value of 16KB). This problem has been obscured by
> the TSO code, which fudges the flow-control limit (and could be a bug
> by itself).

I'll ask my Emily Litella question of the day and inquire as to why that would 
be unique to altering initcwnd via the route?

The slightly less Emily Litella-esque question is why an application
with a desire to know it could send more than 16K at one time wouldn't
have either asked via its install docs to have the minimum tweaked
(certainly if one is already tweaking routes...), or "gone all the way"
and made an explicit setsockopt(SO_SNDBUF) call?  We are in a realm of
applications for which there was a proposal to allow them to pick their
own initcwnd, right?  Having them pick an SO_SNDBUF size would seem to
be no more to ask.

rick jones

sendbuf_init = max(tcp_mem,initcwnd)?
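
For completeness, the "gone all the way" option looks something like
this minimal sketch (the buffer size an application would pick is up
to it; note that Linux doubles the requested value to allow for
bookkeeping overhead):

    #include <stdio.h>
    #include <sys/socket.h>

    /* Create a TCP socket with an explicitly sized send buffer. */
    int make_tcp_socket_with_sndbuf(int bytes)
    {
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1;
            /* Set before connect() so the buffer is sized for the
             * intended initial burst from the very first flight. */
            if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                           &bytes, sizeof(bytes)) < 0)
                    perror("setsockopt(SO_SNDBUF)");
            return fd;
    }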

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-19 17:08           ` Rick Jones
@ 2010-07-19 22:51             ` H.K. Jerry Chu
  2010-07-19 23:42               ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 37+ messages in thread
From: H.K. Jerry Chu @ 2010-07-19 22:51 UTC (permalink / raw)
  To: Rick Jones
  Cc: Patrick McManus, David Miller, davidsen, lists, linux-kernel,
	netdev

On Mon, Jul 19, 2010 at 10:08 AM, Rick Jones <rick.jones2@hp.com> wrote:
> H.K. Jerry Chu wrote:
>>
>> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <mcmanus@ducksong.com>
>> wrote:
>>>
>>> can you tell us more about the impl concerns of initcwnd stored on the
>>> route?
>>
>>
>> We have found two issues when altering initcwnd through the ip route cmd:
>> 1. initcwnd is actually capped by the sndbuf (i.e., tcp_wmem[1],
>> which defaults to a small value of 16KB). This problem has been
>> obscured by the TSO code, which fudges the flow-control limit (and
>> could be a bug by itself).
>
> I'll ask my Emily Litella question of the day and inquire as to why that
> would be unique to altering initcwnd via the route?
>
> The slightly less Emily Litella-esque question is why an application with a
> desire to know it could send more than 16K at one time wouldn't have either
> asked via its install docs to have the minimum tweaked (certainly if one is
> already tweaking routes...), or "gone all the way" and made an explicit
> setsockopt(SO_SNDBUF) call?  We are in a realm of applications for which
> there was a proposal to allow them to pick their own initcwnd, right?  Having

Per-app setting of initcwnd is just one case. Another is per-route
setting of initcwnd through the ip route cmd. For the latter, the
initcwnd change is more or less supposed to be transparent to apps.

This wasn't a big issue and can probably be easily fixed by
initializing sk_sndbuf to max(tcp_wmem[1], initcwnd) as you alluded to
below. It is just that our experiments got hindered by this little
bug, and we weren't aware of it sooner due to TSO fudging the sndbuf.
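
A sketch of that initialization, with the segments-to-bytes conversion
made explicit (initcwnd is in segments while sk_sndbuf is in bytes;
the names here are illustrative, not the actual kernel patch):

    struct sock_stub {
            unsigned int sk_sndbuf;  /* send buffer limit, bytes */
    };

    /* Ensure the initial send buffer can hold a full initcwnd burst. */
    static void tcp_sndbuf_init(struct sock_stub *sk,
                                unsigned int wmem_default,
                                unsigned int initcwnd, unsigned int mss)
    {
            unsigned int needed = initcwnd * mss; /* ignores skb overhead */

            sk->sk_sndbuf = wmem_default > needed ? wmem_default : needed;
    }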

Jerry

> them pick an SO_SNDBUF size would seem to be no more to ask.
>
> rick jones
>
> sendbuf_init = max(tcp_mem,initcwnd)?
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Raise initial congestion window size / speedup slow start?
  2010-07-19 22:51             ` H.K. Jerry Chu
@ 2010-07-19 23:42               ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 37+ messages in thread
From: Hagen Paul Pfeifer @ 2010-07-19 23:42 UTC (permalink / raw)
  To: H.K. Jerry Chu
  Cc: Rick Jones, Patrick McManus, David Miller, davidsen, lists,
	linux-kernel, netdev, Stephen Hemminger, Alan Cox

Maybe someone is interested: on the Transport Modeling Research Group
(TMRG) mailing list, a new thread named "Proposal to increase TCP
initial CWND" started a day ago.

Cheers, Hagen


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2010-07-19 23:42 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4C3D94E3.9080103@wildgooses.com>
     [not found] ` <4C3DD5EB.9070908@tmr.com>
2010-07-14 18:15   ` Raise initial congestion window size / speedup slow start? David Miller
2010-07-14 18:48     ` Ed W
2010-07-14 19:10       ` Stephen Hemminger
2010-07-14 21:47         ` Mitchell Erblich
2010-07-14 20:17       ` Rick Jones
2010-07-14 20:39         ` Hagen Paul Pfeifer
2010-07-14 21:55           ` David Miller
2010-07-14 22:13             ` Hagen Paul Pfeifer
2010-07-14 22:19               ` Rick Jones
2010-07-14 22:40                 ` Hagen Paul Pfeifer
2010-07-14 22:52               ` Ed W
2010-07-14 23:01                 ` Hagen Paul Pfeifer
2010-07-14 23:05                   ` Ed W
2010-07-15  3:49               ` Bill Fink
2010-07-15  5:29                 ` H.K. Jerry Chu
2010-07-15 19:51                   ` Rick Jones
2010-07-15 20:48                     ` Stephen Hemminger
2010-07-16  0:23                       ` H.K. Jerry Chu
2010-07-16  9:03                 ` Hagen Paul Pfeifer
2010-07-15 10:33               ` Alan Cox
2010-07-14 22:05           ` Ed W
2010-07-14 22:36             ` Hagen Paul Pfeifer
2010-07-14 23:01               ` Ed W
2010-07-15  4:12           ` Tom Herbert
2010-07-15  7:48             ` Ed W
2010-07-15 17:36               ` Jerry Chu
2010-07-15  5:09           ` H.K. Jerry Chu
2010-07-15  2:52     ` Bill Fink
2010-07-15  4:51     ` H.K. Jerry Chu
2010-07-16 17:01       ` Patrick McManus
2010-07-16 17:41         ` Ed W
2010-07-17  1:23           ` H.K. Jerry Chu
2010-07-17  0:36         ` H.K. Jerry Chu
2010-07-19 17:08           ` Rick Jones
2010-07-19 22:51             ` H.K. Jerry Chu
2010-07-19 23:42               ` Hagen Paul Pfeifer
2010-07-15 23:14     ` Bill Davidsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).