netdev.vger.kernel.org archive mirror
* TCP Limited slow start
@ 2006-06-02 23:13 Stephen Hemminger
  2006-06-03  1:54 ` [RFC] TCP limited " Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2006-06-02 23:13 UTC (permalink / raw)
  To: Xiaoliang Wei, John Heffner; +Cc: netdev

Has anyone done an implementation of RFC3742 for Linux? It looks interesting, but
would need some integration with current ABC code.

There was some evidence of a version in old Web100 code, but it's gone now. Was
it deemed a mistake?


* [RFC] TCP limited slow start
  2006-06-02 23:13 TCP Limited slow start Stephen Hemminger
@ 2006-06-03  1:54 ` Stephen Hemminger
  2006-06-03 16:46   ` John Heffner
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2006-06-03  1:54 UTC (permalink / raw)
  To: David Miller, John Heffner; +Cc: netdev

Rolled my sleeves up and gave this a try...

This is an implementation of Sally Floyd's Limited Slow Start
for Large Congestion Windows.

Summary from RFC:
   Limited Slow-Start introduces a parameter, "max_ssthresh", and
   modifies the slow-start mechanism for values of the congestion window
   where "cwnd" is greater than "max_ssthresh".  That is, during Slow-
   Start, when

      cwnd <= max_ssthresh,

   cwnd is increased by one MSS (MAXIMUM SEGMENT SIZE) for every
   arriving ACK (acknowledgement) during slow-start, as is always the
   case.  During Limited Slow-Start, when

      max_ssthresh < cwnd <= ssthresh,

   the invariant is maintained so that the congestion window is
   increased during slow-start by at most max_ssthresh/2 MSS per round-
   trip time.  This is done as follows:

      For each arriving ACK in slow-start:
        If (cwnd <= max_ssthresh)
           cwnd += MSS;
        else
           K = int(cwnd/(0.5 max_ssthresh));
           cwnd += int(MSS/K);

   Thus, during Limited Slow-Start the window is increased by 1/K MSS
   for each arriving ACK, for K = int(cwnd/(0.5 max_ssthresh)), instead
   of by 1 MSS as in standard slow-start [RFC2581].
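
For illustration only, here is a minimal userspace sketch of the per-ACK update
quoted above. It is not the kernel code in the patch below: the function name
lss_on_ack is hypothetical, all quantities are assumed to be byte counts, and
the guard for a half-threshold of zero is an addition that the RFC pseudocode
does not spell out.

```c
#include <assert.h>

/*
 * Per-ACK window update following the RFC 3742 pseudocode.
 * cwnd, max_ssthresh and mss are byte counts (an assumption of this
 * sketch; the kernel tracks snd_cwnd in packets). Returns the new cwnd.
 */
static unsigned int lss_on_ack(unsigned int cwnd, unsigned int max_ssthresh,
			       unsigned int mss)
{
	unsigned int half, k;

	assert(mss > 0);

	/* Standard slow start: one MSS per arriving ACK. */
	if (cwnd <= max_ssthresh)
		return cwnd + mss;

	/* Limited slow start: K = int(cwnd / (0.5 max_ssthresh)). */
	half = max_ssthresh / 2;
	if (half == 0)
		half = 1;	/* guard added for this sketch */
	k = cwnd / half;	/* cwnd > max_ssthresh here, so k >= 1 */

	return cwnd + mss / k;	/* cwnd += int(MSS/K) */
}
```

With mss = 1000 and max_ssthresh = 100000, a window of 150000 bytes gives K = 3,
so the ACK adds 333 bytes rather than a full MSS.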

---

 Documentation/networking/ip-sysctl.txt |    8 +++++-
 include/linux/sysctl.h                 |    1 +
 include/net/tcp.h                      |    1 +
 net/ipv4/sysctl_net_ipv4.c             |    8 ++++++
 net/ipv4/tcp_cong.c                    |   46 ++++++++++++++++++++------------
 net/ipv4/tcp_input.c                   |    1 +
 6 files changed, 47 insertions(+), 18 deletions(-)

0884f45c9f21c50dd9117b2fc02bf5436be3c3bf
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f12007b..9869298 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -103,9 +103,15 @@ TCP variables: 
 
 tcp_abc - INTEGER
 	Controls Appropriate Byte Count defined in RFC3465. If set to
-	0 then does congestion avoid once per ack. 1 is conservative
+	0 then does congestion avoid once per ack. 1 (default) is conservative
 	value, and 2 is more agressive.
 
+tcp_max_ssthresh - INTEGER
+	Controls the increase of the congestion window during slow start as
+	defined in RFC3742. The purpose is to slow the growth of the congestion
+	window on high delay networks where aggressive growth can cause losses
+	of thousands of packets. Default is 100 packets.
+
 tcp_syn_retries - INTEGER
 	Number of times initial SYNs for an active TCP connection attempt
 	will be retransmitted. Should not be higher than 255. Default value
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 76eaeff..a455165 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -403,6 +403,7 @@ enum
  	NET_TCP_MTU_PROBING=113,
 	NET_TCP_BASE_MSS=114,
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
+	NET_TCP_LIMITED_SSTHRESH=116,
 };
 
 enum {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 575636f..3a14861 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -225,6 +225,7 @@ extern int sysctl_tcp_abc;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
+extern int sysctl_tcp_limited_ssthresh;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6b6c3ad..d1358d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -688,6 +688,14 @@ #endif
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+	{
+		.ctl_name	= NET_TCP_LIMITED_SSTHRESH,
+		.procname	= "tcp_max_ssthresh",
+		.data		= &sysctl_tcp_limited_ssthresh,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 857eefc..a27c792 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -180,25 +180,37 @@ int tcp_set_congestion_control(struct so
  */
 void tcp_slow_start(struct tcp_sock *tp)
 {
-	if (sysctl_tcp_abc) {
-		/* RFC3465: Slow Start
-		 * TCP sender SHOULD increase cwnd by the number of
-		 * previously unacknowledged bytes ACKed by each incoming
-		 * acknowledgment, provided the increase is not more than L
-		 */
-		if (tp->bytes_acked < tp->mss_cache)
-			return;
-
-		/* We MAY increase by 2 if discovered delayed ack */
-		if (sysctl_tcp_abc > 1 && tp->bytes_acked > 2*tp->mss_cache) {
-			if (tp->snd_cwnd < tp->snd_cwnd_clamp)
-				tp->snd_cwnd++;
-		}
+	/* RFC3465: Appropriate Byte Count Slow Start
+	 * TCP sender SHOULD increase cwnd by the number of
+	 * previously unacknowledged bytes ACKed by each incoming
+	 * acknowledgment, provided the increase is not more than L
+	 */
+	if (sysctl_tcp_abc && tp->bytes_acked < tp->mss_cache)
+		return;
+
+	/* RFC3742: limited slow start
+	 * the window is increased by 1/K MSS for each arriving ACK,
+	 * for K = int(cwnd/(0.5 max_ssthresh))
+	 */
+	if (sysctl_tcp_limited_ssthresh
+	    && tp->snd_cwnd > sysctl_tcp_limited_ssthresh) {
+		u32 k = max(tp->snd_cwnd / max(sysctl_tcp_limited_ssthresh >> 1, 1), 1U);
+		if (++tp->snd_cwnd_cnt >= k) {
+			if (tp->snd_cwnd < tp->snd_cwnd_clamp)
+				tp->snd_cwnd++;
+			tp->snd_cwnd_cnt = 0;
+		}
+	} else {
+		/* ABC: We MAY increase by 2 if discovered delayed ack */
+		if (sysctl_tcp_abc > 1
+		    && tp->bytes_acked > 2*tp->mss_cache
+		    && tp->snd_cwnd < tp->snd_cwnd_clamp)
+			tp->snd_cwnd++;
+
+		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
+			tp->snd_cwnd++;
 	}
 	tp->bytes_acked = 0;
-
-	if (tp->snd_cwnd < tp->snd_cwnd_clamp)
-		tp->snd_cwnd++;
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 718d0f2..80dd5e4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -90,6 +90,7 @@ int sysctl_tcp_nometrics_save;
 
 int sysctl_tcp_moderate_rcvbuf = 1;
 int sysctl_tcp_abc = 1;
+int sysctl_tcp_limited_ssthresh = 100;
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
-- 
1.3.3
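
As a rough check of the counter logic in the tcp_cong.c hunk, the sketch below
models it in userspace with the window counted in packets. Everything here is
an assumption made for illustration: struct lss_state is a hypothetical
stand-in for the three struct tcp_sock fields the patch touches, ABC handling
is left out, each round trip is assumed to deliver one ACK per packet in the
window, and the guard against a zero half-threshold is added by the sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct tcp_sock fields used by the patch. */
struct lss_state {
	unsigned int snd_cwnd;		/* congestion window, in packets */
	unsigned int snd_cwnd_cnt;	/* ACKs counted toward next increment */
	unsigned int snd_cwnd_clamp;	/* upper bound on snd_cwnd */
};

/* Process one ACK during slow start; max_ssthresh == 0 disables the limit. */
static void lss_ack(struct lss_state *s, unsigned int max_ssthresh)
{
	if (max_ssthresh && s->snd_cwnd > max_ssthresh) {
		unsigned int half = max_ssthresh / 2;
		unsigned int k;

		if (half == 0)
			half = 1;	/* avoid dividing by zero */
		k = s->snd_cwnd / half;
		if (k == 0)
			k = 1;

		/* Grow by one packet for every k ACKs, i.e. 1/K per ACK. */
		if (++s->snd_cwnd_cnt >= k) {
			if (s->snd_cwnd < s->snd_cwnd_clamp)
				s->snd_cwnd++;
			s->snd_cwnd_cnt = 0;
		}
	} else if (s->snd_cwnd < s->snd_cwnd_clamp) {
		s->snd_cwnd++;		/* standard slow start */
	}
}

/* Feed one round trip of ACKs: one per packet in the starting window. */
static unsigned int lss_round(struct lss_state *s, unsigned int max_ssthresh)
{
	unsigned int acks = s->snd_cwnd;
	unsigned int i;

	for (i = 0; i < acks; i++)
		lss_ack(s, max_ssthresh);
	return s->snd_cwnd;
}
```

Starting from snd_cwnd = 200 with max_ssthresh = 100, one call to lss_round()
leaves the window at 250 and a second call at 300, which matches the bound of
max_ssthresh/2 packets per round trip. With the limit disabled the same round
doubles the window to 400.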



* Re: [RFC] TCP limited slow start
  2006-06-03  1:54 ` [RFC] TCP limited " Stephen Hemminger
@ 2006-06-03 16:46   ` John Heffner
  2006-06-05 17:17     ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: John Heffner @ 2006-06-03 16:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

Stephen Hemminger wrote:
> Rolled my sleeves up and gave this a try...
> 
> This is an implementation of Sally Floyd's Limited Slow Start
> for Large Congestion Windows.

Limited slow start is useful as a work-around for bottleneck queues that 
are inappropriately short.  I don't think it's good to run it all the 
time by default (with a max_ssthresh < infinity), because it slows down 
flows on healthy paths, and introduces another non-scalable parameter to 
TCP.
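
To put a rough number on the slowdown, here is a coarse per-round-trip model,
offered as a sketch rather than a measurement. It assumes the window doubles
each round trip in standard slow start and grows by max_ssthresh/2 packets per
round trip once it is above max_ssthresh; losses, delayed ACKs and ssthresh are
ignored, and the function name rtts_to_reach is hypothetical.

```c
#include <assert.h>

/*
 * Count round trips needed for cwnd (in packets) to reach target.
 * max_ssthresh == 0 means plain slow start with no limit.
 */
static unsigned int rtts_to_reach(unsigned int cwnd, unsigned int target,
				  unsigned int max_ssthresh)
{
	unsigned int rtts = 0;

	assert(cwnd > 0);	/* doubling from zero would never finish */

	while (cwnd < target) {
		if (max_ssthresh && cwnd > max_ssthresh) {
			/* Limited slow start: at most max_ssthresh/2 per RTT. */
			unsigned int step = max_ssthresh / 2;

			cwnd += step ? step : 1;
		} else {
			/* Standard slow start: window doubles each RTT. */
			cwnd *= 2;
		}
		rtts++;
	}
	return rtts;
}
```

Under these assumptions, growing from 2 packets to a 10000 packet window takes
13 round trips with no limit and 204 round trips with max_ssthresh = 100.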

I see it as potentially useful as a per-route parameter, where you set 
it deliberately to work around some known problematic path.  A sysctl 
with a default value of infinity might be okay as well.

Practically speaking, we've had this in the Web100 patch for a long time 
(and still do, look for WAD_MaxSsthresh), but I've never found it all 
that useful.  If the bottleneck queue is too short, you usually end up 
getting screwed other ways too.

   -John


* Re: [RFC] TCP limited slow start
  2006-06-03 16:46   ` John Heffner
@ 2006-06-05 17:17     ` Stephen Hemminger
  0 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2006-06-05 17:17 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev

On Sat, 03 Jun 2006 12:46:57 -0400
John Heffner <jheffner@psc.edu> wrote:

> Stephen Hemminger wrote:
> > Rolled my sleeves up and gave this a try...
> > 
> > This is an implementation of Sally Floyd's Limited Slow Start
> > for Large Congestion Windows.
> 
> Limited slow start is useful as a work-around for bottleneck queues that 
> are inappropriately short.  I don't think it's good to run it all the 
> time by default (with a max_ssthresh < infinity), because it slows down 
> flows on healthy paths, and introduces another non-scalable parameter to 
> TCP.
> 
> I see it as potentially useful as a per-route parameter, where you set 
> it deliberately to work around some known problematic path.  A sysctl 
> with a default value of infinity might be okay as well.
> 
> Practically speaking, we've had this in the Web100 patch for a long time 
> (and still do, look for WAD_MaxSsthresh), but I've never found it all 
> that useful.  If the bottleneck queue is too short, you usually end up 
> getting screwed other ways too.
> 
>    -John

I moved it off to tcp_highspeed.c only. That seems appropriate, since
that is where you implemented the related RFC.

