netdev.vger.kernel.org archive mirror
* [RFT] BIC TCP delayed ack compensation
  2005-02-09 18:59 ` 2.6.10 TCP troubles -- suggested patch Stephen Hemminger
@ 2005-02-22 21:50   ` Stephen Hemminger
  2005-02-22 23:30     ` John Heffner
  2005-02-22 23:38     ` Baruch Even
  0 siblings, 2 replies; 15+ messages in thread
From: Stephen Hemminger @ 2005-02-22 21:50 UTC (permalink / raw)
  To: Hubert Tonneau, cliff white
  Cc: Alexey Kuznetsov, netdev, Injong Rhee, David S. Miller

This patch, which was extracted from BIC TCP 1.1, compensates
for systems (like MacOSX) that don't ACK every other packet.
It has no impact on normal transfers, but might help with problems
with Macs like the one Hubert found.


diff -Nru a/include/linux/tcp.h b/include/linux/tcp.h
--- a/include/linux/tcp.h	2005-02-22 13:44:12 -08:00
+++ b/include/linux/tcp.h	2005-02-22 13:44:12 -08:00
@@ -433,6 +433,7 @@
 		__u32 	last_max_cwnd;	/* last maximium snd_cwnd */
 		__u32	last_cwnd;	/* the last snd_cwnd */
 		__u32   last_stamp;     /* time when updated last_cwnd */
+		__u32	delayed_ack;	/* ratio of packets/ACKs */
 	} bictcp;
 };
 
diff -Nru a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h	2005-02-22 13:44:12 -08:00
+++ b/include/net/tcp.h	2005-02-22 13:44:12 -08:00
@@ -508,6 +508,8 @@
 #define BICTCP_BETA_SCALE    1024	/* Scale factor beta calculation
 					 * max_cwnd = snd_cwnd * beta
 					 */
+#define BICTCP_DELAY_SCALE   1024	/* Scale for delayed_ack ratio */
+
 #define BICTCP_MAX_INCREMENT 32		/*
 					 * Limit on the amount of
 					 * increment allowed during
diff -Nru a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c	2005-02-22 13:44:12 -08:00
+++ b/net/ipv4/tcp_input.c	2005-02-22 13:44:12 -08:00
@@ -339,6 +339,7 @@
 	tp->bictcp.last_max_cwnd = 0;
 	tp->bictcp.last_cwnd = 0;
 	tp->bictcp.last_stamp = 0;
+	tp->bictcp.delayed_ack = 2 * BICTCP_DELAY_SCALE;
 }
 
 /* 5. Recalculate window clamp after socket hit its memory bounds. */
@@ -2075,6 +2076,13 @@
 			/* linear increase */
 			tp->bictcp.cnt = tp->snd_cwnd / BICTCP_MAX_INCREMENT;
 	}
+
+	/* compensate for delayed ack's */
+	tp->bictcp.cnt = (tp->bictcp.cnt * BICTCP_DELAY_SCALE)
+		/ tp->bictcp.delayed_ack;
+	if (tp->bictcp.cnt == 0)
+		tp->bictcp.cnt = 1;
+
 	return tp->bictcp.cnt;
 }
 
@@ -2418,6 +2426,7 @@
 	__u32 now = tcp_time_stamp;
 	int acked = 0;
 	__s32 seq_rtt = -1;
+	__u32 cnt = 0;
 
 	while ((skb = skb_peek(&sk->sk_write_queue)) &&
 	       skb != sk->sk_send_head) {
@@ -2472,7 +2481,13 @@
 		tcp_packets_out_dec(tp, skb);
 		__skb_unlink(skb, skb->list);
 		sk_stream_free_skb(sk, skb);
+		++cnt;
 	}
+
+	/* compute average packets per ACK (scaled by 1024) */
+	if (cnt > 0 && tcp_is_bic(tp) && tp->ca_state == TCP_CA_Open)
+		tp->bictcp.delayed_ack = (15 * tp->bictcp.delayed_ack) / 16
+			+ (BICTCP_DELAY_SCALE/16) * cnt;
 
 	if (acked&FLAG_ACKED) {
 		tcp_ack_update_rtt(tp, acked, seq_rtt);
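
A minimal user-space sketch of the two pieces of arithmetic in the patch above, assuming plain C outside the kernel. The names `DELAY_SCALE`, `update_delayed_ack` and `compensate_cnt` are hypothetical stand-ins for `BICTCP_DELAY_SCALE` and the inline expressions in `tcp_input.c`; only the formulas are taken from the diff.

```c
#include <stdint.h>

#define DELAY_SCALE 1024u	/* stand-in for BICTCP_DELAY_SCALE */

/* Moving average of packets per ACK, scaled by DELAY_SCALE:
 * new = 15/16 * old + 1/16 * (pkts_acked * DELAY_SCALE). */
static uint32_t update_delayed_ack(uint32_t delayed_ack, uint32_t pkts_acked)
{
	return (15 * delayed_ack) / 16 + (DELAY_SCALE / 16) * pkts_acked;
}

/* Divide the ACK count needed per cwnd increment by the measured
 * packets-per-ACK ratio, never letting it drop to zero. */
static uint32_t compensate_cnt(uint32_t cnt, uint32_t delayed_ack)
{
	cnt = (cnt * DELAY_SCALE) / delayed_ack;
	return cnt ? cnt : 1;
}
```

With the initial value of `2 * DELAY_SCALE`, a receiver that acknowledges every second packet leaves the average unchanged, while a receiver that acknowledges in larger batches pulls the ratio up and `cnt` down.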


* Re: [RFT] BIC TCP delayed ack compensation
@ 2005-02-22 22:22 Hubert Tonneau
  2005-02-23  0:58 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Hubert Tonneau @ 2005-02-22 22:22 UTC (permalink / raw)
  To: Stephen Hemminger, cliff white
  Cc: Alexey Kuznetsov, netdev, Injong Rhee, David S. Miller

Stephen Hemminger wrote:
>
> This patch which was extracted from BIC TCP 1.1 compensates
> for systems (like MacOSX) that don't ACK every other packet.
> It has no impact for normal transfers, but might help with problems
> with Mac like Hubert found.

No, it's even worse.

2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB of data)
2.6.9 to gigabit connected MacOSX: 5 seconds
2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds
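
To put these timings in perspective, they can be converted into average throughput. The helper below is a hypothetical sketch that assumes "roughly 100 MB" means 100 * 10^6 bytes; the function name is not from the thread.

```c
/* Average throughput in Mbit/s for `megabytes` (10^6 bytes each)
 * transferred in `seconds`. */
static double throughput_mbps(double megabytes, double seconds)
{
	return megabytes * 8.0 / seconds;
}
```

Under that assumption, 15 seconds corresponds to about 53 Mbit/s, 325 seconds to about 2.5 Mbit/s, 620 seconds to about 1.3 Mbit/s, and 5 seconds to 160 Mbit/s.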


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 21:50   ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
@ 2005-02-22 23:30     ` John Heffner
  2005-02-22 23:38     ` Baruch Even
  1 sibling, 0 replies; 15+ messages in thread
From: John Heffner @ 2005-02-22 23:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Has there been any discussion of implementing ABC (RFC3465) in Linux?

Thanks,
  -John


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 21:50   ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
  2005-02-22 23:30     ` John Heffner
@ 2005-02-22 23:38     ` Baruch Even
  2005-02-23  1:04       ` Yee-Ting Li
  1 sibling, 1 reply; 15+ messages in thread
From: Baruch Even @ 2005-02-22 23:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hubert Tonneau, cliff white, Alexey Kuznetsov, netdev,
	Injong Rhee, David S. Miller, Yee-Ting Li, Doug Leith

Stephen Hemminger wrote:
> This patch which was extracted from BIC TCP 1.1 compensates
> for systems (like MacOSX) that don't ACK every other packet.
> It has no impact for normal transfers, but might help with problems
> with Mac like Hubert found.

We have an implementation of ABC (Appropriate Byte Counting, RFC 3465)
that we hope to submit soon for inclusion in the kernel; it should be a
more appropriate solution for this. The RFC is a well-defined standard,
whereas this patch has not received any review by the networking
community.

This solution is just a band-aid for a single congestion control
algorithm, as opposed to a generic solution. According to our testing,
it is also prone to making BIC more aggressive.

I'll try to post our ABC patch tomorrow, time permitting.

One thing to note is that accounting for delayed ACKs is not an overly
important feature. In our testing it only speeds up convergence by a
small factor and doesn't change the correctness of the algorithms.

Baruch


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 22:22 [RFT] BIC TCP delayed ack compensation Hubert Tonneau
@ 2005-02-23  0:58 ` Stephen Hemminger
  2005-02-23 18:32 ` Injong Rhee
  2005-02-23 18:37 ` Injong Rhee
  2 siblings, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2005-02-23  0:58 UTC (permalink / raw)
  To: Hubert Tonneau
  Cc: cliff white, Alexey Kuznetsov, netdev, Injong Rhee,
	David S. Miller

On Tue, 22 Feb 2005 22:22:42 GMT
Hubert Tonneau <hubert.tonneau@fullpliant.org> wrote:

> Stephen Hemminger wrote:
> >
> > This patch which was extracted from BIC TCP 1.1 compensates
> > for systems (like MacOSX) that don't ACK every other packet.
> > It has no impact for normal transfers, but might help with problems
> > with Mac like Hubert found.
> 
> No, it's even worse.
> 
> 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB of data)
> 2.6.9 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
> 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds

Thanks, that is really interesting...


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-22 23:38     ` Baruch Even
@ 2005-02-23  1:04       ` Yee-Ting Li
  2005-02-23 15:28         ` Yee-Ting Li
  0 siblings, 1 reply; 15+ messages in thread
From: Yee-Ting Li @ 2005-02-23  1:04 UTC (permalink / raw)
  To: netdev
  Cc: Doug Leith, David S. Miller, Injong Rhee, Yee-Ting Li,
	Baruch Even, Hubert Tonneau, cliff white, Alexey Kuznetsov,
	Stephen Hemminger

On Feb 22, 2005, at 23:38, Baruch Even wrote:
> We have a version of ABC (Appropriate Byte Counting) implementation of 
> RFC 3465, which we hope to submit soon for inclusion in the kernel 
> which should be a more appropriate solution for this. The RFC is a 
> well defined standard whereas this patch has not received any 
> reviewing by the networking community.

Please find enclosed a version of our implementation of RFC3465 ABC for 
Linux 2.6.11-rc4.

There is built-in protection, as defined by the RFC, to prevent large
bursts of packets should ACKs arrive acknowledging more than abc_L
packets (sysctl_tcp_abc_L). The entire ABC patch can be switched on or
off using sysctl_tcp_abc={1|0} respectively. As this is also an RFT, it
is switched ON by default and uses an abc_L value of 2, which MAY be
used (according to the RFC).

Note that an abc_L of 1 will be more conservative than what is
available with normal clocking of delayed ACKs. Note also that there is
currently no built-in mechanism to prevent abc_L from being set above 2;
the RFC states that abc_L MUST NOT be greater than 2.

This patch also has the advantage of working for all protocols 
currently in the kernel (except vegas which doesn't require it).



Signed-off-by: Yee-Ting Li <Yee-Ting.Li@may.ie>

Index: linux-2.6.11-rc4/include/linux/sysctl.h
===================================================================
--- linux-2.6.11-rc4.orig/include/linux/sysctl.h	Sun Feb 13 03:06:53 2005
+++ linux-2.6.11-rc4/include/linux/sysctl.h	Tue Feb 22 23:48:30 2005
@@ -344,6 +344,8 @@
  	NET_TCP_DEFAULT_WIN_SCALE=105,
  	NET_TCP_MODERATE_RCVBUF=106,
  	NET_TCP_TSO_WIN_DIVISOR=107,
+	NET_TCP_ABC=108,
+	NET_TCP_ABC_L=109,
  };

  enum {
Index: linux-2.6.11-rc4/include/linux/tcp.h
===================================================================
--- linux-2.6.11-rc4.orig/include/linux/tcp.h	Sun Feb 13 03:06:23 2005
+++ linux-2.6.11-rc4/include/linux/tcp.h	Tue Feb 22 23:39:41 2005
@@ -366,6 +366,8 @@

  	__u32	total_retrans;	/* Total retransmits for entire connection */

+	__u32	bytes_acked;	/* Appropriate Byte Counting - RFC3465 */
+	
  	/* The syn_wait_lock is necessary only to avoid proc interface having
  	 * to grab the main lock sock while browsing the listening hash
  	 * (otherwise it's deadlock prone).
Index: linux-2.6.11-rc4/include/net/tcp.h
===================================================================
--- linux-2.6.11-rc4.orig/include/net/tcp.h	Sun Feb 13 03:05:28 2005
+++ linux-2.6.11-rc4/include/net/tcp.h	Tue Feb 22 23:47:59 2005
@@ -609,6 +609,10 @@
  extern int sysctl_tcp_moderate_rcvbuf;
  extern int sysctl_tcp_tso_win_divisor;

+/* RFC3465 - ABC */
+extern int sysctl_tcp_abc;
+extern int sysctl_tcp_abc_L;
+
  extern atomic_t tcp_memory_allocated;
  extern atomic_t tcp_sockets_allocated;
  extern int tcp_memory_pressure;
@@ -1366,6 +1370,7 @@
  static inline void tcp_enter_cwr(struct tcp_sock *tp)
  {
  	tp->prior_ssthresh = 0;
+	tp->bytes_acked=0;
  	if (tp->ca_state < TCP_CA_CWR) {
  		__tcp_enter_cwr(tp);
  		tcp_set_ca_state(tp, TCP_CA_CWR);
Index: linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/sysctl_net_ipv4.c	Sun Feb 13 03:07:01 2005
+++ linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c	Tue Feb 22 23:46:18 2005
@@ -682,6 +682,22 @@
  		.mode		= 0644,
  		.proc_handler	= &proc_dointvec,
  	},
+    	{
+		.ctl_name	= NET_TCP_ABC,
+		.procname	= "tcp_abc",
+		.data		= &sysctl_tcp_abc,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+    	{
+		.ctl_name	= NET_TCP_ABC_L,
+		.procname	= "tcp_abc_L",
+		.data		= &sysctl_tcp_abc_L,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
  	{ .ctl_name = 0 }
  };

Index: linux-2.6.11-rc4/net/ipv4/tcp.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/tcp.c	Sun Feb 13 03:05:50 2005
+++ linux-2.6.11-rc4/net/ipv4/tcp.c	Tue Feb 22 23:28:28 2005
@@ -1825,6 +1825,7 @@
  	tp->packets_out = 0;
  	tp->snd_ssthresh = 0x7fffffff;
  	tp->snd_cwnd_cnt = 0;
+	tp->bytes_acked = 0;
  	tcp_set_ca_state(tp, TCP_CA_Open);
  	tcp_clear_retrans(tp);
  	tcp_delack_init(tp);
Index: linux-2.6.11-rc4/net/ipv4/tcp_input.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/tcp_input.c	Tue Feb 22 23:27:44 2005
+++ linux-2.6.11-rc4/net/ipv4/tcp_input.c	Wed Feb 23 00:25:44 2005
@@ -92,6 +92,11 @@

  int sysctl_tcp_moderate_rcvbuf = 1;

+/* RFC 3465 - ABC */
+int sysctl_tcp_abc = 1;
+int sysctl_tcp_abc_L = 2;   /* The RFC defines 1 as being a more conservative value */
+			    /* that SHOULD be used, however, we use 2 as it MAY be used */
+
  /* Default values of the Vegas variables, in fixed-point representation
   * with V_PARAM_SHIFT bits to the right of the binary point.
   */
@@ -1287,6 +1292,7 @@
  	tp->snd_cwnd_cnt   = 0;
  	tp->snd_cwnd_stamp = tcp_time_stamp;

+	tp->bytes_acked = 0;
  	tcp_clear_retrans(tp);

  	/* Push undo marker, if it was plain RTO and nothing
@@ -1945,6 +1951,8 @@
  			TCP_ECN_queue_cwr(tp);
  		}

+		tp->bytes_acked = 0;
+			
  		tp->snd_cwnd_cnt = 0;
  		tcp_set_ca_state(tp, TCP_CA_Recovery);
  	}
@@ -2100,6 +2108,24 @@
  	tp->snd_cwnd_stamp = tcp_time_stamp;
  }

+/* This is a wrapper function to handle RFC3465 - ABC. As per the RFC, the abc_L
+ * value defines a burst moderation to prevent sending large bursts of packets
+ * should an ack acknowledge many packets. abc_L MUST NOT be larger than 2. */
+static __inline__ void reno_cong_avoid_abc( struct tcp_sock *tp, int mss_now )
+{
+	int incrs_applied = 0;
+	
+	if (sysctl_tcp_abc && !tp->nonagle)
+	{
+		while (tp->bytes_acked > mss_now && incrs_applied < sysctl_tcp_abc_L) {
+			tp->bytes_acked -= mss_now;
+			reno_cong_avoid( tp );
+		}
+	} else
+		reno_cong_avoid( tp );
+}
+
+
  /* This is based on the congestion detection/avoidance scheme described in
   *    Lawrence S. Brakmo and Larry L. Peterson.
   *    "TCP Vegas: End to end congestion avoidance on a global internet."
@@ -2322,12 +2348,15 @@
  	tp->snd_cwnd_stamp = tcp_time_stamp;
  }

-static inline void tcp_cong_avoid(struct tcp_sock *tp, u32 ack, u32 seq_rtt)
+static inline void tcp_cong_avoid(struct sock *sk, u32 ack, u32 seq_rtt)
  {
+    	struct tcp_sock *tp = tcp_sk(sk);
+	int mss_now = tcp_current_mss(sk,1);
+
  	if (tcp_vegas_enabled(tp))
  		vegas_cong_avoid(tp, ack, seq_rtt);
  	else
-		reno_cong_avoid(tp);
+		reno_cong_avoid_abc(tp, mss_now);
  }

  /* Restart timer after forward progress on connection.
@@ -2890,6 +2919,9 @@
  	if (before(ack, prior_snd_una))
  		goto old_ack;

+	if ( sysctl_tcp_abc && tp->ca_state < TCP_CA_CWR )
+	    tp->bytes_acked += ack - prior_snd_una;	
+	
  	if (!(flag&FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
  		/* Window is constant, pure forward advance.
  		 * No more checks are required.
@@ -2940,12 +2972,12 @@
  		if ((flag & FLAG_DATA_ACKED) &&
  		    (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd) &&
  		    tcp_may_raise_cwnd(tp, flag))
-			tcp_cong_avoid(tp, ack, seq_rtt);
+			tcp_cong_avoid(sk, ack, seq_rtt);
  		tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag);
  	} else {
  		if ((flag & FLAG_DATA_ACKED) &&
  		    (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd))
-			tcp_cong_avoid(tp, ack, seq_rtt);
+			tcp_cong_avoid(sk, ack, seq_rtt);
  	}

  	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP))
Index: linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/tcp_minisocks.c	Sun Feb 13 03:07:01 2005
+++ linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c	Tue Feb 22 23:28:28 2005
@@ -769,6 +769,8 @@
  		newtp->snd_cwnd = 2;
  		newtp->snd_cwnd_cnt = 0;

+		newtp->bytes_acked = 0;
+
  		newtp->frto_counter = 0;
  		newtp->frto_highmark = 0;


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23  1:04       ` Yee-Ting Li
@ 2005-02-23 15:28         ` Yee-Ting Li
  0 siblings, 0 replies; 15+ messages in thread
From: Yee-Ting Li @ 2005-02-23 15:28 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Stephen Hemminger, Yee-Ting Li, Baruch Even,
	Doug Leith

Oops! Checking through the code, I've realised that I forgot to
increment the incrs_applied counter to account for burst moderation.
Please find enclosed the corrected (full) implementation of RFC3465 (the
only change from the previous version is the addition of incrs_applied++
in the while loop).

From our tests with Linux receivers, this burst moderation will make a
difference at very high speeds (>200Mbit/sec), as they do not always
acknowledge every other packet.
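
The burst moderation described here can be sketched in user space as follows. This is an illustrative model only: `struct abc_state` and `abc_on_ack` are hypothetical names, the function merges the byte accumulation and the increment loop that the patch keeps in separate places, and it returns the number of window increments rather than calling reno_cong_avoid(). It mirrors the patch's strict `>` comparison against the MSS.

```c
#include <stdint.h>

struct abc_state {
	uint32_t bytes_acked;	/* bytes acknowledged but not yet credited */
};

/* Credit newly acknowledged bytes and return how many window increments
 * this ACK may trigger, capped at `limit` (the abc_L value). */
static int abc_on_ack(struct abc_state *s, uint32_t newly_acked,
		      uint32_t mss, int limit)
{
	int incrs_applied = 0;

	s->bytes_acked += newly_acked;
	while (s->bytes_acked > mss && incrs_applied < limit) {
		s->bytes_acked -= mss;
		incrs_applied++;	/* the increment the first posting missed */
	}
	return incrs_applied;
}
```

In this model, an ACK covering seven 1448-byte segments with a limit of 2 yields two increments and leaves five segments' worth of bytes to be credited on later ACKs. Without the `incrs_applied++` line the cap would never take effect and the loop would drain the whole backlog at once.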

Apologies for any inconvenience.

Yee.


On Feb 23, 2005, at 01:04, Yee-Ting Li wrote:

> On Feb 22, 2005, at 23:38, Baruch Even wrote:
>> We have a version of ABC (Appropriate Byte Counting) implementation 
>> of RFC 3465, which we hope to submit soon for inclusion in the kernel 
>> which should be a more appropriate solution for this. The RFC is a 
>> well defined standard whereas this patch has not received any 
>> reviewing by the networking community.
>
> Please find enclosed a version of our implementation of RFC3465 ABC 
> for Linux 2.6.11-rc4.
>
> There is in-built protection, as defined by the RFC, to prevent large 
> bursts of packets should acks arrive acknowledging more than abc_L 
> packets (sysctl_tcp_abc_L). The entire abc patch can be switched on or 
> off using sysctl_tcp_abc={1|0} respectively. As this is also a RFT, it 
> is switched ON by default and has the abc_L value of 2 which MAY be 
> used (according to the RFC).
>
> Note that an abc_L of 1 will be more conservative than what is 
> available with normal clocking of delayed acks. Note that there is 
> currently no built in mechanism to prevent abc_L being set to over 2; 
> the RFC defines that abc_L MUST NOT be greater than 2.
>
> This patch also has the advantage of working for all protocols 
> currently in the kernel (except vegas which doesn't require it).
>


Signed-off-by: Yee-Ting Li <Yee-Ting.Li@may.ie>

Index: linux-2.6.11-rc4/include/linux/sysctl.h
===================================================================
--- linux-2.6.11-rc4.orig/include/linux/sysctl.h	Sun Feb 13 03:06:53 2005
+++ linux-2.6.11-rc4/include/linux/sysctl.h	Tue Feb 22 23:48:30 2005
@@ -344,6 +344,8 @@
  	NET_TCP_DEFAULT_WIN_SCALE=105,
  	NET_TCP_MODERATE_RCVBUF=106,
  	NET_TCP_TSO_WIN_DIVISOR=107,
+	NET_TCP_ABC=108,
+	NET_TCP_ABC_L=109,
  };

  enum {
Index: linux-2.6.11-rc4/include/linux/tcp.h
===================================================================
--- linux-2.6.11-rc4.orig/include/linux/tcp.h	Sun Feb 13 03:06:23 2005
+++ linux-2.6.11-rc4/include/linux/tcp.h	Tue Feb 22 23:39:41 2005
@@ -366,6 +366,8 @@

  	__u32	total_retrans;	/* Total retransmits for entire connection */

+	__u32	bytes_acked;	/* Appropriate Byte Counting - RFC3465 */
+	
  	/* The syn_wait_lock is necessary only to avoid proc interface having
  	 * to grab the main lock sock while browsing the listening hash
  	 * (otherwise it's deadlock prone).
Index: linux-2.6.11-rc4/include/net/tcp.h
===================================================================
--- linux-2.6.11-rc4.orig/include/net/tcp.h	Sun Feb 13 03:05:28 2005
+++ linux-2.6.11-rc4/include/net/tcp.h	Tue Feb 22 23:47:59 2005
@@ -609,6 +609,10 @@
  extern int sysctl_tcp_moderate_rcvbuf;
  extern int sysctl_tcp_tso_win_divisor;

+/* RFC3465 - ABC */
+extern int sysctl_tcp_abc;
+extern int sysctl_tcp_abc_L;
+
  extern atomic_t tcp_memory_allocated;
  extern atomic_t tcp_sockets_allocated;
  extern int tcp_memory_pressure;
@@ -1366,6 +1370,7 @@
  static inline void tcp_enter_cwr(struct tcp_sock *tp)
  {
  	tp->prior_ssthresh = 0;
+	tp->bytes_acked=0;
  	if (tp->ca_state < TCP_CA_CWR) {
  		__tcp_enter_cwr(tp);
  		tcp_set_ca_state(tp, TCP_CA_CWR);
Index: linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/sysctl_net_ipv4.c	Sun Feb 13 03:07:01 2005
+++ linux-2.6.11-rc4/net/ipv4/sysctl_net_ipv4.c	Tue Feb 22 23:46:18 2005
@@ -682,6 +682,22 @@
  		.mode		= 0644,
  		.proc_handler	= &proc_dointvec,
  	},
+    	{
+		.ctl_name	= NET_TCP_ABC,
+		.procname	= "tcp_abc",
+		.data		= &sysctl_tcp_abc,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+    	{
+		.ctl_name	= NET_TCP_ABC_L,
+		.procname	= "tcp_abc_L",
+		.data		= &sysctl_tcp_abc_L,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
  	{ .ctl_name = 0 }
  };

Index: linux-2.6.11-rc4/net/ipv4/tcp.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/tcp.c	Sun Feb 13 03:05:50 2005
+++ linux-2.6.11-rc4/net/ipv4/tcp.c	Tue Feb 22 23:28:28 2005
@@ -1825,6 +1825,7 @@
  	tp->packets_out = 0;
  	tp->snd_ssthresh = 0x7fffffff;
  	tp->snd_cwnd_cnt = 0;
+	tp->bytes_acked = 0;
  	tcp_set_ca_state(tp, TCP_CA_Open);
  	tcp_clear_retrans(tp);
  	tcp_delack_init(tp);
Index: linux-2.6.11-rc4/net/ipv4/tcp_input.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/tcp_input.c	Tue Feb 22 23:27:44 2005
+++ linux-2.6.11-rc4/net/ipv4/tcp_input.c	Wed Feb 23 15:18:57 2005
@@ -92,6 +92,11 @@

  int sysctl_tcp_moderate_rcvbuf = 1;

+/* RFC 3465 - ABC */
+int sysctl_tcp_abc = 1;
+int sysctl_tcp_abc_L = 2;   /* The RFC defines 1 as being a more conservative value */
+			    /* that SHOULD be used, however, we use 2 as it MAY be used */
+
  /* Default values of the Vegas variables, in fixed-point representation
   * with V_PARAM_SHIFT bits to the right of the binary point.
   */
@@ -1287,6 +1292,7 @@
  	tp->snd_cwnd_cnt   = 0;
  	tp->snd_cwnd_stamp = tcp_time_stamp;

+	tp->bytes_acked = 0;
  	tcp_clear_retrans(tp);

  	/* Push undo marker, if it was plain RTO and nothing
@@ -1945,6 +1951,8 @@
  			TCP_ECN_queue_cwr(tp);
  		}

+		tp->bytes_acked = 0;
+			
  		tp->snd_cwnd_cnt = 0;
  		tcp_set_ca_state(tp, TCP_CA_Recovery);
  	}
@@ -2100,6 +2108,25 @@
  	tp->snd_cwnd_stamp = tcp_time_stamp;
  }

+/* This is a wrapper function to handle RFC3465 - ABC. As per the RFC, the abc_L
+ * value defines a burst moderation to prevent sending large bursts of packets
+ * should an ack acknowledge many packets. abc_L MUST NOT be larger than 2. */
+static __inline__ void reno_cong_avoid_abc( struct tcp_sock *tp, int mss_now )
+{
+	int incrs_applied = 0;
+	
+	if (sysctl_tcp_abc && !tp->nonagle)
+	{
+		while (tp->bytes_acked > mss_now && incrs_applied < sysctl_tcp_abc_L) {
+			tp->bytes_acked -= mss_now;
+			reno_cong_avoid( tp );
+			incrs_applied++;
+		}
+	} else
+		reno_cong_avoid( tp );
+}
+
+
  /* This is based on the congestion detection/avoidance scheme described in
   *    Lawrence S. Brakmo and Larry L. Peterson.
   *    "TCP Vegas: End to end congestion avoidance on a global internet."
@@ -2322,12 +2349,15 @@
  	tp->snd_cwnd_stamp = tcp_time_stamp;
  }

-static inline void tcp_cong_avoid(struct tcp_sock *tp, u32 ack, u32 seq_rtt)
+static inline void tcp_cong_avoid(struct sock *sk, u32 ack, u32 seq_rtt)
  {
+    	struct tcp_sock *tp = tcp_sk(sk);
+	int mss_now = tcp_current_mss(sk,1);
+
  	if (tcp_vegas_enabled(tp))
  		vegas_cong_avoid(tp, ack, seq_rtt);
  	else
-		reno_cong_avoid(tp);
+		reno_cong_avoid_abc(tp, mss_now);
  }

  /* Restart timer after forward progress on connection.
@@ -2890,6 +2920,9 @@
  	if (before(ack, prior_snd_una))
  		goto old_ack;

+	if ( sysctl_tcp_abc && tp->ca_state < TCP_CA_CWR )
+	    tp->bytes_acked += ack - prior_snd_una;	
+	
  	if (!(flag&FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
  		/* Window is constant, pure forward advance.
  		 * No more checks are required.
@@ -2940,12 +2973,12 @@
  		if ((flag & FLAG_DATA_ACKED) &&
  		    (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd) &&
  		    tcp_may_raise_cwnd(tp, flag))
-			tcp_cong_avoid(tp, ack, seq_rtt);
+			tcp_cong_avoid(sk, ack, seq_rtt);
  		tcp_fastretrans_alert(sk, prior_snd_una, prior_packets, flag);
  	} else {
  		if ((flag & FLAG_DATA_ACKED) &&
  		    (tcp_vegas_enabled(tp) || prior_in_flight >= tp->snd_cwnd))
-			tcp_cong_avoid(tp, ack, seq_rtt);
+			tcp_cong_avoid(sk, ack, seq_rtt);
  	}

  	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP))
Index: linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c
===================================================================
--- linux-2.6.11-rc4.orig/net/ipv4/tcp_minisocks.c	Sun Feb 13 03:07:01 2005
+++ linux-2.6.11-rc4/net/ipv4/tcp_minisocks.c	Tue Feb 22 23:28:28 2005
@@ -769,6 +769,8 @@
  		newtp->snd_cwnd = 2;
  		newtp->snd_cwnd_cnt = 0;

+		newtp->bytes_acked = 0;
+
  		newtp->frto_counter = 0;
  		newtp->frto_highmark = 0;


* RE: [RFT] BIC TCP delayed ack compensation
  2005-02-22 22:22 [RFT] BIC TCP delayed ack compensation Hubert Tonneau
  2005-02-23  0:58 ` Stephen Hemminger
@ 2005-02-23 18:32 ` Injong Rhee
  2005-02-23 19:36   ` Stephen Hemminger
  2005-02-23 18:37 ` Injong Rhee
  2 siblings, 1 reply; 15+ messages in thread
From: Injong Rhee @ 2005-02-23 18:32 UTC (permalink / raw)
  To: 'Hubert Tonneau', 'Stephen Hemminger',
	'cliff white'
  Cc: 'Alexey Kuznetsov', netdev, 'David S. Miller'



> -----Original Message-----
> From: Hubert Tonneau [mailto:hubert.tonneau@fullpliant.org]
> Sent: Tuesday, February 22, 2005 5:23 PM
> 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB
> of data)
> 2.6.9 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds

It seems that there are other problems with this version of Linux. Is
there any way we can find out what the problems are? Is this with BIC?
If not, then some other part is not working. If it is with BIC, we would
like to look into this problem.

> 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds


* RE: [RFT] BIC TCP delayed ack compensation
  2005-02-22 22:22 [RFT] BIC TCP delayed ack compensation Hubert Tonneau
  2005-02-23  0:58 ` Stephen Hemminger
  2005-02-23 18:32 ` Injong Rhee
@ 2005-02-23 18:37 ` Injong Rhee
  2005-02-23 19:26   ` David S. Miller
  2 siblings, 1 reply; 15+ messages in thread
From: Injong Rhee @ 2005-02-23 18:37 UTC (permalink / raw)
  To: 'Hubert Tonneau', 'Stephen Hemminger',
	'cliff white'
  Cc: 'Alexey Kuznetsov', netdev, 'David S. Miller'



> -----Original Message-----
> From: Hubert Tonneau [mailto:hubert.tonneau@fullpliant.org]
> Sent: Tuesday, February 22, 2005 5:23 PM
> 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB
> of data)
> 2.6.9 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
> 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds

Another way to test whether this is related to the OS or to the BIC
implementation is to test it with our BIC patch 1.1 + Linux 2.4. It
will tell whether the original implementation of BIC has something to
do with the performance with respect to MacOS.


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 18:37 ` Injong Rhee
@ 2005-02-23 19:26   ` David S. Miller
  2005-02-23 22:04     ` John Heffner
  0 siblings, 1 reply; 15+ messages in thread
From: David S. Miller @ 2005-02-23 19:26 UTC (permalink / raw)
  To: Injong Rhee; +Cc: hubert.tonneau, shemminger, cliffw, kuznet, netdev

On Wed, 23 Feb 2005 13:37:35 -0500
"Injong Rhee" <rhee@eos.ncsu.edu> wrote:

> 
> 
> > -----Original Message-----
> > From: Hubert Tonneau [mailto:hubert.tonneau@fullpliant.org]
> > Sent: Tuesday, February 22, 2005 5:23 PM
> > 2.6.9 to 100 Mbps connected MacOSX: 15 seconds (for roughly 100 MB
> > of data)
> > 2.6.9 to gigabit connected MacOSX: 5 seconds
> > 2.6.10-ac11 to 100 Mbps connected MacOSX: 325 seconds
> > 2.6.10-ac11 to gigabit connected MacOSX: 5 seconds
> > 2.6.10-ac11+BIC to 100 Mbps connected MacOSX: 620 seconds
> > 2.6.10-ac11+BIC to gigabit connected MacOSX: 5 seconds
> 
> Another way to test whether this is related to the os or bic
> implementation is to test it with our bic patch 1.1. + Linux 2.4. It
> will tell whether the original implementation of BIC has something to
> do with the performance with respect to MacOS. 

I don't think BIC has much to do with this problem.  MacOS-X does delayed
ACKs until a PSH is seen and this kills performance if we don't PSH often
enough.


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 18:32 ` Injong Rhee
@ 2005-02-23 19:36   ` Stephen Hemminger
  0 siblings, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2005-02-23 19:36 UTC (permalink / raw)
  To: Injong Rhee
  Cc: 'Hubert Tonneau', 'cliff white',
	'Alexey Kuznetsov', netdev, 'David S. Miller'

An interesting test would be to repeat the slow case: 2.6.10-ac11 over 100Mbps

First with TCP Reno (the old default):
	sysctl -w net.ipv4.tcp_bic=0
then with TCP Westwood:
	sysctl -w net.ipv4.tcp_bic=0
	sysctl -w net.ipv4.tcp_westwood=1


* Re: [RFT] BIC TCP delayed ack compensation
@ 2005-02-23 21:54 Hubert Tonneau
  0 siblings, 0 replies; 15+ messages in thread
From: Hubert Tonneau @ 2005-02-23 21:54 UTC (permalink / raw)
  To: Stephen Hemminger, Injong Rhee
  Cc: 'cliff white', 'Alexey Kuznetsov', netdev,
	'David S. Miller'

Stephen Hemminger wrote:
>
> An interesting test would be to repeat the slow case: 2.6.10-ac11 over 100Mbps
> 
> With first TCP Reno (old default).
> 	sysctl -w net.ipv4.tcp_bic=0

No change.

> then TCP Westwood.
> 	sysctl -w net.ipv4.tcp_bic=0
> 	sysctl -w net.ipv4.tcp_westwood=1

No change.

Now Linux 2.6.11-rc4 with Yee-Ting Li's ABC patch:

No change.

Looks like David S. Miller is right.
Now, what I still don't understand is, if it's PSH/ACK related, why does
the gigabit-connected Mac work nicely whereas the 100 Mbps-connected one
does not?


* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 19:26   ` David S. Miller
@ 2005-02-23 22:04     ` John Heffner
  2005-02-23 22:10       ` David S. Miller
  0 siblings, 1 reply; 15+ messages in thread
From: John Heffner @ 2005-02-23 22:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: hubert.tonneau, netdev

On Wed, 23 Feb 2005, David S. Miller wrote:

> I don't think BIC has much to do with this problem.  MacOS-X does delayed
> ACKs until a PSH is seen and this kills performance if we don't PSH often
> enough.


I looked at the trace last night and I wonder if PSH is a red herring.
For example:

16:42:21.837931 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37545601 win 57184 <nop,nop,timestamp 1709240872 641486>
16:42:21.837937 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37545601:37547049(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837940 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37547049:37548497(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837941 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37548497:37549945(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837943 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37549945:37551393(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837945 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37551393:37552841(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837947 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37552841:37554289(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.837949 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37554289:37555737(1448) ack 122802 win 1460 <nop,nop,timestamp 641685 1709240872> NBT Packet
16:42:21.838979 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37552841 win 65535 <nop,nop,timestamp 1709240872 641685>
16:42:21.838985 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37555737:37557185(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838987 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37557185:37558633(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838989 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37558633:37560081(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838991 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37560081:37561529(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838992 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37561529:37562977(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.838994 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: . 37562977:37564425(1448) ack 122802 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:21.839172 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: P 122802:122853(51) ack 37554289 win 65128 <nop,nop,timestamp 1709240872 641685> NBT Packet
16:42:21.839178 IP 10.107.96.7.32801 > 10.107.96.230.netbios-ssn: P 37564425:37565873(1448) ack 122853 win 1460 <nop,nop,timestamp 641686 1709240872> NBT Packet
16:42:22.037976 IP 10.107.96.230.netbios-ssn > 10.107.96.7.32801: . ack 37565873 win 53548 <nop,nop,timestamp 1709240872 641685>

Maybe this has something to do with the bi-directional nature of the flow?
Mac OS delaying ACK to try to piggyback on data or something like that.
One signature I noticed is that it seems the last packet sent by the Mac
before the long delack timeout is always a small data packet.  (I didn't
rigorously verify this but it seems true.)
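
To put a number on the stall in the excerpt above: the PSH segment ending at
37565873 leaves at 16:42:21.839178, and the ACK for 37565873 arrives at
16:42:22.037976. The sketch below only does that subtraction. The helper names
ts_to_usec and gap_usec are made up for illustration, and it assumes the
tcpdump "HH:MM:SS.uuuuuu" timestamp format with exactly six fractional digits
and both timestamps on the same day.

```c
#include <stdio.h>

/*
 * Hypothetical helper: convert a tcpdump "HH:MM:SS.uuuuuu" timestamp to
 * microseconds since midnight. Assumes six fractional digits.
 * Returns -1 if the string does not match that shape.
 */
static long long ts_to_usec(const char *ts)
{
	int h, m, s, us;

	if (sscanf(ts, "%d:%d:%d.%6d", &h, &m, &s, &us) != 4)
		return -1;
	return ((h * 60LL + m) * 60 + s) * 1000000LL + us;
}

/*
 * Hypothetical helper: microseconds elapsed between two timestamps taken
 * on the same day. Returns -1 if either timestamp is malformed.
 */
static long long gap_usec(const char *earlier, const char *later)
{
	long long a = ts_to_usec(earlier);
	long long b = ts_to_usec(later);

	if (a < 0 || b < 0)
		return -1;
	return b - a;
}
```

Calling gap_usec("16:42:21.839178", "16:42:22.037976") gives 198798, i.e. the
ACK covering the PSH segment shows up roughly 199 ms after it was sent, while
the earlier ACKs in the same excerpt come back in about 1 ms.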

  -John

* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 22:04     ` John Heffner
@ 2005-02-23 22:10       ` David S. Miller
  2005-02-23 22:19         ` John Heffner
  0 siblings, 1 reply; 15+ messages in thread
From: David S. Miller @ 2005-02-23 22:10 UTC (permalink / raw)
  To: John Heffner; +Cc: hubert.tonneau, netdev

On Wed, 23 Feb 2005 17:04:08 -0500 (EST)
John Heffner <jheffner@psc.edu> wrote:

> Maybe this has something to do with the bi-directional nature of the flow?
> Mac OS delaying ACK to try to piggyback on data or something like that.
> One signature I noticed is that it seems the last packet sent by the Mac
> before the long delack timeout is always a small data packet.  (I didn't
> rigorously verify this but it seems true.)

I should be more specific when I say "PSH".  Mac OS-X's algorithm is basically
that it always delays ACKs to the delayed ACK timeout when the header prediction
fast path is hit.  One way to "miss" the header prediction fast path is to
set PSH (this is actually a bug, Linux fixed this long ago, PSH should be ignored
for header prediction fast path checking).

When the fast path is missed, it does the usual "every 2 full sized frames"
ACK'ing.

Out of order data can cause the missing of the fast path as well.
That can only be determined if we had dumps from the Mac's perspective
however.
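
As a rough model of the receiver behavior described above (not the actual
Mac OS X code, which is not shown in this thread), the decision can be
sketched as follows. Every name here (rx_segment, rx_state, hits_fast_path,
on_full_sized_segment) is invented for illustration, and the model assumes
only full-sized segments and only the two fast-path misses mentioned: PSH set
and out-of-order data.

```c
#include <stdbool.h>

/* Illustrative only: what the receiver does about ACKing one segment. */
enum ack_action { ACK_NOW, ACK_WAIT_FOR_TIMER };

struct rx_segment {
	bool psh;       /* PSH flag set on this segment */
	bool in_order;  /* segment starts at the next expected sequence */
};

struct rx_state {
	unsigned int unacked_full_frames; /* slow-path frames since last ACK */
};

/*
 * Assumed fast-path test: in-order data with PSH clear. Letting PSH
 * influence this check is the bug discussed above.
 */
static bool hits_fast_path(const struct rx_segment *seg)
{
	return seg->in_order && !seg->psh;
}

/* Decide how to ACK one full-sized segment under the described policy. */
static enum ack_action on_full_sized_segment(struct rx_state *st,
					     const struct rx_segment *seg)
{
	/* Fast path hit: always wait for the delayed ACK timeout. */
	if (hits_fast_path(seg))
		return ACK_WAIT_FOR_TIMER;

	/* Fast path missed: the usual "every 2 full sized frames" ACKing. */
	if (++st->unacked_full_frames >= 2) {
		st->unacked_full_frames = 0;
		return ACK_NOW;
	}
	return ACK_WAIT_FOR_TIMER;
}
```

In this model a long run of in-order segments without PSH never produces an
immediate ACK, which is the stall a sender would see if it rarely sets PSH.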

Anyways, this Mac OS-X behavior has pretty much been universally agreed
to as a severe bug, at least on this list :-)

* Re: [RFT] BIC TCP delayed ack compensation
  2005-02-23 22:10       ` David S. Miller
@ 2005-02-23 22:19         ` John Heffner
  0 siblings, 0 replies; 15+ messages in thread
From: John Heffner @ 2005-02-23 22:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: hubert.tonneau, netdev

On Wed, 23 Feb 2005, David S. Miller wrote:

> On Wed, 23 Feb 2005 17:04:08 -0500 (EST)
> John Heffner <jheffner@psc.edu> wrote:
>
> > Maybe this has something to do with the bi-directional nature of the flow?
> > Mac OS delaying ACK to try to piggyback on data or something like that.
> > One signature I noticed is that it seems the last packet sent by the Mac
> > before the long delack timeout is always a small data packet.  (I didn't
> > rigorously verify this but it seems true.)
>
> I should be more specific when I say "PSH".  Mac OS-X's algorithm is basically
> that it always delays ACKs to the delayed ACK timeout when the header prediction
> fast path is hit.  One way to "miss" the header prediction fast path is to
> set PSH (this is actually a bug, Linux fixed this long ago, PSH should be ignored
> for header prediction fast path checking).

The point is it appears to be delaying ack even when PSH is set.


> Anyways, this Mac OS-X behavior has pretty much been universally agreed
> to as a severe bug, at least on this list :-)

Yep.  The Mac behavior is clearly bizarre. :)

  -John

end of thread, other threads:[~2005-02-23 22:19 UTC | newest]

Thread overview: 15+ messages
2005-02-22 22:22 [RFT] BIC TCP delayed ack compensation Hubert Tonneau
2005-02-23  0:58 ` Stephen Hemminger
2005-02-23 18:32 ` Injong Rhee
2005-02-23 19:36   ` Stephen Hemminger
2005-02-23 18:37 ` Injong Rhee
2005-02-23 19:26   ` David S. Miller
2005-02-23 22:04     ` John Heffner
2005-02-23 22:10       ` David S. Miller
2005-02-23 22:19         ` John Heffner
  -- strict thread matches above, loose matches on Subject: below --
2005-02-23 21:54 Hubert Tonneau
     [not found] <050QTJA12@server5.heliogroup.fr>
2005-02-09 18:59 ` 2.6.10 TCP troubles -- suggested patch Stephen Hemminger
2005-02-22 21:50   ` [RFT] BIC TCP delayed ack compensation Stephen Hemminger
2005-02-22 23:30     ` John Heffner
2005-02-22 23:38     ` Baruch Even
2005-02-23  1:04       ` Yee-Ting Li
2005-02-23 15:28         ` Yee-Ting Li
