TCP Pacing

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* TCP Pacing
@ 2006-09-12 17:58 Daniele Lacamera
  2006-09-12 18:21 ` Arnaldo Carvalho de Melo
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Daniele Lacamera @ 2006-09-12 17:58 UTC (permalink / raw)
  To: Stephen Hemminger, David S. Miller
  Cc: netdev, Carlo Caini, Rosario Firrincieli, Giovanni Pau

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

Hello,

Please let me insist once again on the importance of adding a TCP Pacing 
mechanism in our TCP, as many people are including this algorithm in 
their congestion control proposals. Recent researches have found out 
that it really can help improving performance in different scenarios, 
like satellites and long-delay high-speed channels (>100ms RTT, Gbit). 
Hybla module itself is cripple without this feature in its natural 
scenario. 

The following patch is totally non-invasive: it has a config option and 
a sysctl switch, both turned off by default. When the config option is 
enabled, it adds only 6B to the tcp_sock.

Signed-off by: Daniele Lacamera <root@danielinux.net>
--- 







[-- Attachment #2: TCP_Pacing.diff --]
[-- Type: text/x-diff, Size: 10098 bytes --]

diff -ruN linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt linux-pacing/Documentation/networking/ip-sysctl.txt
--- linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/Documentation/networking/ip-sysctl.txt	2006-09-12 16:38:14.000000000 +0200
@@ -369,6 +369,12 @@
 	be timed out after an idle period.
 	Default: 1
 
+tcp_pacing - BOOLEAN
+	If set, enable time-based TCP segment sending, instead of normal
+	ack-based sending. A software timer is set every time a new ack 
+	is received, then packets are spreaded across round-trip time.
+	Default: 0
+
 IP Variables:
 
 ip_local_port_range - 2 INTEGERS
diff -ruN linux-2.6.18-rc6/include/linux/sysctl.h linux-pacing/include/linux/sysctl.h
--- linux-2.6.18-rc6/include/linux/sysctl.h	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/include/linux/sysctl.h	2006-09-12 18:13:38.000000000 +0200
@@ -411,6 +411,7 @@
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
 	NET_TCP_DMA_COPYBREAK=116,
 	NET_TCP_SLOW_START_AFTER_IDLE=117,
+	NET_TCP_PACING=118,
 };
 
 enum {
diff -ruN linux-2.6.18-rc6/include/linux/tcp.h linux-pacing/include/linux/tcp.h
--- linux-2.6.18-rc6/include/linux/tcp.h	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/include/linux/tcp.h	2006-09-12 16:45:32.000000000 +0200
@@ -356,6 +356,17 @@
 		__u32		  probe_seq_start;
 		__u32		  probe_seq_end;
 	} mtu_probe;
+	
+#ifdef CONFIG_TCP_PACING
+/* TCP Pacing structure */
+	struct {
+		struct timer_list timer;
+		__u16   count;
+		__u16   burst;
+		__u8    lock;
+		__u8    delta;
+	} pacing;
+#endif
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff -ruN linux-2.6.18-rc6/include/net/tcp.h linux-pacing/include/net/tcp.h
--- linux-2.6.18-rc6/include/net/tcp.h	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/include/net/tcp.h	2006-09-12 17:07:49.000000000 +0200
@@ -227,6 +227,9 @@
 extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
+#ifdef CONFIG_TCP_PACING
+extern int sysctl_tcp_pacing;
+#endif
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
@@ -449,6 +452,11 @@
 extern unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
 extern unsigned int tcp_current_mss(struct sock *sk, int large);
 
+#ifdef CONFIG_TCP_PACING
+extern void tcp_pacing_recalc_delta(struct sock *sk);
+extern void tcp_pacing_reset_timer(struct sock *sk);
+#endif
+
 /* tcp.c */
 extern void tcp_get_info(struct sock *, struct tcp_info *);
 
diff -ruN linux-2.6.18-rc6/net/ipv4/Kconfig linux-pacing/net/ipv4/Kconfig
--- linux-2.6.18-rc6/net/ipv4/Kconfig	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/Kconfig	2006-09-12 16:59:37.000000000 +0200
@@ -572,6 +572,20 @@
 	loss packets.
 	See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+config TCP_PACING
+	bool "TCP Pacing"
+	depends on EXPERIMENTAL
+	select HZ_1000
+	default n
+	---help---
+	Many researchers have observed that TCP's congestion control mechanisms 
+	can lead to bursty traffic flows on modern high-speed networks, with a 
+	negative impact on overall network efficiency. A proposed solution to this 
+	problem is to evenly space, or "pace", data sent into the network over an 
+	entire round-trip time, so that data is not sent in a burst.
+	To enable this feature, please refer to Documentation/networking/ip-sysctl.txt.
+	If unsure, say N.
+	
 endmenu
 
 config TCP_CONG_BIC
diff -ruN linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c linux-pacing/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/sysctl_net_ipv4.c	2006-09-12 18:33:36.000000000 +0200
@@ -697,6 +697,16 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+#ifdef CONFIG_TCP_PACING
+	{
+		.ctl_name	= NET_TCP_PACING,
+		.procname	= "tcp_pacing",
+		.data		= &sysctl_tcp_pacing,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+#endif
 	{ .ctl_name = 0 }
 };
 
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_input.c linux-pacing/net/ipv4/tcp_input.c
--- linux-2.6.18-rc6/net/ipv4/tcp_input.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_input.c	2006-09-12 17:11:38.000000000 +0200
@@ -2569,6 +2569,11 @@
 			tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1);
 	}
 
+#ifdef CONFIG_TCP_PACING
+	if(sysctl_tcp_pacing)
+		tcp_pacing_recalc_delta(sk);
+#endif
+
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_output.c linux-pacing/net/ipv4/tcp_output.c
--- linux-2.6.18-rc6/net/ipv4/tcp_output.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_output.c	2006-09-12 18:12:38.000000000 +0200
@@ -62,6 +62,10 @@
 /* By default, RFC2861 behavior.  */
 int sysctl_tcp_slow_start_after_idle = 1;
 
+#ifdef CONFIG_TCP_PACING
+int sysctl_tcp_pacing=0;
+#endif
+
 static void update_send_head(struct sock *sk, struct tcp_sock *tp,
 			     struct sk_buff *skb)
 {
@@ -414,7 +418,13 @@
 		
 	if (tcp_packets_in_flight(tp) == 0)
 		tcp_ca_event(sk, CA_EVENT_TX_START);
-
+	
+#ifdef CONFIG_TCP_PACING
+	if(sysctl_tcp_pacing) {
+		tcp_pacing_reset_timer(sk);
+		tp->pacing.lock = 1;
+	}
+#endif
 	th = (struct tcphdr *) skb_push(skb, tcp_header_size);
 	skb->h.th = th;
 	skb_set_owner_w(skb, sk);
@@ -1085,7 +1095,15 @@
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 send_win, cong_win, limit, in_flight;
-
+	
+#ifdef CONFIG_TCP_PACING
+	/* TCP Pacing conflicts with this algorithm.
+	 * When Pacing is enabled, don't try to defer.
+	 */
+	if(sysctl_tcp_pacing)
+		return 0;
+#endif
+	
 	if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)
 		return 0;
 
@@ -1308,7 +1326,12 @@
 
 		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
 			break;
-
+		
+#ifdef CONFIG_TCP_PACING
+		if (sysctl_tcp_pacing && tp->pacing.lock)
+			return 0;
+#endif
+		
 		if (tso_segs == 1) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
 						     (tcp_skb_is_last(sk, skb) ?
@@ -1323,6 +1346,10 @@
 		if (tso_segs > 1) {
 			limit = tcp_window_allows(tp, skb,
 						  mss_now, cwnd_quota);
+#ifdef CONFIG_TCP_PACING
+		if (sysctl_tcp_pacing && sent_pkts >= tp->pacing.burst)
+			tp->pacing.lock=1;
+#endif
 
 			if (skb->len < limit) {
 				unsigned int trim = skb->len % mss_now;
@@ -1733,6 +1760,11 @@
 		}
 	}
 
+#ifdef CONFIG_TCP_PACING
+	if (sysctl_tcp_pacing && tp->pacing.lock)
+		return -EAGAIN;
+#endif
+
 	/* Make a copy, if the first transmission SKB clone we made
 	 * is still in somebody's hands, else make a clone.
 	 */
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_timer.c linux-pacing/net/ipv4/tcp_timer.c
--- linux-2.6.18-rc6/net/ipv4/tcp_timer.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_timer.c	2006-09-12 18:03:17.000000000 +0200
@@ -36,10 +36,21 @@
 static void tcp_delack_timer(unsigned long);
 static void tcp_keepalive_timer (unsigned long data);
 
+#ifdef CONFIG_TCP_PACING
+static void tcp_pacing_timer(unsigned long data);
+#endif
+
 void tcp_init_xmit_timers(struct sock *sk)
 {
 	inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
 				  &tcp_keepalive_timer);
+	
+#ifdef CONFIG_TCP_PACING
+	init_timer(&(tcp_sk(sk)->pacing.timer));
+	tcp_sk(sk)->pacing.timer.function=&tcp_pacing_timer;
+	tcp_sk(sk)->pacing.timer.data = (unsigned long) sk;
+#endif
+
 }
 
 EXPORT_SYMBOL(tcp_init_xmit_timers);
@@ -522,3 +533,115 @@
 	bh_unlock_sock(sk);
 	sock_put(sk);
 }
+
+#ifdef CONFIG_TCP_PACING
+/*
+ * This is the timer used to spread packets.
+ * a delta value is computed on rtt/cwnd,
+ * and will be our expire interval.
+ * The timer has to be restarted when a segment is sent out.
+ */
+static void tcp_pacing_timer(unsigned long data)
+{
+	struct sock *sk = (struct sock*)data;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if(!sysctl_tcp_pacing)
+		return;
+
+	bh_lock_sock(sk);
+	if (sock_owned_by_user(sk)) {
+		/* Try again later */
+		if (!mod_timer(&tp->pacing.timer, jiffies + 1))
+			sock_hold(sk);
+		goto out_unlock;
+	}
+
+	if (sk->sk_state == TCP_CLOSE)
+		goto out;
+
+	/* Unlock sending, so when next ack is received it will pass.
+	 *If there are no packets scheduled, do nothing.
+	 */
+	tp->pacing.lock=0;
+	
+	if(!sk->sk_send_head){
+		/* Sending queue empty */
+		goto out;
+	}
+	
+	/*  Handler */
+	tcp_push_pending_frames(sk,tp);
+
+	out:
+	if (tcp_memory_pressure)
+		sk_stream_mem_reclaim(sk);
+
+	out_unlock:
+		bh_unlock_sock(sk);
+		sock_put(sk);
+}
+
+void tcp_pacing_reset_timer(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	__u32 timeout = jiffies+tp->pacing.delta;
+
+	if(!sysctl_tcp_pacing)
+		return;
+	if (!mod_timer(&tp->pacing.timer, timeout))
+			sock_hold(sk);
+}
+EXPORT_SYMBOL(tcp_pacing_reset_timer);
+
+/*
+ * This routine computes tcp_pacing delay, using
+ * a simplified uniform pacing policy.
+ */
+void tcp_pacing_recalc_delta(struct sock *sk)
+{
+       struct tcp_sock *tp=tcp_sk(sk);
+       __u32 window=(tp->snd_cwnd)<<3;
+       __u32 srtt = tp->srtt;
+       __u32 round=0;
+       __u32 curmss=tp->mss_cache;
+       int state=inet_csk(sk)->icsk_ca_state;
+
+       if( (state==TCP_CA_Recovery) &&(tp->snd_cwnd < tp->snd_ssthresh))
+		window=(tp->snd_ssthresh)<<3;
+
+       if( (tp->snd_wnd/curmss) < tp->snd_cwnd )
+		window = (tp->snd_wnd/curmss)<<3;
+
+       if (window>1 && srtt){
+               if (window <= srtt){
+                       tp->pacing.delta=(srtt/window);
+			if(srtt%window)
+				round=( (srtt/(srtt%window)) / tp->pacing.delta);
+			if (tp->pacing.count >= (round-1) &&(round>1)){
+				tp->pacing.delta++;
+				tp->pacing.count=0;
+			}
+			tp->pacing.burst=1;
+		} else {
+			tp->pacing.delta=1;
+			tp->pacing.burst=(window/srtt);
+			if(window%srtt)
+				round=( (window/(window%srtt)) * tp->pacing.burst);
+			if (tp->pacing.count >= (round-1) && (round>1)){
+				tp->pacing.burst++;
+				tp->pacing.count=0;
+			}
+		}
+	} else {
+		tp->pacing.delta=0;
+		tp->pacing.burst=1;
+       }
+}
+
+EXPORT_SYMBOL(tcp_pacing_recalc_delta);
+
+#endif
+
+
+

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-12 17:58 TCP Pacing Daniele Lacamera
@ 2006-09-12 18:21 ` Arnaldo Carvalho de Melo
  2006-09-12 21:26 ` Ian McDonald
  2006-09-13  3:41 ` Stephen Hemminger
  2 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2006-09-12 18:21 UTC (permalink / raw)
  To: root
  Cc: Stephen Hemminger, David S. Miller, netdev, Carlo Caini,
	Rosario Firrincieli, Giovanni Pau

On 9/12/06, Daniele Lacamera <root@danielinux.net> wrote:
> Hello,
>
> Please let me insist once again on the importance of adding a TCP Pacing
> mechanism in our TCP, as many people are including this algorithm in
> their congestion control proposals. Recent researches have found out
> that it really can help improving performance in different scenarios,
> like satellites and long-delay high-speed channels (>100ms RTT, Gbit).
> Hybla module itself is cripple without this feature in its natural
> scenario.
>
> The following patch is totally non-invasive: it has a config option and
> a sysctl switch, both turned off by default. When the config option is
> enabled, it adds only 6B to the tcp_sock.
>
> Signed-off by: Daniele Lacamera <root@danielinux.net>
> ---
>
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_input.c
linux-pacing/net/ipv4/tcp_input.c
--- linux-2.6.18-rc6/net/ipv4/tcp_input.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_input.c	2006-09-12 17:11:38.000000000 +0200
@@ -2569,6 +2569,11 @@
 			tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1);
 	}

Without getting into the merits of the pacing technique:

+#ifdef CONFIG_TCP_PACING
+	if(sysctl_tcp_pacing)
+		tcp_pacing_recalc_delta(sk);
+#endif

Please rewrite the patch so as to avoid adding that many #ifdefs to
the common code, replacing above code with:

tcp_pacing_recalc_delta(sk);

That is defined in a header (net/tcp.h) as:

#ifdef CONFIG_TCP_PACING
extern void __tcp_pacing_recalc_delta(struct sock *sk);
extern int sysctl_tcp_pacing;

static inline void tcp_pacing_recalc_delta(struct sock *sk)
{
        if (sysctl_tcp_pacing) /* notice the space after ( */
               __tcp_pacing_recalc_delta(sk);
}
#else
static inline void tcp_pacing_recalc_delta(struct sock *sk) {};
#endif

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-12 17:58 TCP Pacing Daniele Lacamera
  2006-09-12 18:21 ` Arnaldo Carvalho de Melo
@ 2006-09-12 21:26 ` Ian McDonald
  2006-09-13  8:18   ` Daniele Lacamera
  2006-09-13  3:41 ` Stephen Hemminger
  2 siblings, 1 reply; 11+ messages in thread
From: Ian McDonald @ 2006-09-12 21:26 UTC (permalink / raw)
  To: root
  Cc: Stephen Hemminger, David S. Miller, netdev, Carlo Caini,
	Rosario Firrincieli, Giovanni Pau

On 9/13/06, Daniele Lacamera <root@danielinux.net> wrote:
> Hello,
>
> Please let me insist once again on the importance of adding a TCP Pacing
> mechanism in our TCP, as many people are including this algorithm in
> their congestion control proposals. Recent researches have found out
> that it really can help improving performance in different scenarios,
> like satellites and long-delay high-speed channels (>100ms RTT, Gbit).
> Hybla module itself is cripple without this feature in its natural
> scenario.

Where is the published research? If you are going to mention research
you need URLs to papers and please put this in source code too so
people can check.
>
> The following patch is totally non-invasive: it has a config option and
> a sysctl switch, both turned off by default. When the config option is
> enabled, it adds only 6B to the tcp_sock.
>
I agree with Arnaldo's comments and also would add I don't like having
to select 1000 as HZ unit. Something is wrong if you need this as I
can run higher resolution timers without having to do this....

Haven't reviewed the rest of the code or tested.

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-12 17:58 TCP Pacing Daniele Lacamera
  2006-09-12 18:21 ` Arnaldo Carvalho de Melo
  2006-09-12 21:26 ` Ian McDonald
@ 2006-09-13  3:41 ` Stephen Hemminger
  2006-09-13  8:18   ` Daniele Lacamera
  2 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2006-09-13  3:41 UTC (permalink / raw)
  To: root
  Cc: David S. Miller, netdev, Carlo Caini, Rosario Firrincieli,
	Giovanni Pau

On Tue, 12 Sep 2006 19:58:21 +0200
Daniele Lacamera <root@danielinux.net> wrote:

> Hello,
> 
> Please let me insist once again on the importance of adding a TCP Pacing 
> mechanism in our TCP, as many people are including this algorithm in 
> their congestion control proposals. Recent researches have found out 
> that it really can help improving performance in different scenarios, 
> like satellites and long-delay high-speed channels (>100ms RTT, Gbit). 
> Hybla module itself is cripple without this feature in its natural 
> scenario. 
> 
> The following patch is totally non-invasive: it has a config option and 
> a sysctl switch, both turned off by default. When the config option is 
> enabled, it adds only 6B to the tcp_sock.

Yes, but tcp_sock is already greater than 1024 on 64 bit, and needs
a diet.

> 
> Signed-off by: Daniele Lacamera <root@danielinux.net>

Pacing in itself isn't a bad idea, but:
  * Code needs to follow standard whitespace rules
 	- blanks around operators		
	- blank after keyword
	- Avoid (needless) paraenthesis
Bad:
       if( (state==TCP_CA_Recovery) &&(tp->snd_cwnd <
tp->snd_ssthresh))
		window=(tp->snd_ssthresh)<<3;
Good:
	if (state == TCP_CA_Recovery && tp->snd_cwnd < tp->snd_ssthresh)
		window = tp->snd_ssthresh << 3;

  * Since it is most useful over long delay links, maybe it should
    be a route parameter.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-13  3:41 ` Stephen Hemminger
@ 2006-09-13  8:18   ` Daniele Lacamera
  2006-09-14  1:21     ` Stephen Hemminger
  0 siblings, 1 reply; 11+ messages in thread
From: Daniele Lacamera @ 2006-09-13  8:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, netdev, Carlo Caini, Rosario Firrincieli,
	Giovanni Pau

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

On Wednesday 13 September 2006 05:41, Stephen Hemminger wrote:
> Pacing in itself isn't a bad idea, but:
<cut>
> * Since it is most useful over long delay links, maybe it should be a 
route parameter.

What does this mean? Should I move the sysctl switch elsewhere?

A new (cleaner) patch follows.
Thanks to you all for your attention & advices.

Signed-off by: Daniele Lacamera <root@danielinux.net>
--- 


[-- Attachment #2: TCP_Pacing.diff --]
[-- Type: text/x-diff, Size: 10844 bytes --]

diff -ruN linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt linux-pacing/Documentation/networking/ip-sysctl.txt
--- linux-2.6.18-rc6/Documentation/networking/ip-sysctl.txt	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/Documentation/networking/ip-sysctl.txt	2006-09-12 16:38:14.000000000 +0200
@@ -369,6 +369,12 @@
 	be timed out after an idle period.
 	Default: 1
 
+tcp_pacing - BOOLEAN
+	If set, enable time-based TCP segment sending, instead of normal
+	ack-based sending. A software timer is set every time a new ack 
+	is received, then packets are spreaded across round-trip time.
+	Default: 0
+
 IP Variables:
 
 ip_local_port_range - 2 INTEGERS
diff -ruN linux-2.6.18-rc6/include/linux/sysctl.h linux-pacing/include/linux/sysctl.h
--- linux-2.6.18-rc6/include/linux/sysctl.h	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/include/linux/sysctl.h	2006-09-12 18:13:38.000000000 +0200
@@ -411,6 +411,7 @@
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
 	NET_TCP_DMA_COPYBREAK=116,
 	NET_TCP_SLOW_START_AFTER_IDLE=117,
+	NET_TCP_PACING=118,
 };
 
 enum {
diff -ruN linux-2.6.18-rc6/include/linux/tcp.h linux-pacing/include/linux/tcp.h
--- linux-2.6.18-rc6/include/linux/tcp.h	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/include/linux/tcp.h	2006-09-12 16:45:32.000000000 +0200
@@ -356,6 +356,17 @@
 		__u32		  probe_seq_start;
 		__u32		  probe_seq_end;
 	} mtu_probe;
+	
+#ifdef CONFIG_TCP_PACING
+/* TCP Pacing structure */
+	struct {
+		struct timer_list timer;
+		__u16   count;
+		__u16   burst;
+		__u8    lock;
+		__u8    delta;
+	} pacing;
+#endif
 };
 
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
diff -ruN linux-2.6.18-rc6/include/net/tcp.h linux-pacing/include/net/tcp.h
--- linux-2.6.18-rc6/include/net/tcp.h	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/include/net/tcp.h	2006-09-13 09:33:02.000000000 +0200
@@ -449,6 +449,58 @@
 extern unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
 extern unsigned int tcp_current_mss(struct sock *sk, int large);
 
+#ifdef CONFIG_TCP_PACING
+extern int sysctl_tcp_pacing;
+extern void __tcp_pacing_recalc_delta(struct sock *sk);
+extern void __tcp_pacing_reset_timer(struct sock *sk);
+static inline void tcp_pacing_recalc_delta(struct sock *sk)
+{
+	if (sysctl_tcp_pacing) 
+		__tcp_pacing_recalc_delta(sk);
+}
+
+static inline void tcp_pacing_reset_timer(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		__tcp_pacing_reset_timer(sk);
+}
+
+static inline void tcp_pacing_lock_tx(struct sock *sk)
+{
+	if (sysctl_tcp_pacing) 
+		tcp_sk(sk)->pacing.lock=1;
+}
+
+static inline int tcp_pacing_locked(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		return tcp_sk(sk)->pacing.lock;
+	else 
+		return 0;
+}
+
+static inline int tcp_pacing_enabled(struct sock *sk)
+{
+	return sysctl_tcp_pacing;
+}
+
+static inline int tcp_pacing_burst(struct sock *sk)
+{
+	if (sysctl_tcp_pacing)
+		return tcp_sk(sk)->pacing.burst;
+	else 
+		return 0;
+}
+	
+#else
+static inline void tcp_pacing_recalc_delta(struct sock *sk) {};
+static inline void tcp_pacing_reset_timer(struct sock *sk) {};
+static inline void tcp_pacing_lock_tx(struct sock *sk) {};
+#define tcp_pacing_locked(sk) 0 
+#define tcp_pacing_enabled(sk) 0
+#define tcp_pacing_burst(sk) 0
+#endif
+
 /* tcp.c */
 extern void tcp_get_info(struct sock *, struct tcp_info *);
 
diff -ruN linux-2.6.18-rc6/net/ipv4/Kconfig linux-pacing/net/ipv4/Kconfig
--- linux-2.6.18-rc6/net/ipv4/Kconfig	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/Kconfig	2006-09-13 09:31:27.000000000 +0200
@@ -572,6 +572,19 @@
 	loss packets.
 	See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+config TCP_PACING
+	bool "TCP Pacing"
+	depends on EXPERIMENTAL
+	default n
+	---help---
+	Many researchers have observed that TCP's congestion control mechanisms 
+	can lead to bursty traffic flows on modern high-speed networks, with a 
+	negative impact on overall network efficiency. A proposed solution to this 
+	problem is to evenly space, or "pace", data sent into the network over an 
+	entire round-trip time, so that data is not sent in a burst.
+	To enable this feature, please refer to Documentation/networking/ip-sysctl.txt.
+	If unsure, say N.
+	
 endmenu
 
 config TCP_CONG_BIC
diff -ruN linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c linux-pacing/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.18-rc6/net/ipv4/sysctl_net_ipv4.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/sysctl_net_ipv4.c	2006-09-12 18:33:36.000000000 +0200
@@ -697,6 +697,16 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+#ifdef CONFIG_TCP_PACING
+	{
+		.ctl_name	= NET_TCP_PACING,
+		.procname	= "tcp_pacing",
+		.data		= &sysctl_tcp_pacing,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+#endif
 	{ .ctl_name = 0 }
 };
 
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_input.c linux-pacing/net/ipv4/tcp_input.c
--- linux-2.6.18-rc6/net/ipv4/tcp_input.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_input.c	2006-09-13 08:08:32.000000000 +0200
@@ -2569,6 +2569,8 @@
 			tcp_cong_avoid(sk, ack, seq_rtt, prior_in_flight, 1);
 	}
 
+	tcp_pacing_recalc_delta(sk);
+
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_output.c linux-pacing/net/ipv4/tcp_output.c
--- linux-2.6.18-rc6/net/ipv4/tcp_output.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_output.c	2006-09-13 09:19:05.000000000 +0200
@@ -414,6 +414,9 @@
 		
 	if (tcp_packets_in_flight(tp) == 0)
 		tcp_ca_event(sk, CA_EVENT_TX_START);
+	
+	tcp_pacing_reset_timer(sk);
+	tcp_pacing_lock_tx(sk);
 
 	th = (struct tcphdr *) skb_push(skb, tcp_header_size);
 	skb->h.th = th;
@@ -1086,6 +1089,12 @@
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 send_win, cong_win, limit, in_flight;
 
+	/* TCP Pacing conflicts with this algorithm.
+	 * When Pacing is enabled, don't try to defer.
+	 */
+	if (tcp_pacing_enabled(sk))
+		return 0;
+	
 	if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)
 		return 0;
 
@@ -1309,6 +1318,9 @@
 		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
 			break;
 
+		if (tcp_pacing_locked(sk))
+			return 0;
+		
 		if (tso_segs == 1) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
 						     (tcp_skb_is_last(sk, skb) ?
@@ -1323,6 +1335,8 @@
 		if (tso_segs > 1) {
 			limit = tcp_window_allows(tp, skb,
 						  mss_now, cwnd_quota);
+		if (tcp_pacing_enabled(sk) && sent_pkts >= tcp_pacing_burst(sk))
+			tcp_pacing_lock_tx(sk);
 
 			if (skb->len < limit) {
 				unsigned int trim = skb->len % mss_now;
@@ -1733,6 +1747,9 @@
 		}
 	}
 
+	if (tcp_pacing_locked(sk))
+		return -EAGAIN;
+
 	/* Make a copy, if the first transmission SKB clone we made
 	 * is still in somebody's hands, else make a clone.
 	 */
diff -ruN linux-2.6.18-rc6/net/ipv4/tcp_timer.c linux-pacing/net/ipv4/tcp_timer.c
--- linux-2.6.18-rc6/net/ipv4/tcp_timer.c	2006-09-04 04:19:48.000000000 +0200
+++ linux-pacing/net/ipv4/tcp_timer.c	2006-09-13 09:10:58.000000000 +0200
@@ -19,6 +19,9 @@
  *		Arnt Gulbrandsen, <agulbra@nvg.unit.no>
  *		Jorge Cwik, <jorge@laser.satlink.net>
  */
+/* Changes:
+ * 		Daniele Lacamera, <root@danielinux.net> TCP Pacing algorithm
+ */
 
 #include <linux/module.h>
 #include <net/tcp.h>
@@ -36,10 +39,22 @@
 static void tcp_delack_timer(unsigned long);
 static void tcp_keepalive_timer (unsigned long data);
 
+#ifdef CONFIG_TCP_PACING
+int sysctl_tcp_pacing = 0;
+static void tcp_pacing_timer(unsigned long data);
+#endif
+
 void tcp_init_xmit_timers(struct sock *sk)
 {
 	inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
 				  &tcp_keepalive_timer);
+	
+#ifdef CONFIG_TCP_PACING
+	init_timer(&(tcp_sk(sk)->pacing.timer));
+	tcp_sk(sk)->pacing.timer.function = &tcp_pacing_timer;
+	tcp_sk(sk)->pacing.timer.data = (unsigned long) sk;
+#endif
+
 }
 
 EXPORT_SYMBOL(tcp_init_xmit_timers);
@@ -522,3 +537,117 @@
 	bh_unlock_sock(sk);
 	sock_put(sk);
 }
+
+#ifdef CONFIG_TCP_PACING
+/* Routines for TCP Pacing.
+ *
+ * Amit Aggarwal, Stefan Savage, and Thomas Anderson, "Understanding the Performance of TCP Pacing"
+ * Proc. of the IEEE INFOCOM 2000 Conference on Computer Communications, March 2000, pages 1157 - 1165.
+ *
+ * This is the timer used to spread packets.
+ * a delta value is computed on rtt/cwnd,
+ * and will be our expire interval.
+ */
+static void tcp_pacing_timer(unsigned long data)
+{
+	struct sock *sk = (struct sock*) data;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (!sysctl_tcp_pacing)
+		return;
+
+	bh_lock_sock(sk);
+	if (sock_owned_by_user(sk)) {
+		/* Try again later */
+		if (!mod_timer(&tp->pacing.timer, jiffies + 1))
+			sock_hold(sk);
+		goto out_unlock;
+	}
+
+	if (sk->sk_state == TCP_CLOSE)
+		goto out;
+
+	/* Unlock sending, so when next ack is received it will pass.
+	 * If there are no packets scheduled, do nothing.
+	 */
+	tp->pacing.lock = 0;
+	
+	if (!sk->sk_send_head){
+		/* Sending queue empty */
+		goto out;
+	}
+	
+	/* Handler */
+	tcp_push_pending_frames(sk, tp);
+
+	out:
+	if (tcp_memory_pressure)
+		sk_stream_mem_reclaim(sk);
+
+	out_unlock:
+		bh_unlock_sock(sk);
+		sock_put(sk);
+}
+
+/* 
+ * The timer has to be restarted when a segment is sent out.
+ */
+void __tcp_pacing_reset_timer(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	__u32 timeout = jiffies + tp->pacing.delta;
+
+	if (!mod_timer(&tp->pacing.timer, timeout))
+			sock_hold(sk);
+}
+EXPORT_SYMBOL(__tcp_pacing_reset_timer);
+
+/*
+ * This routine computes tcp_pacing delay, using
+ * a simplified uniform pacing policy.
+ */
+void __tcp_pacing_recalc_delta(struct sock *sk)
+{
+       struct tcp_sock *tp = tcp_sk(sk);
+       __u32 window = (tp->snd_cwnd)<<3;
+       __u32 srtt = tp->srtt;
+       __u32 round = 0;
+       __u32 curmss = tp->mss_cache;
+       int state = inet_csk(sk)->icsk_ca_state;
+
+       if (state == TCP_CA_Recovery && tp->snd_cwnd < tp->snd_ssthresh)
+		window = tp->snd_ssthresh << 3;
+
+       if (tp->snd_wnd/curmss < tp->snd_cwnd)
+		window = (tp->snd_wnd / curmss) << 3;
+
+       if (window>1 && srtt){
+               if (window <= srtt){
+                       tp->pacing.delta = (srtt/window);
+			if (srtt % window)
+				round=((srtt / (srtt % window)) / tp->pacing.delta);
+			if (tp->pacing.count >= (round - 1) && round > 1){
+				tp->pacing.delta++;
+				tp->pacing.count = 0;
+			}
+			tp->pacing.burst = 1;
+		} else {
+			tp->pacing.delta = 1;
+			tp->pacing.burst = (window / srtt);
+			if (window % srtt)
+				round=( (window / (window % srtt)) * tp->pacing.burst);
+			if (tp->pacing.count >= (round - 1) && (round > 1)){
+				tp->pacing.burst++;
+				tp->pacing.count = 0;
+			}
+		}
+	} else {
+		tp->pacing.delta = 0;
+		tp->pacing.burst = 1;
+       }
+}
+
+EXPORT_SYMBOL(__tcp_pacing_recalc_delta);
+
+#endif
+

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-12 21:26 ` Ian McDonald
@ 2006-09-13  8:18   ` Daniele Lacamera
  2006-09-13 15:46     ` Daniele Lacamera
  2006-09-13 18:30     ` Ian McDonald
  0 siblings, 2 replies; 11+ messages in thread
From: Daniele Lacamera @ 2006-09-13  8:18 UTC (permalink / raw)
  To: Ian McDonald; +Cc: netdev, Carlo Caini, Rosario Firrincieli, Giovanni Pau

On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
> Where is the published research? If you are going to mention research
> you need URLs to papers and please put this in source code too so
> people can check.

I added the main reference to the code. I am going to give you all the 
pointers on this research, mainly recent congestion control proposals 
that include pacing.

> I agree with Arnaldo's comments and also would add I don't like having
> to select 1000 as HZ unit. Something is wrong if you need this as I
> can run higher resolution timers without having to do this....

I removed that select in Kconfig, I agree it doesn't make sense at all, 
for portability. However, pacing works with 1ms resolution, so maybe 
a "depends HZ_1000" is still required. (How do you run 1ms timers with 
HZ!=1000?)

Thanks

-- 
Daniele Lacamera
root@danielinux.net

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-13  8:18   ` Daniele Lacamera
@ 2006-09-13 15:46     ` Daniele Lacamera
  2006-09-16  0:41       ` Xiaoliang (David) Wei
  2006-09-13 18:30     ` Ian McDonald
  1 sibling, 1 reply; 11+ messages in thread
From: Daniele Lacamera @ 2006-09-13 15:46 UTC (permalink / raw)
  To: root, Ian McDonald; +Cc: netdev, Carlo Caini, Rosario Firrincieli, Giovanni Pau

As Ian requested, some of the papers published about Pacing.

* Main reference:
-----------------

Amit Aggarwal, Stefan Savage, and Thomas Anderson.   
"Understanding the Performance of TCP Pacing". 
Proc. of the IEEE INFOCOM 2000 Conference on Computer Communications, 
March 2000, pages 1157 - 1165.

* IETF RFC:
-----------

H. Balakrishnan, V. N. Padmanabhan, G. Fairhurst, M.Sooriyabandara, 
"TCP Performance Implications of Network Path Asymmetry",
IETF RFC 3449, December 2002.

* Other works:
--------------

C. Caini, R. Firrincieli, 
"Packet spreading techniques to avoid bursty traffic in satellite TCP 
connections". 
In. Proceedings of IEEE VTC Spring ’04

Q.Ye, M.H. MacGregor, 
"Pacing to Improve SACK TCP Resilience", 
2005 Spring Simulation Multiconference, DASD, pp. 39-45, 2005

Young-Soo Choi; Kong-Won Lee; Tae-Man Han; You-Ze Cho;
"High-speed TCP protocols with pacing for fairness and TCP friendliness"
TENCON 2004. 2004 IEEE Region 10 Conference
Volume C,  21-24 Nov. 2004 Page(s):13 - 16 Vol. 3 

Razdan, A.; Nandan, A.; Wang, R.; Sanadidi, M.Y.; Gerla, M.;
"Enhancing TCP performance in networks with small buffers"
Computer Communications and Networks, 2002. Proceedings. Eleventh 
International Conference on
14-16 Oct. 2002 Page(s):39 - 44 

Moonsoo Kang; Jeonghoon Mo;
"On the Pacing Technique for High Speed TCP Over Optical Burst Switching 
Networks"
Advanced Communication Technology, 2006. ICACT 2006. The 8th 
International Conference
Volume 2, 20-22 Feb. 2006 Page(s):1421 - 1424 

Mark Allman, Ethan Blanton
"Notes on burst mitigation for transport protocols", 
April 2005 ACM SIGCOMM Computer Communication Review, Volume 35 Issue 2 
Publisher: ACM Press

J. Kulik, R. Coulter, D. Rockwell, and C. Partridge, 
"A Simulation Study of Paced TCP", 
BBN Technical Memorandum No. 1218, 1999. 

* Congestion Control proposals that include Pacing:
---------------------------------------------------

G. Marfia, C. Palazzi, G. Pau, M. Gerla, M. Sanadidi and M. Roccetti, 
"TCP Libra: Balancing Flows over Heterogeneous
Propagation Scenarios", submitted for publication in Proceedings of ACM 
SIGMETRICS/Performance 2006. 

Carlo Caini and Rosario Firrincieli, 
"TCP Hybla: a TCP enhancement for heterogeneous networks", 
INTERNATIONAL JOURNAL OF SATELLITE COMMUNICATIONS AND NETWORKING 2004; 
22:547–566 

D. X. Wei, C. Jin, S. H. Low and S. Hegde.
"FAST TCP: motivation, architecture, algorithms", 
performance IEEE/ACM Transactions on Networking, to appear in 2007 

-- 
Daniele Lacamera
root{at}danielinux.net

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-13  8:18   ` Daniele Lacamera
  2006-09-13 15:46     ` Daniele Lacamera
@ 2006-09-13 18:30     ` Ian McDonald
  1 sibling, 0 replies; 11+ messages in thread
From: Ian McDonald @ 2006-09-13 18:30 UTC (permalink / raw)
  To: root; +Cc: netdev, Carlo Caini, Rosario Firrincieli, Giovanni Pau

On 9/13/06, Daniele Lacamera <root@danielinux.net> wrote:
> On Tuesday 12 September 2006 23:26, Ian McDonald wrote:
> > Where is the published research? If you are going to mention research
> > you need URLs to papers and please put this in source code too so
> > people can check.
>
> I added the main reference to the code. I am going to give you all the
> pointers on this research, mainly recent congestion control proposals
> that include pacing.

Thanks
>
> > I agree with Arnaldo's comments and also would add I don't like having
> > to select 1000 as HZ unit. Something is wrong if you need this as I
> > can run higher resolution timers without having to do this....
>
> I removed that select in Kconfig, I agree it doesn't make sense at all,
> for portability. However, pacing works with 1ms resolution, so maybe
> a "depends HZ_1000" is still required. (How do you run 1ms timers with
> HZ!=1000?)
>
The HZ refers to time slices per second mostly for user space - e.g.
how often to task switch.
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-13  8:18   ` Daniele Lacamera
@ 2006-09-14  1:21     ` Stephen Hemminger
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen Hemminger @ 2006-09-14  1:21 UTC (permalink / raw)
  To: root
  Cc: David S. Miller, netdev, Carlo Caini, Rosario Firrincieli,
	Giovanni Pau

On Wed, 13 Sep 2006 10:18:31 +0200
Daniele Lacamera <root@danielinux.net> wrote:

> On Wednesday 13 September 2006 05:41, Stephen Hemminger wrote:
> > Pacing in itself isn't a bad idea, but:
> <cut>
> > * Since it is most useful over long delay links, maybe it should be a 
> route parameter.
>

Look into rtnetlink and how we keep track of route metrics, and
add a new per route state variable. Need to update iproute2 (ip command)
as well.

> What does this mean? Should I move the sysctl switch elsewhere?
> 
> A new (cleaner) patch follows.
> Thanks to you all for your attention & advices.
> 
> Signed-off by: Daniele Lacamera <root@danielinux.net>

You may also want into look into high resolution timer (hrtimer),
the resolution doesn't get finer than HZ without using -rt patches.
But the ktime interface is cleaner than the normal timer math.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-13 15:46     ` Daniele Lacamera
@ 2006-09-16  0:41       ` Xiaoliang (David) Wei
  2006-09-19 11:31         ` Daniele Lacamera
  0 siblings, 1 reply; 11+ messages in thread
From: Xiaoliang (David) Wei @ 2006-09-16  0:41 UTC (permalink / raw)
  To: root; +Cc: Ian McDonald, netdev, Carlo Caini, Rosario Firrincieli,
	Giovanni Pau

On 9/13/06, Daniele Lacamera <root@danielinux.net> wrote:
> As Ian requested, some of the papers published about Pacing.
>

Hi Daniel,
     Thank you very much for the patch and the reference summary. For
the implementation and performance of pacing, I just have a few
suggestion/clarification/support data:

First, in the implementation in the patch, it seems to me that the
paced gap is set to RTT/cwnd in CA_Open state. This might leads to
slower growth of congestion window. See our simulation results at
http://www.cs.caltech.edu/~weixl/technical/ns2pacing/index.html

If this pacing algorithm is used in a network with non-paced flows, it
is very likely to lose its fair share of bandwidth. So, I'd suggest to
use a pacing gap of RTT/max{cwnd+1, min{ssthresh, cwnd*2}} where
max{cwnd+1, min{ssthresh, cwnd*2}} is the expected congestion window
in *next RTT*. As shown in the our simulation results, this
modification will eliminate the slower growth problem.

> * Main reference:
> -----------------
>
> Amit Aggarwal, Stefan Savage, and Thomas Anderson.
> "Understanding the Performance of TCP Pacing".
> Proc. of the IEEE INFOCOM 2000 Conference on Computer Communications,
> March 2000, pages 1157 - 1165.

This main reference (Infocom2000) does not say pacing is always
improving. In fact, it says pacing might have poorer performance, in
term of average throughput, than non-paced flows in many cases. We
have some detailed study on the issue and our understanding are:
1. For loss based congestion control algorithm: if we care fairness
convergence, pacing helps; If we care aggregate/average throughput,
pacing does not help (and usually leads to lower average rate) unless
the bottleneck buffer size is extremely small.  This is due to the
fact that pacing usually introduces high loss synchronization rate in
the paced flows. We have a technical report at
http://www.cs.caltech.edu/~weixl/pacing/sync.pdf.

2. For delay based congestion control algorithm: pacing does always
help to eliminate the noise due to burstiness.

> Carlo Caini and Rosario Firrincieli,
> "TCP Hybla: a TCP enhancement for heterogeneous networks",
> INTERNATIONAL JOURNAL OF SATELLITE COMMUNICATIONS AND NETWORKING 2004;
> 22:547–566

For TCP Hybla, we do have some simulation results to show that Hybla
introduces huge loss in start-up phase, if pacing is not deployed.
(Look for the figures of "hybla" at
http://www.cs.caltech.edu/~weixl/technical/ns2linux/index.html)

Thanks.

-David

-- 
Xiaoliang (David) Wei      Graduate Student, CS@Caltech
http://davidwei.org
***********************************************

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: TCP Pacing
  2006-09-16  0:41       ` Xiaoliang (David) Wei
@ 2006-09-19 11:31         ` Daniele Lacamera
  0 siblings, 0 replies; 11+ messages in thread
From: Daniele Lacamera @ 2006-09-19 11:31 UTC (permalink / raw)
  To: Xiaoliang (David) Wei
  Cc: Ian McDonald, netdev, Carlo Caini, Rosario Firrincieli,
	Giovanni Pau

On Saturday 16 September 2006 02:41, Xiaoliang (David) Wei wrote:
> Hi Daniel,
>      Thank you very much for the patch and the reference summary. For
> the implementation and performance of pacing, I just have a few
> suggestion/clarification/support data:
> 
> First, in the implementation in the patch, it seems to me that the
> paced gap is set to RTT/cwnd in CA_Open state. This might leads to
> slower growth of congestion window. See our simulation results at
> http://www.cs.caltech.edu/~weixl/technical/ns2pacing/index.html

Hi David.
Thank you for having pointed this out. It's very interesting.
Actually, we already knew about delta calculation based on expected 
congestion window. Carlo and Rosario had studied this matter in deep, 
considered different options (VTC04), and came to the conclusion that 
thought rtt/cwnd solution slows down cwnd growth, the difference is not 
very relevant, so we have preferred to implement the most conservative 
one, which is sligthly simpler and fits all the congestion control 
algorithms.

> If this pacing algorithm is used in a network with non-paced flows, it
> is very likely to lose its fair share of bandwidth. So, I'd suggest to
> use a pacing gap of RTT/max{cwnd+1, min{ssthresh, cwnd*2}} where
> max{cwnd+1, min{ssthresh, cwnd*2}} is the expected congestion window
> in *next RTT*. As shown in the our simulation results, this
> modification will eliminate the slower growth problem.

The expected window value depends on the congestion control algorithm, 
the formula you suggests fits newreno increments, while other congstion 
control options may have different cwnd_expected. 
I don't exclude we may have an additional 'plug' in each congestion 
control module for pacing delta calculation, if this makes sense.

> > * Main reference:
> > -----------------
> This main reference (Infocom2000) does not say pacing is always
> improving. In fact, it says pacing might have poorer performance, in
> term of average throughput, than non-paced flows in many cases.

I have proposed to use this as main reference because it gives a general 
description and it is one of the most cited about the argument.

> For TCP Hybla, we do have some simulation results to show that Hybla
> introduces huge loss in start-up phase, if pacing is not deployed.
> (Look for the figures of "hybla" at
> http://www.cs.caltech.edu/~weixl/technical/ns2linux/index.html)

The initial overshoot in Hybla is a known issue. Cwnd increments are 
calculated on RTT, so the longer the RTT, the bigger the initial 
burstiness. 
The way to counteract overshoot is to use both pacing and an initial 
slow-start threshold estimation, like that one suggested in [1]. 
This is what we have been using for all our tests, in simulation (ns-2),  
emulation (linux+nistnet), and satellites. (See [2] and [3]).
As for pacing, I'd like to have bandwidth estimation feature included in 
future versions of hybla module as soon as we can consider it "stable".

HAND.

-- 
Daniele

[1] J. Hoe, "Improving the Start-up Behavior of a Congestion Control 
Scheme for TCP", ACM Sigcomm, Aug. 1996.

[2] C. Caini, R. Firrincieli and D. Lacamera, "TCP Performance 
Evaluation: Methodologies and Applications", SPECTS 2005, Philadelphia, 
July 2005.

[3] C. Caini, R. Firrincieli and D. Lacamera, "A Linux Based Multi TCP 
Implementation for Experimental Evaluation of TCP Enhancements", SPECTS 
2005, Philadelphia, July 2005.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-09-19 11:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-12 17:58 TCP Pacing Daniele Lacamera
2006-09-12 18:21 ` Arnaldo Carvalho de Melo
2006-09-12 21:26 ` Ian McDonald
2006-09-13  8:18   ` Daniele Lacamera
2006-09-13 15:46     ` Daniele Lacamera
2006-09-16  0:41       ` Xiaoliang (David) Wei
2006-09-19 11:31         ` Daniele Lacamera
2006-09-13 18:30     ` Ian McDonald
2006-09-13  3:41 ` Stephen Hemminger
2006-09-13  8:18   ` Daniele Lacamera
2006-09-14  1:21     ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).