public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tcp: add setsockopt to disable slow start after idle
@ 2010-04-10  1:30 Cristian KLEIN
  2010-04-10  5:13 ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Cristian KLEIN @ 2010-04-10  1:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: Cristian KLEIN

Allows user-space to override the sysctl
net.ipv4.tcp_slow_start_after_idle, on a per-socket bases, using
setsockopt().

Slow start after idle can harm some scientific applications which
interleave computation and communication. Assume we have an iterative
applications, each iteration consisting of a computation and a
communication phase. If the computation phase takes long enough (i.e.
more that 2*RTT), the communication phase will always slow start and
might never reach the wire speed.

This patch allows each application to disable slow start after idle,
just like we allow delay-sensitive applications (e.g. telnet, SSH) to
disable NAGLE.

Signed-off-by: Cristian KLEIN <cristiklein@gmail.com>
---
 include/linux/tcp.h      |    4 +++-
 net/ipv4/tcp.c           |   10 ++++++++++
 net/ipv4/tcp_ipv4.c      |    1 +
 net/ipv4/tcp_minisocks.c |    1 +
 net/ipv4/tcp_output.c    |    4 ++--
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..132aab0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -105,6 +105,7 @@ enum {
 #define TCP_COOKIE_TRANSACTIONS	15	/* TCP Cookie Transactions */
 #define TCP_THIN_LINEAR_TIMEOUTS 16      /* Use linear timeouts for thin streams*/
 #define TCP_THIN_DUPACK         17      /* Fast retrans. after 1 dupack */
+#define TCP_SLOW_START_AFTER_IDLE 18    /* Slow start after transmission idle */
 
 /* for TCP_INFO socket option */
 #define TCPI_OPT_TIMESTAMPS	1
@@ -345,7 +346,8 @@ struct tcp_sock {
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
 		thin_dupack : 1,/* Fast retransmit on first dupack      */
-		unused      : 2;
+		slow_start_after_idle : 1,/* Slow start after transmission idle */
+		unused      : 1;
 
 /* RTT measurement */
 	u32	srtt;		/* smoothed round trip time << 3	*/
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6afb6d8..3cf3863 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2252,6 +2252,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		}
 		break;
 
+	case TCP_SLOW_START_AFTER_IDLE:
+		if (val)
+			tp->slow_start_after_idle = 1;
+		else
+			tp->slow_start_after_idle = 0;
+		break;
+
 	case TCP_THIN_LINEAR_TIMEOUTS:
 		if (val < 0 || val > 1)
 			err = -EINVAL;
@@ -2497,6 +2504,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_NODELAY:
 		val = !!(tp->nonagle&TCP_NAGLE_OFF);
 		break;
+	case TCP_SLOW_START_AFTER_IDLE:
+		val = tp->slow_start_after_idle;
+		break;
 	case TCP_CORK:
 		val = !!(tp->nonagle&TCP_NAGLE_CORK);
 		break;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f4df5f9..1380902 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1449,6 +1449,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		newtp->advmss = tcp_sk(sk)->rx_opt.user_mss;
 
 	tcp_initialize_rcv_mss(newsk);
+	newtp->slow_start_after_idle = sysctl_tcp_slow_start_after_idle;
 
 #ifdef CONFIG_TCP_MD5SIG
 	/* Copy over the MD5 key from the original socket */
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 4199bc6..1e6f0bb 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -495,6 +495,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 			newtp->rx_opt.ts_recent_stamp = 0;
 			newtp->tcp_header_len = sizeof(struct tcphdr);
 		}
+		newtp->slow_start_after_idle = sysctl_tcp_slow_start_after_idle;
 #ifdef CONFIG_TCP_MD5SIG
 		newtp->md5sig_info = NULL;	/*XXX*/
 		if (newtp->af_specific->md5_lookup(sk, newsk))
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f181b78..175d499 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -154,7 +154,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const u32 now = tcp_time_stamp;
 
-	if (sysctl_tcp_slow_start_after_idle &&
+	if (tp->slow_start_after_idle &&
 	    (!tp->packets_out && (s32)(now - tp->lsndtime) > icsk->icsk_rto))
 		tcp_cwnd_restart(sk, __sk_dst_get(sk));
 
@@ -1279,7 +1279,7 @@ static void tcp_cwnd_validate(struct sock *sk)
 		if (tp->packets_out > tp->snd_cwnd_used)
 			tp->snd_cwnd_used = tp->packets_out;
 
-		if (sysctl_tcp_slow_start_after_idle &&
+		if (tp->slow_start_after_idle &&
 		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto)
 			tcp_cwnd_application_limited(sk);
 	}
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-10  1:30 [PATCH] tcp: add setsockopt to disable slow start after idle Cristian KLEIN
@ 2010-04-10  5:13 ` David Miller
  2010-04-10  5:15   ` David Miller
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: David Miller @ 2010-04-10  5:13 UTC (permalink / raw)
  To: cristiklein; +Cc: linux-kernel

From: Cristian KLEIN <cristiklein@gmail.com>
Date: Sat, 10 Apr 2010 03:30:15 +0200

> Allows user-space to override the sysctl
> net.ipv4.tcp_slow_start_after_idle, on a per-socket bases, using
> setsockopt().
> 
> Slow start after idle can harm some scientific applications which
> interleave computation and communication. Assume we have an iterative
> applications, each iteration consisting of a computation and a
> communication phase. If the computation phase takes long enough (i.e.
> more that 2*RTT), the communication phase will always slow start and
> might never reach the wire speed.
> 
> This patch allows each application to disable slow start after idle,
> just like we allow delay-sensitive applications (e.g. telnet, SSH) to
> disable NAGLE.
> 
> Signed-off-by: Cristian KLEIN <cristiklein@gmail.com>

We specifically did not add a socket option for this facility.

It is a very dangerous option to enable, and depends deeply
upon the characteristics of your network and the paths by
which remote hosts are reached.

Therefore, only the system administrator can determine whether it is
safe to enable this, and that's why it can only be changed via sysctl.
Lettting arbitrary applications change this aspect of TCP is beyond
dangerous.

I will not be applying this patch.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-10  5:13 ` David Miller
@ 2010-04-10  5:15   ` David Miller
  2010-04-10 12:09   ` Cristian KLEIN
  2010-04-11 20:48   ` Andi Kleen
  2 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-04-10  5:15 UTC (permalink / raw)
  To: cristiklein; +Cc: linux-kernel


And BTW you should CC: netdev@vger.kernel.org for all networking
patches.  Sending it only to linux-kernel makes it not reach
most of the networking developers.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-10  5:13 ` David Miller
  2010-04-10  5:15   ` David Miller
@ 2010-04-10 12:09   ` Cristian KLEIN
  2010-04-10 22:47     ` David Miller
  2010-04-11 20:48   ` Andi Kleen
  2 siblings, 1 reply; 9+ messages in thread
From: Cristian KLEIN @ 2010-04-10 12:09 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, netdev

On 10/04/2010 07:13, David Miller wrote:
> From: Cristian KLEIN<cristiklein@gmail.com>
> Date: Sat, 10 Apr 2010 03:30:15 +0200
>
>> Allows user-space to override the sysctl
>> net.ipv4.tcp_slow_start_after_idle, on a per-socket bases, using
>> setsockopt().
>>
>> Slow start after idle can harm some scientific applications which
>> interleave computation and communication. Assume we have an iterative
>> applications, each iteration consisting of a computation and a
>> communication phase. If the computation phase takes long enough (i.e.
>> more that 2*RTT), the communication phase will always slow start and
>> might never reach the wire speed.
>>
>> This patch allows each application to disable slow start after idle,
>> just like we allow delay-sensitive applications (e.g. telnet, SSH) to
>> disable NAGLE.
>>
>> Signed-off-by: Cristian KLEIN<cristiklein@gmail.com>
>
> We specifically did not add a socket option for this facility.
>
> It is a very dangerous option to enable, and depends deeply
> upon the characteristics of your network and the paths by
> which remote hosts are reached.
>
> Therefore, only the system administrator can determine whether it is
> safe to enable this, and that's why it can only be changed via sysctl.
> Lettting arbitrary applications change this aspect of TCP is beyond
> dangerous.
>
> I will not be applying this patch.

Could you please explain me why it is dangerous? To me it seems that 
it's just like allowing applications to disable NAGLE or to choose a 
congestion control algorithm.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-10 12:09   ` Cristian KLEIN
@ 2010-04-10 22:47     ` David Miller
  2010-04-11 18:45       ` Cristian KLEIN
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-04-10 22:47 UTC (permalink / raw)
  To: cristiklein; +Cc: linux-kernel, netdev

From: Cristian KLEIN <cristiklein@gmail.com>
Date: Sat, 10 Apr 2010 14:09:03 +0200

> Could you please explain me why it is dangerous? To me it seems that
> it's just like allowing applications to disable NAGLE or to choose a
> congestion control algorithm.

Because you can cause undue congestion to other people on the network
because you are believing path information that has been outdated and
has not been validated by sending data for a certain amount of time.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-10 22:47     ` David Miller
@ 2010-04-11 18:45       ` Cristian KLEIN
  2010-04-11 20:18         ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Cristian KLEIN @ 2010-04-11 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, netdev

On 11/04/2010 00:47, David Miller wrote:
> From: Cristian KLEIN<cristiklein@gmail.com>
> Date: Sat, 10 Apr 2010 14:09:03 +0200
>
>> Could you please explain me why it is dangerous? To me it seems that
>> it's just like allowing applications to disable NAGLE or to choose a
>> congestion control algorithm.
>
> Because you can cause undue congestion to other people on the network
> because you are believing path information that has been outdated and
> has not been validated by sending data for a certain amount of time.

I consider your argument an important concern, but I'm not quite 
convinced this patch is so bad.

An application which does not need this behaviour will continue to slow 
start after idle by default.

Without this patch, an application which needs this behaviour (i.e. not 
to slow start after idle) is forced to implement its own UDP-based 
protocol with all the congestion control, retransmission etc. Undue 
congestion might still occur.


If you don't agree with the above two points, would you consider 
accepting a patch with an allow_user_fast_start_after_idle sysctl?

Cristi.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-11 18:45       ` Cristian KLEIN
@ 2010-04-11 20:18         ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-04-11 20:18 UTC (permalink / raw)
  To: cristiklein; +Cc: linux-kernel, netdev

From: Cristian KLEIN <cristiklein@gmail.com>
Date: Sun, 11 Apr 2010 20:45:10 +0200

> Without this patch, an application which needs this behaviour
> (i.e. not to slow start after idle) is forced to implement its own
> UDP-based protocol with all the congestion control, retransmission
> etc. Undue congestion might still occur.

Ask your system administrator to set the existing sysctl, because it
is a physical network attribute whether this behavior is safe or not.

And if it is safe, it is safe for all applications, there is no reason
for one application to ask for it and others to not.  If it's legal,
it helps all applications without exception.

Your attempts to tie this to NAGLE is complete nonsense.

NAGLE changes acking behavior, whereas this feature controls in what
way we trust congestion control information we've probed for in the
past.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-10  5:13 ` David Miller
  2010-04-10  5:15   ` David Miller
  2010-04-10 12:09   ` Cristian KLEIN
@ 2010-04-11 20:48   ` Andi Kleen
  2010-04-11 21:41     ` David Miller
  2 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2010-04-11 20:48 UTC (permalink / raw)
  To: David Miller; +Cc: cristiklein, linux-kernel

David Miller <davem@davemloft.net> writes:
>
> It is a very dangerous option to enable, and depends deeply
> upon the characteristics of your network and the paths by
> which remote hosts are reached.
>
> Therefore, only the system administrator can determine whether it is
> safe to enable this, and that's why it can only be changed via sysctl.
> Lettting arbitrary applications change this aspect of TCP is beyond
> dangerous.
>
> I will not be applying this patch.

It should be safe as long as you don't have any packet loss (which
means the network can take it). Or I am missing something?

So perhaps allow it, but force disable it on the first retransmit?

There could be still some over subscription in the first window,
but hopefully not too bad.

Ok it could be still gamed by opening lots of sockets, but that
problem is there anyways.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tcp: add setsockopt to disable slow start after idle
  2010-04-11 20:48   ` Andi Kleen
@ 2010-04-11 21:41     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-04-11 21:41 UTC (permalink / raw)
  To: andi; +Cc: cristiklein, linux-kernel

From: Andi Kleen <andi@firstfloor.org>
Date: Sun, 11 Apr 2010 22:48:19 +0200

> It should be safe as long as you don't have any packet loss (which
> means the network can take it). Or I am missing something?
> 
> So perhaps allow it, but force disable it on the first retransmit?

Unless you can detect packet loss or queueing delay incurred by
everyone else on the internet, this won't work.

Look, just drop this discussion, people simply don't understand the
implications of what they are suggesting.

The only thing I am willing to consider is a per-route attribute for
this setting, because that would be another legitimate to make this
setting.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-04-11 21:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-10  1:30 [PATCH] tcp: add setsockopt to disable slow start after idle Cristian KLEIN
2010-04-10  5:13 ` David Miller
2010-04-10  5:15   ` David Miller
2010-04-10 12:09   ` Cristian KLEIN
2010-04-10 22:47     ` David Miller
2010-04-11 18:45       ` Cristian KLEIN
2010-04-11 20:18         ` David Miller
2010-04-11 20:48   ` Andi Kleen
2010-04-11 21:41     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox