netdev.vger.kernel.org archive mirror
* Re: [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-30 10:48 apetlund
  2009-10-30 17:33 ` Rick Jones
  0 siblings, 1 reply; 22+ messages in thread
From: apetlund @ 2009-10-30 10:48 UTC (permalink / raw)
  To: Rick Jones
  Cc: Andreas Petlund, Ilpo Järvinen, Arnd Hannemann, Eric Dumazet,
	Netdev, LKML, shemminger, David Miller

> Just how thin can a thin stream be when a thin stream is found thin?
> (to the cadence of "How much wood could a woodchuck chuck if a
> woodchuck could chuck wood?")
>
> Does a stream get so thin that a user's send could not be split into
> four, sub-MSS TCP segments?

That is a nifty idea: anti-Nagle the segments so that enough of them are
in flight to trigger fast retransmissions. I think it is possible.

Besides using more resources on each send, this scheme would introduce
the need to delay parts of the data, which is undesirable for
time-dependent applications (the intended target of these mechanisms).

I think it would be fun to implement and play around with such a mechanism
to see the effects.
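Rick's splitting idea could, in a userspace sketch, look like the
following: disable Nagle and break each logical send into four sub-MSS
slices, so that the loss of any single slice can still elicit the three
duplicate ACKs that fast retransmit needs. All names here are invented
for illustration; nothing below is part of the patch set.

```c
/* Hypothetical sketch of the "anti-Nagle" idea: split one application
 * write into four sub-MSS slices so that the loss of a single slice can
 * still elicit the three duplicate ACKs fast retransmit needs. */
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SPLIT_PARTS 4

/* Fill iov[0..SPLIT_PARTS) with near-equal slices of buf[0..len).
 * Returns the number of non-empty slices. */
static int anti_nagle_split(const char *buf, size_t len,
                            struct iovec iov[SPLIT_PARTS])
{
	size_t base = len / SPLIT_PARTS;
	size_t rem = len % SPLIT_PARTS;
	size_t off = 0;
	int n = 0;

	for (int i = 0; i < SPLIT_PARTS; i++) {
		size_t chunk = base + ((size_t)i < rem ? 1 : 0);

		iov[i].iov_base = (void *)(buf + off);
		iov[i].iov_len = chunk;
		off += chunk;
		if (chunk > 0)
			n++;
	}
	return n;
}
```

With TCP_NODELAY set, each slice would then be pushed with its own
send() call so it leaves the stack as a separate segment; as noted
above, the price is extra per-send overhead, and for very small writes
the slices become too tiny for the scheme to pay off.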

Regards,
Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-30 15:27 apetlund
  0 siblings, 0 replies; 22+ messages in thread
From: apetlund @ 2009-10-30 15:27 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Andreas Petlund, Eric Dumazet, Arnd Hannemann, Netdev, LKML,
	shemminger, David Miller

> On Thu, 29 Oct 2009, apetlund@simula.no wrote:
>
>> > Andreas Petlund a écrit :
>> >
>> >> The removal of exponential backoff on a general basis has been
>> >> investigated and discussed already, for instance here:
>> >> http://ccr.sigcomm.org/online/?q=node/416
>> >> Such steps are, however, considered drastic, and I agree that
>> >> caution must be taken to thoroughly investigate the effects of such
>> >> changes. The changes introduced by the proposed patches, however,
>> >> are not default behaviour, but an option for applications that
>> >> suffer from the increased retransmission latencies of thin-stream
>> >> TCP. They will, as such, not affect all streams. In addition, the
>> >> changes will only be active for streams which are perpetually thin
>> >> or in the early phase of expanding their cwnd. Also, experiments
>> >> performed on congested bottlenecks with tail-drop queues show very
>> >> little (if any at all) effect on goodput for the modified scenario
>> >> compared to a scenario with unmodified TCP streams.
>> >> Graphs both for latency results and fairness tests can be found
>> >> here: http://folk.uio.no/apetlund/lktmp/
>> >
>> > There should be a limit to linear timeouts, to say ... no more than
>> > 6 retransmits (eventually tunable), then switch to exponential
>> > backoff. Maybe your patch already implements such a heuristic?
>>
>> The limitation you suggest on the linear timeouts makes very good
>> sense. Our experiments performed on the Internet indicate that it is
>> extremely rare that more than 6 retransmissions are needed to recover.
>> It is not included in the current patch, so I will include this in the
>> next iteration.
>
> I've heard that BSD would use linear for the first three and then
> exponential, but this is based on some gossip (which could well turn
> out to be a myth) rather than checking it out myself. But if it is
> true, it certainly hasn't been that devastating.
>> > True link collapses do happen; it would be good if not all streams
>> > wake up in the same second and make recovery very slow.
>> >
>> Each stream will have its own schedule for wakeup, so such events will
>> still be subject to coincidence. The timer granularity of the TCP
>> wakeup timer will also influence how many streams will wake at the
>> same time. The experiments we have performed on severely congested
>> bottlenecks (link above) indicate that the modifications will not
>> create a large negative effect. In fact, when goodput is drastically
>> reduced due to severe overload, regular TCP and the LT and dupACK
>> modifications seem to perform nearly identically. Other scenarios may
>> exist where different effects can be observed, and I am open to
>> suggestions for further testing.
>
> Could you point out where exactly the goodput results are? ...I only
> seem to find latency results, which is not exactly the same. I don't
> expect something on the order of what Nagle talks about (32kbps ->
> 40bps irc), but a 10-50% goodput reduction over a relatively short
> period of time (until RTTs top RTOs once again, preventing spurious
> RTOs, and thus segment duplication due to retransmissions also ceases).

The plot can be found here:
http://folk.uio.no/apetlund/lktmp/n-vs-n-fairness.pdf
I'm sorry that I didn't explain it at once, as the parameters and setup
are not obvious. The boxplot shows the aggregate throughput of all the
unmodified, greedy TCP New Reno streams when competing with thin streams
using TCP New Reno, linear timeouts, modified dupACK, RDB (which is not
included in this patch set), and the combination of all the
modifications. The streams compete for a 1Mbps bottleneck that uses tc
with a tail-dropping queue to limit bandwidth and netem to create loss
and delay. The RTT for the test is 100ms and the packet interarrival
time for the thin streams is 85ms.

> Were these results obtained with Linux, and if so what was FRTO set to?

The results are from our Linux implementation of the mechanisms. FRTO was
disabled and Nagle was disabled for all test sets.

>> > That's too easy, to accept possibly dangerous features with the
>> > excuse of saying "It won't be used very much", because you cannot
>> > predict the future.
>>
>> I agree that it is no argument to say that it won't be used much;
>> indeed, my hope is that it will be used much. However, our experiments
>> indicate no negative effects while showing a large improvement in
>> retransmission latency for the scenario in question. I therefore think
>> that the option for such an improvement should be made available for
>> time-dependent thin-stream applications.
>
> Everyone can right away tell that most RTOs are not due to extreme
> congestion, so some linear backoff seems sensible when dupACK feedback
> is lacking for some reason. Of course it is a tradeoff, as there's the
> chance of getting only 1/(n+1) goodput (where n is the number of linear
> steps) if the RTOs were spurious (and without FRTO even more
> unnecessary retransmissions will be triggered, so in fact it could in
> theory be slightly worse). But for that to happen in the first place
> requires, of course, this RTT > RTO situation, which is hard to see
> being a persisting state.

Actually, we have found the low number of packets in flight to be a
persisting state in a large number of applications that are interactive
or time-dependent. Some examples can be found in the table linked below:

http://folk.uio.no/apetlund/lktmp/thin_apps_table.pdf

It seems that human interaction, sensor networks, and several other
scenarios that are not inherently greedy will produce a steady trickle
of data segments that falls into the "thin-stream" category and stays
there.

Regards,
Andreas

* Re: [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-29 16:54 apetlund
  0 siblings, 0 replies; 22+ messages in thread
From: apetlund @ 2009-10-29 16:54 UTC (permalink / raw)
  To: Arnd Hannemann
  Cc: Andreas Petlund, Eric Dumazet, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, shemminger@vyatta.com,
	ilpo.jarvinen@helsinki.fi, davem@davemloft.net

> Andreas Petlund schrieb:
>> We have found no noticeable degradation of the goodput in a series of
>> experiments we have performed in order to map the effects of the
>> modifications. Furthermore, the modifications implemented in the
>> patches are explicitly enabled only for applications where the
>> developer knows that streams will be thin, thus only a small subset of
>> the streams will apply the modifications.
>> Graphs presenting results from experiments performed to analyse
>> latency and fairness issues can be found here:
>> http://folk.uio.no/apetlund/lktmp/
>
> How often did you hit consecutive RTOs in these measurements?
> As I see, you did a measurement with 512 thick vs. 512 thin streams.
> Let's do a hypothetical calculation with only 512 "thin" streams. Let's
> further assume the RTT is low, so that the RTO is around 200ms. Assume
> each segment has 128 bytes (already very small...).
> Assume that after a period of normal operation all streams are in
> timeout-based loss recovery (e.g. because the destination endpoint
> suddenly behaves like a black hole).
> As all streams are in timeout-based loss recovery, each stream will
> transmit 5 segments each second with your modification.
> This would result in a throughput of around 512*5*1024 bit = 2560
> kbit/s and a goodput of 0 kbit/s (because the receiver is a black
> hole). So you can easily saturate a 2 MBit/s link with retransmissions
> alone.
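Arnd's back-of-the-envelope figure is easy to check mechanically; a
small helper (names invented for illustration) gives 2621 kbit/s with
decimal kilo, matching his 2560 figure when dividing by 1024 instead:

```c
/* Worst-case retransmission load for a set of thin streams all stuck in
 * timeout-based recovery with a constant (linear) RTO: each stream
 * retransmits one segment per RTO interval. */
#include <assert.h>

/* streams * (retransmits per second) * (segment size in bits) / 1000 */
static long retrans_load_kbit(long streams, long rto_ms, long seg_bytes)
{
	long per_sec = 1000 / rto_ms;	/* 200 ms RTO -> 5 per second */

	return streams * per_sec * seg_bytes * 8 / 1000;
}
```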

I have not yet performed experiments where the receiver becomes a black
hole, but I recognise the problem. Eric Dumazet suggested that the
mechanism switch to exponential backoff after 6 linear retries. This
would avoid a situation where the link stays congested indefinitely, and
I will implement this in the next iteration.
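The scheme converged on here, linear timeouts for the first six retries
and exponential backoff beyond that, can be sketched as a userspace
model. The threshold of 6 and the 120-second RTO ceiling follow the
discussion; the function name and structure are invented:

```c
/* Illustrative model: keep the RTO constant for the first
 * THIN_LINEAR_RETRIES retransmissions, then fall back to normal
 * exponential backoff, capped at TCP_RTO_MAX_MS. */
#include <assert.h>

#define TCP_RTO_MAX_MS		120000u
#define THIN_LINEAR_RETRIES	6u

/* Next RTO (ms) given the base RTO and the number of retransmissions
 * already sent. */
static unsigned int next_rto(unsigned int base_rto_ms,
			     unsigned int retransmits)
{
	unsigned int rto = base_rto_ms;

	if (retransmits > THIN_LINEAR_RETRIES) {
		/* exponential backoff for retries beyond the linear cap */
		unsigned int shift = retransmits - THIN_LINEAR_RETRIES;

		while (shift-- && rto < TCP_RTO_MAX_MS)
			rto <<= 1;
	}
	if (rto > TCP_RTO_MAX_MS)
		rto = TCP_RTO_MAX_MS;
	return rto;
}
```

With a 200 ms base RTO this keeps every retry at 200 ms through the
sixth retransmission and doubles from the seventh on, so a persistent
black hole eventually backs off just as unmodified TCP does.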

> Unfortunately, in Germany an ADSL uplink of 786 kbit/s is still quite
> common, and it is already called "broadband"...

I believe that a subscriber for such an uplink would not keep several
hundred thin-stream connections, though accidents do happen.

> Regarding the "small subset": why have a global sysctl option, then?
> And I think "tcp_stream_is_thin(tp)" will be true for every flow in the
> RTO case, at least for consecutive RTOs.

The sysctl is meant for cases of proprietary code that would benefit
from the modifications. In our experiments, we have found it useful in
many cases for such applications (like game clients).
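For reference, the tcp_stream_is_thin() helper discussed above is
introduced in patch 1/3 and not shown in this thread; in the form later
merged into mainline Linux it treats a stream as thin when fewer than
four packets are in flight and the stream has left its initial
slow-start phase. A self-contained userspace model (the struct is a
stand-in, not struct tcp_sock):

```c
/* Userspace model of the thin-stream test: fewer than four packets in
 * flight, and not in the initial slow-start phase (where a low
 * packets_out is expected anyway). */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_sock {
	uint32_t packets_out;		/* packets currently in flight */
	bool	 initial_slowstart;	/* still in first slow start   */
};

static bool stream_is_thin(const struct model_sock *tp)
{
	return tp->packets_out < 4 && !tp->initial_slowstart;
}
```

The threshold of four is not arbitrary: with fewer than four segments in
flight, a single loss can never generate the three duplicate ACKs that
fast retransmit requires, which is why such streams depend on the
retransmission timer in the first place.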

Regards,
Andreas

* Re: [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-29 15:43 apetlund
  2009-10-29 15:50 ` Eric Dumazet
  2009-10-29 20:52 ` Ilpo Järvinen
  0 siblings, 2 replies; 22+ messages in thread
From: apetlund @ 2009-10-29 15:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andreas Petlund, Ilpo Järvinen, Arnd Hannemann, Netdev, LKML,
	shemminger, David Miller

> Andreas Petlund a écrit :
>
>> The removal of exponential backoff on a general basis has been
>> investigated and discussed already, for instance here:
>> http://ccr.sigcomm.org/online/?q=node/416
>> Such steps are, however, considered drastic, and I agree that caution
>> must be taken to thoroughly investigate the effects of such changes.
>> The changes introduced by the proposed patches, however, are not
>> default behaviour, but an option for applications that suffer from the
>> increased retransmission latencies of thin-stream TCP. They will, as
>> such, not affect all streams. In addition, the changes will only be
>> active for streams which are perpetually thin or in the early phase of
>> expanding their cwnd. Also, experiments performed on congested
>> bottlenecks with tail-drop queues show very little (if any at all)
>> effect on goodput for the modified scenario compared to a scenario
>> with unmodified TCP streams.
>> Graphs both for latency results and fairness tests can be found here:
>> http://folk.uio.no/apetlund/lktmp/
>
> There should be a limit to linear timeouts, to say ... no more than 6
> retransmits (eventually tunable), then switch to exponential backoff.
> Maybe your patch already implements such a heuristic?
>

The limitation you suggest on the linear timeouts makes very good sense.
Our experiments performed on the Internet indicate that it is extremely
rare that more than 6 retransmissions are needed to recover. It is not
included in the current patch, so I will include this in the next
iteration.

> True link collapses do happen; it would be good if not all streams
> wake up in the same second and make recovery very slow.
>

Each stream will have its own schedule for wakeup, so such events will
still be subject to coincidence. The timer granularity of the TCP wakeup
timer will also influence how many streams will wake at the same time. The
experiments we have performed on severely congested bottlenecks (link
above) indicate that the modifications will not create a large negative
effect. In fact, when goodput is drastically reduced due to severe
overload, regular TCP and the LT and dupACK modifications seem to perform
nearly identically. Other scenarios may exist where different effects can
be observed, and I am open to suggestions for further testing.

> That's too easy, to accept possibly dangerous features with the excuse
> of saying "It won't be used very much", because you cannot predict the
> future.

I agree that it is no argument to say that it won't be used much;
indeed, my hope is that it will be used much. However, our experiments
indicate no negative effects while showing a large improvement in
retransmission latency for the scenario in question. I therefore think
that the option for such an improvement should be made available for
time-dependent thin-stream applications.

-AP

* Re: [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-29 15:19 apetlund
  0 siblings, 0 replies; 22+ messages in thread
From: apetlund @ 2009-10-29 15:19 UTC (permalink / raw)
  To: Arnd Hannemann
  Cc: Eric Dumazet, Andreas Petlund, netdev, linux-kernel, shemminger,
	ilpo.jarvinen, davem

I apologise that some of you received this mail more than once. My email
client played an HTML trick on me.

> Eric Dumazet schrieb:
>> Andreas Petlund a écrit :
>>> This patch will make TCP use only linear timeouts if the stream is
>>> thin. This will help to avoid the very high latencies that thin
>>> streams suffer because of exponential backoff. This mechanism is only
>>> active if enabled by iocontrol or syscontrol and the stream is
>>> identified as thin.
>>
>> Won't this reduce the session timeout to something very small, ie 15
>> retransmits, way under the minute?
>
> The session timeout no longer depends on the actual number of
> retransmits. Instead it is a time interval, which is roughly equivalent
> to the time a TCP performing exponential backoff would need to perform
> 15 retransmits.
>
>
> However, addressing the proposal:
> I wonder how one can seriously suggest to just skip the congestion
> response during timeout-based loss recovery? I believe that in a
> heavily congested scenario, this would lead to a goodput disaster...
> Not to mention that in a heavily congested scenario, suddenly every
> flow will become "thin", so this will even amplify the problems. Or did
> I miss something?

We have found no noticeable degradation of the goodput in a series of
experiments we have performed in order to map the effects of the
modifications. Furthermore, the modifications implemented in the patches
are explicitly enabled only for applications where the developer knows
that streams will be thin, thus only a small subset of the streams will
apply the modifications.

Graphs presenting results from experiments performed to analyse latency
and fairness issues can be found here:
http://folk.uio.no/apetlund/lktmp/

-AP
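The time-based limit Arnd describes above corresponds to the kernel's
retransmits_timed_out() logic (visible in the patch's tcp_timer.c hunk),
which converts a retry-count boundary into the elapsed time an
exponentially backing-off TCP would have needed to reach that retry. A
simplified userspace model, assuming RTO_MIN = 200 ms and RTO_MAX =
120 s as in Linux:

```c
/* Simplified model of retransmits_timed_out(): total time (ms) an
 * exponentially backing-off TCP (doubling from RTO_MIN, capped at
 * RTO_MAX) would spend before retry number 'boundary'. */
#include <assert.h>

#define RTO_MIN_MS	200u
#define RTO_MAX_MS	120000u

static unsigned int ilog2_u(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

static unsigned int backoff_deadline_ms(unsigned int boundary)
{
	/* number of doublings before the RTO hits the ceiling: 9 */
	unsigned int thresh = ilog2_u(RTO_MAX_MS / RTO_MIN_MS);

	if (boundary <= thresh)
		return ((2u << boundary) - 1) * RTO_MIN_MS;
	return ((2u << thresh) - 1) * RTO_MIN_MS +
	       (boundary - thresh) * RTO_MAX_MS;
}
```

With boundary = 15 this gives roughly 925 seconds, so a linearly
backing-off connection is torn down after the same wall-clock time as
a normal one, just after more retransmission attempts.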

* Re: [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-29 15:14 apetlund
  0 siblings, 0 replies; 22+ messages in thread
From: apetlund @ 2009-10-29 15:14 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Andreas Petlund, Netdev, LKML, shemminger, David Miller

I apologise that some of you received this mail more than once. My email
client played an HTML trick on me.

>> +		icsk->icsk_backoff = 0;
>> +		icsk->icsk_rto = min(((tp->srtt >> 3) + tp->rttvar), TCP_RTO_MAX);
>
> The first part is nowadays done with __tcp_set_rto(tp).
>
> --
>  i.
>

I will address this in the next iteration of the patch.

-AP

* [PATCH 2/3] net: TCP thin linear timeouts
@ 2009-10-27 16:31 Andreas Petlund
  2009-10-27 16:56 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Andreas Petlund @ 2009-10-27 16:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, shemminger, ilpo.jarvinen, davem

This patch will make TCP use only linear timeouts if the stream is thin.
This will help to avoid the very high latencies that thin streams suffer
because of exponential backoff. This mechanism is only active if enabled
by iocontrol or syscontrol and the stream is identified as thin.


Signed-off-by: Andreas Petlund <apetlund@simula.no>
---
 include/linux/tcp.h        |    3 +++
 include/net/tcp.h          |    1 +
 net/ipv4/sysctl_net_ipv4.c |    8 ++++++++
 net/ipv4/tcp.c             |    5 +++++
 net/ipv4/tcp_timer.c       |   17 ++++++++++++++++-
 5 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 61723a7..e64368d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -96,6 +96,7 @@ enum {
 #define TCP_QUICKACK		12	/* Block/reenable quick acks */
 #define TCP_CONGESTION		13	/* Congestion control algorithm */
 #define TCP_MD5SIG		14	/* TCP MD5 Signature (RFC2385) */
+#define TCP_THIN_RM_EXPB        15      /* Remove exp. backoff for thin streams*/
 
 #define TCPI_OPT_TIMESTAMPS	1
 #define TCPI_OPT_SACK		2
@@ -299,6 +300,8 @@ struct tcp_sock {
 	u16	advmss;		/* Advertised MSS			*/
 	u8	frto_counter;	/* Number of new acks after RTO */
 	u8	nonagle;	/* Disable Nagle algorithm?             */
+	u8      thin_rm_expb:1, /* Remove exp. backoff for thin streams */
+		thin_undef : 7;
 
 /* RTT measurement */
 	u32	srtt;		/* smoothed round trip time << 3	*/
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7c4482f..412c1bd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -237,6 +237,7 @@ extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
 extern int sysctl_tcp_max_ssthresh;
+extern int sysctl_tcp_force_thin_rm_expb;
 
 extern atomic_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 2dcf04d..7458f37 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -713,6 +713,14 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "tcp_force_thin_rm_expb",
+		.data           = &sysctl_tcp_force_thin_rm_expb,
+		.maxlen         = sizeof(int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec
+	},
+	{
 		.ctl_name	= CTL_UNNUMBERED,
 		.procname	= "udp_mem",
 		.data		= &sysctl_udp_mem,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 90b2e06..b4b0931 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2134,6 +2134,11 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		}
 		break;
 
+	case TCP_THIN_RM_EXPB:
+		if (val)
+			tp->thin_rm_expb = 1;
+		break;
+
 	case TCP_CORK:
 		/* When set indicates to always queue non-full frames.
 		 * Later the user clears this option and we transmit
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index cdb2ca7..24d6dc3 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -29,6 +29,7 @@ int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL;
 int sysctl_tcp_retries1 __read_mostly = TCP_RETR1;
 int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
 int sysctl_tcp_orphan_retries __read_mostly;
+int sysctl_tcp_force_thin_rm_expb __read_mostly;
 
 static void tcp_write_timer(unsigned long);
 static void tcp_delack_timer(unsigned long);
@@ -386,7 +387,21 @@ void tcp_retransmit_timer(struct sock *sk)
 	icsk->icsk_retransmits++;
 
 out_reset_timer:
-	icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+	if ((tp->thin_rm_expb || sysctl_tcp_force_thin_rm_expb) &&
+	    tcp_stream_is_thin(tp) && sk->sk_state == TCP_ESTABLISHED) {
+		/* If stream is thin, remove exponential backoff.
+		 * Since 'icsk_backoff' is used to reset timer, set to 0
+		 * Recalculate 'icsk_rto' as this might be increased if
+		 * stream oscillates between thin and thick, thus the old
+		 * value might already be too high compared to the value
+		 * set by 'tcp_set_rto' in tcp_input.c which resets the
+		 * rto without backoff. */
+		icsk->icsk_backoff = 0;
+		icsk->icsk_rto = min(((tp->srtt >> 3) + tp->rttvar), TCP_RTO_MAX);
+	} else {
+		/* Use normal backoff */
+		icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+	}
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
 	if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
 		__sk_dst_reset(sk);
-- 
1.6.0.4
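For completeness, enabling the per-socket option from an application
would look roughly as follows. TCP_THIN_RM_EXPB (value 15) is defined by
this patch (the variant eventually merged into mainline is named
TCP_THIN_LINEAR_TIMEOUTS), so on a kernel without the patch the
setsockopt() call simply fails and the application can fall back to
default behaviour:

```c
/* Sketch of enabling the patch's per-socket option from userspace.
 * On kernels lacking the patch, setsockopt() returns -1 (typically
 * ENOPROTOOPT) and normal exponential backoff stays in effect. */
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_THIN_RM_EXPB
#define TCP_THIN_RM_EXPB 15	/* value defined by this patch */
#endif

/* Returns 0 on success, -1 with errno set if unsupported. */
static int enable_thin_linear_timeouts(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_THIN_RM_EXPB,
			  &one, sizeof(one));
}
```

A game client, for example, would call this right after socket()
creation, before connect(), and ignore a failure on older kernels.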


end of thread, other threads:[~2009-11-05 13:37 UTC | newest]

Thread overview: 22+ messages
2009-10-30 10:48 [PATCH 2/3] net: TCP thin linear timeouts apetlund
2009-10-30 17:33 ` Rick Jones
2009-10-30 18:11   ` William Allen Simpson
2009-11-05 13:37     ` Andreas Petlund
  -- strict thread matches above, loose matches on Subject: below --
2009-10-30 15:27 apetlund
2009-10-29 16:54 apetlund
2009-10-29 15:43 apetlund
2009-10-29 15:50 ` Eric Dumazet
2009-10-29 20:52 ` Ilpo Järvinen
2009-10-29 15:19 apetlund
2009-10-29 15:14 apetlund
2009-10-27 16:31 Andreas Petlund
2009-10-27 16:56 ` Eric Dumazet
2009-10-28 12:58   ` Arnd Hannemann
2009-10-28 14:31     ` Ilpo Järvinen
2009-10-29 13:51       ` Andreas Petlund
2009-10-29 14:24         ` Eric Dumazet
2009-10-29 17:01         ` Rick Jones
     [not found]     ` <07CD1135-C68B-4264-8CD3-C4BC0400FDA2@simula.no>
2009-10-29 16:11       ` Arnd Hannemann
2009-10-28  3:20 ` William Allen Simpson
2009-10-29 13:50   ` Andreas Petlund
2009-10-28 14:18 ` Ilpo Järvinen
