* [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 20:52 UTC
To: netdev
Enable configuration of the minimum TCP Retransmission Timeout via
a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
have quite variable RTTs avoid spurious RTOs.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Lamont Jones <lamont@hp.com>
---
diff -r 1559df81a153 Documentation/networking/ip-sysctl.txt
--- a/Documentation/networking/ip-sysctl.txt Mon Aug 13 05:00:33 2007 +0000
+++ b/Documentation/networking/ip-sysctl.txt Wed Aug 22 10:42:55 2007 -0700
@@ -339,6 +339,13 @@ tcp_rmem - vector of 3 INTEGERs: min, de
selected receiver buffers for TCP socket. This value does not override
net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
Default: 87380*2 bytes.
+
+tcp_rto_min - INTEGER
+ The minimum value for the TCP Retransmission Timeout, expressed
+ in milliseconds for the convenience of the user.
+ This is bounded at the low-end by TCP_RTO_MIN and by TCP_RTO_MAX at
+ the high-end.
+ Default: 200.
tcp_sack - BOOLEAN
Enable select acknowledgments (SACKS).
diff -r 1559df81a153 include/linux/sysctl.h
--- a/include/linux/sysctl.h Mon Aug 13 05:00:33 2007 +0000
+++ b/include/linux/sysctl.h Wed Aug 22 10:42:55 2007 -0700
@@ -441,6 +441,7 @@ enum
NET_TCP_ALLOWED_CONG_CONTROL=123,
NET_TCP_MAX_SSTHRESH=124,
NET_TCP_FRTO_RESPONSE=125,
+ NET_TCP_RTO_MIN=126,
};
enum {
diff -r 1559df81a153 include/net/tcp.h
--- a/include/net/tcp.h Mon Aug 13 05:00:33 2007 +0000
+++ b/include/net/tcp.h Wed Aug 22 10:42:55 2007 -0700
@@ -232,6 +232,7 @@ extern int sysctl_tcp_workaround_signed_
extern int sysctl_tcp_workaround_signed_windows;
extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
+extern unsigned int sysctl_tcp_rto_min;
extern atomic_t tcp_memory_allocated;
extern atomic_t tcp_sockets_allocated;
diff -r 1559df81a153 net/ipv4/sysctl_net_ipv4.c
--- a/net/ipv4/sysctl_net_ipv4.c Mon Aug 13 05:00:33 2007 +0000
+++ b/net/ipv4/sysctl_net_ipv4.c Wed Aug 22 10:42:55 2007 -0700
@@ -186,6 +186,36 @@ static int strategy_allowed_congestion_c
}
+/* if there is ever a proc_dointvec_ms_jiffies_minmax we can get rid
+ of this routine */
+
+static int proc_tcp_rto_min(ctl_table *ctl, int write, struct file *filp,
+			    void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int *valp = ctl->data;
+	int oldval = *valp;
+	int ret;
+
+	ret = proc_dointvec_ms_jiffies(ctl, write, filp, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	/* some bounds checking would be in order */
+	if (write && *valp != oldval) {
+		if (*valp < (int)TCP_RTO_MIN) {
+			*valp = oldval;
+			return -EINVAL;
+		}
+		if (*valp > (int)TCP_RTO_MAX) {
+			*valp = oldval;
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+
ctl_table ipv4_table[] = {
{
.ctl_name = NET_IPV4_TCP_TIMESTAMPS,
@@ -819,6 +849,14 @@ ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+	{
+		.ctl_name = NET_TCP_RTO_MIN,
+		.procname = "tcp_rto_min",
+		.data = &sysctl_tcp_rto_min,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_tcp_rto_min
+	},
{ .ctl_name = 0 }
};
diff -r 1559df81a153 net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c Mon Aug 13 05:00:33 2007 +0000
+++ b/net/ipv4/tcp_input.c Wed Aug 22 10:42:55 2007 -0700
@@ -91,6 +91,8 @@ int sysctl_tcp_nometrics_save __read_mos
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_abc __read_mostly;
+
+unsigned int sysctl_tcp_rto_min __read_mostly = TCP_RTO_MIN;
#define FLAG_DATA 0x01 /* Incoming frame contained data. */
#define FLAG_WIN_UPDATE 0x02 /* Incoming ACK was a window update. */
@@ -616,13 +618,13 @@ static void tcp_rtt_estimator(struct soc
if (tp->mdev_max < tp->rttvar)
tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
tp->rtt_seq = tp->snd_nxt;
- tp->mdev_max = TCP_RTO_MIN;
+ tp->mdev_max = sysctl_tcp_rto_min;
}
} else {
/* no previous measure. */
tp->srtt = m<<3; /* take the measured time to be rtt */
tp->mdev = m<<1; /* make sure rto = 3*rtt */
- tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+ tp->mdev_max = tp->rttvar = max(tp->mdev, sysctl_tcp_rto_min);
tp->rtt_seq = tp->snd_nxt;
}
}
@@ -843,7 +845,7 @@ static void tcp_init_metrics(struct sock
}
if (dst_metric(dst, RTAX_RTTVAR) > tp->mdev) {
tp->mdev = dst_metric(dst, RTAX_RTTVAR);
- tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+ tp->mdev_max = tp->rttvar = max(tp->mdev, sysctl_tcp_rto_min);
}
tcp_set_rto(sk);
tcp_bound_rto(sk);
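The handler in the patch takes the value written through /proc in milliseconds, converts it to jiffies, and rejects it with -EINVAL when it falls outside [TCP_RTO_MIN, TCP_RTO_MAX]. Below is a minimal userspace sketch of that range check only, assuming HZ = 1000 (so one jiffy is one millisecond); the function names are hypothetical and this is not the kernel code itself.

```c
#include <stdbool.h>

/*
 * Sketch only: assumes HZ = 1000, so one jiffy equals one millisecond.
 * The bounds follow include/net/tcp.h: TCP_RTO_MIN = HZ/5, TCP_RTO_MAX = 120*HZ.
 */
#define HZ 1000
#define TCP_RTO_MIN ((unsigned)(HZ / 5))
#define TCP_RTO_MAX ((unsigned)(120 * HZ))

/* Hypothetical helper: milliseconds to jiffies (exact when HZ == 1000). */
long rto_ms_to_jiffies(long ms)
{
	return ms * HZ / 1000;
}

/*
 * Hypothetical stand-in for the bounds check in proc_tcp_rto_min():
 * store the value only if it lies within [TCP_RTO_MIN, TCP_RTO_MAX];
 * on rejection *valp keeps its previous value.
 */
bool set_tcp_rto_min_ms(unsigned int *valp, long ms)
{
	long j = rto_ms_to_jiffies(ms);

	if (j < (long)TCP_RTO_MIN || j > (long)TCP_RTO_MAX)
		return false;
	*valp = (unsigned int)j;
	return true;
}
```

Under these assumptions 3000 ms is accepted, while 50 ms (below TCP_RTO_MIN) and 121000 ms (above TCP_RTO_MAX) are rejected and leave the stored value unchanged. The patch writes first and restores oldval on failure; the sketch checks before storing, and the end result of a rejected write is the same.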
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Eric Dumazet @ 2007-08-29 21:13 UTC
To: Rick Jones; +Cc: netdev
Rick Jones wrote:
> Enable configuration of the minimum TCP Retransmission Timeout via
> a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
> have quite variable RTTs avoid spurious RTOs.
>
> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> Signed-off-by: Lamont Jones <lamont@hp.com>
> ---
>
> diff -r 1559df81a153 include/linux/sysctl.h
> --- a/include/linux/sysctl.h Mon Aug 13 05:00:33 2007 +0000
> +++ b/include/linux/sysctl.h Wed Aug 22 10:42:55 2007 -0700
> @@ -441,6 +441,7 @@ enum
> NET_TCP_ALLOWED_CONG_CONTROL=123,
> NET_TCP_MAX_SSTHRESH=124,
> NET_TCP_FRTO_RESPONSE=125,
> + NET_TCP_RTO_MIN=126,
> };
>
> enum {
I am sure you can use CTL_UNNUMBERED instead of adding yet another sysctl
value, as advised in include/linux/sysctl.h
** For new interfaces unless you really need a binary number
** please use CTL_UNNUMBERED.
Thank you
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Ian McDonald @ 2007-08-29 21:32 UTC
To: Rick Jones; +Cc: netdev
On 8/30/07, Rick Jones <rick.jones2@hp.com> wrote:
> Enable configuration of the minimum TCP Retransmission Timeout via
> a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
> have quite variable RTTs avoid spurious RTOs.
>
> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> Signed-off-by: Lamont Jones <lamont@hp.com>
> ---
>
> diff -r 1559df81a153 Documentation/networking/ip-sysctl.txt
> --- a/Documentation/networking/ip-sysctl.txt Mon Aug 13 05:00:33 2007 +0000
> +++ b/Documentation/networking/ip-sysctl.txt Wed Aug 22 10:42:55 2007 -0700
> @@ -339,6 +339,13 @@ tcp_rmem - vector of 3 INTEGERs: min, de
> selected receiver buffers for TCP socket. This value does not override
> net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
> Default: 87380*2 bytes.
> +
> +tcp_rto_min - INTEGER
> + The minimum value for the TCP Retransmission Timeout, expressed
> + in milliseconds for the convenience of the user.
> + This is bounded at the low-end by TCP_RTO_MIN and by TCP_RTO_MAX at
> + the high-end.
> + Default: 200.
>
Hmmm... RFC2988 says:
(2.4) Whenever RTO is computed, if it is less than 1 second then the
RTO SHOULD be rounded up to 1 second.
Traditionally, TCP implementations use coarse grain clocks to
measure the RTT and trigger the RTO, which imposes a large
minimum value on the RTO. Research suggests that a large
minimum RTO is needed to keep TCP conservative and avoid
spurious retransmissions [AP99]. Therefore, this
specification requires a large minimum RTO as a conservative
approach, while at the same time acknowledging that at some
future point, research may show that a smaller minimum RTO is
acceptable or superior.
I went and had a look and this RFC has not been obsoleted. RFC3390
also backs this assertion up.
So I'm suspecting that the default should be changed to 1000 to match
the RFC which would solve this issue. I note that the RFC is a SHOULD
rather than a MUST. I had a quick look around and not sure why Linux
overrides the RFC on this one.
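To make rule (2.4) concrete, here is a minimal sketch of the RFC 2988 section 2 estimator with the minimum RTO passed in as a parameter: 1000 ms per the RFC, or 200 ms to mimic TCP_RTO_MIN. It assumes floating-point milliseconds and a caller-supplied clock granularity G; the struct and function names are hypothetical, and Linux itself uses scaled integer arithmetic rather than this form.

```c
/* Sketch of the RFC 2988, section 2 RTO estimator (times in milliseconds). */
struct rto_est {
	double srtt;     /* smoothed round-trip time */
	double rttvar;   /* round-trip time variation */
	double rto;      /* current retransmission timeout */
	int have_sample; /* 0 until the first RTT measurement */
};

#define RTO_ALPHA (1.0 / 8) /* SRTT gain */
#define RTO_BETA  (1.0 / 4) /* RTTVAR gain */
#define RTO_K     4.0

static double dmax(double a, double b) { return a > b ? a : b; }
static double dabs(double a) { return a < 0 ? -a : a; }

/* (2.1): before any measurement, RTO is 3 seconds. */
void rto_init(struct rto_est *e)
{
	e->srtt = 0.0;
	e->rttvar = 0.0;
	e->rto = 3000.0;
	e->have_sample = 0;
}

/*
 * Feed one RTT sample r. g is the clock granularity G; min_rto is the
 * floor applied afterwards (1000 per rule (2.4), 200 to mimic TCP_RTO_MIN).
 */
void rto_sample(struct rto_est *e, double r, double g, double min_rto)
{
	if (!e->have_sample) {
		/* (2.2): first measurement */
		e->srtt = r;
		e->rttvar = r / 2.0;
		e->have_sample = 1;
	} else {
		/* (2.3): RTTVAR is updated using the old SRTT, then SRTT */
		e->rttvar = (1.0 - RTO_BETA) * e->rttvar + RTO_BETA * dabs(e->srtt - r);
		e->srtt = (1.0 - RTO_ALPHA) * e->srtt + RTO_ALPHA * r;
	}
	e->rto = e->srtt + dmax(g, RTO_K * e->rttvar);
	/* (2.4): round up to the minimum */
	if (e->rto < min_rto)
		e->rto = min_rto;
}
```

For a single 100 ms sample with G = 1 ms, SRTT = 100 and RTTVAR = 50, so SRTT + 4 * RTTVAR = 300 ms: a 1000 ms floor raises that to 1000 ms, while a 200 ms floor leaves it at 300 ms. A second identical sample shrinks RTTVAR to 37.5 and the RTO to 250 ms.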
Ian
--
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 21:46 UTC
To: ian.mcdonald; +Cc: rick.jones2, netdev
From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
Date: Thu, 30 Aug 2007 09:32:38 +1200
> So I'm suspecting that the default should be changed to 1000 to match
> the RFC which would solve this issue. I note that the RFC is a SHOULD
> rather than a MUST. I had a quick look around and not sure why Linux
> overrides the RFC on this one.
Everyone uses this value, even BSD since ancient times.
None of the research folks want to commit to saying a lower value is
OK, even though it's quite clear that on a local 10 gigabit link a
minimum value of even 200 is absolutely and positively absurd.
So what do these cellphone network people want to do, decrease the
minimum RTO or increase it? Exactly how does it help them?
If the issue is wireless loss, algorithms like FRTO might help them,
because FRTO tries to make a distinction between capacity losses
(which should adjust cwnd) and radio losses (which are not capacity
based and therefore should not affect cwnd).
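As a rough illustration of how FRTO reaches its decision, here is a heavily simplified sketch of the basic F-RTO algorithm from RFC 4138. It assumes the sender always has new data to send and ignores the RFC's "recover" sequence-number check; the names are hypothetical. The verdict only arrives after the timeout retransmission has already been sent, so F-RTO can limit the consequences of a spurious RTO but not prevent the timeout itself.

```c
/* Simplified sketch of the basic F-RTO decision (after RFC 4138). */
enum frto_ack {
	FRTO_DUPACK,  /* duplicate ACK: the window did not advance */
	FRTO_NEW_ACK  /* ACK that acknowledges new data */
};

enum frto_verdict {
	FRTO_SPURIOUS,     /* timeout judged spurious: keep sending new data */
	FRTO_CONVENTIONAL  /* fall back to conventional RTO recovery */
};

/*
 * first:  the first ACK after the RTO retransmission.
 * second: the next ACK, after up to two new segments were sent.
 */
enum frto_verdict frto_classify(enum frto_ack first, enum frto_ack second)
{
	if (first == FRTO_DUPACK)
		return FRTO_CONVENTIONAL; /* no evidence the original got through */
	/* The window advanced: send new data and wait for the next ACK. */
	if (second == FRTO_DUPACK)
		return FRTO_CONVENTIONAL; /* looks like genuine loss */
	return FRTO_SPURIOUS;             /* both ACKs advanced the window */
}
```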
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:09 UTC
To: Ian McDonald; +Cc: netdev
Ian McDonald wrote:
> Hmmm... RFC2988 says:
> (2.4) Whenever RTO is computed, if it is less than 1 second then the
> RTO SHOULD be rounded up to 1 second.
>
> Traditionally, TCP implementations use coarse grain clocks to
> measure the RTT and trigger the RTO, which imposes a large
> minimum value on the RTO. Research suggests that a large
> minimum RTO is needed to keep TCP conservative and avoid
> spurious retransmissions [AP99]. Therefore, this
> specification requires a large minimum RTO as a conservative
> approach, while at the same time acknowledging that at some
> future point, research may show that a smaller minimum RTO is
> acceptable or superior.
>
> I went and had a look and this RFC has not been obsoleted. RFC3390
> also backs this assertion up.
>
> So I'm suspecting that the default should be changed to 1000 to match
> the RFC which would solve this issue. I note that the RFC is a SHOULD
> rather than a MUST. I had a quick look around and not sure why Linux
> overrides the RFC on this one.
If nothing else, 200 ms is a "principle of least surprise" thing since
that is the current value (in ms) for TCP_RTO_MIN.
rick jones
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Ian McDonald @ 2007-08-29 22:10 UTC
To: David Miller; +Cc: rick.jones2, netdev
On 8/30/07, David Miller <davem@davemloft.net> wrote:
> From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
I understand what you are saying. That is why I questioned it, as 200 msecs
makes no sense on a LAN with < 1 msec RTT. So if the current value is
ridiculous and 1000 is even more so, why do we use it? Just because that
is how TCP is written, I'm guessing.
I know that in DCCP CCID3 the RTO is 4 x RTT (from memory; it might
be a slight variation) but we ended up putting a minimum on it, as you
also face a problem if it fires too frequently (i.e. when the RTT is in
usecs).
I might ask around on research lists and see why this issue has never
been revisited.
Now to the original issue: high-RTT links. If that is an issue, and I
believe it would be, then it's probably better to do this on a per-route
basis or similar, although then we're becoming a de facto X x RTT
type setup. Rereading the RFC, this actually doesn't seem prohibited,
and here is the code from DCCP CCID3 that we use:
	/*
	 * Update timeout interval for the nofeedback timer.
	 * We use a configuration option to increase the lower bound.
	 * This can help avoid triggering the nofeedback timer too
	 * often ('spinning') on LANs with small RTTs.
	 */
	hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
				      CONFIG_IP_DCCP_CCID3_RTO *
				      (USEC_PER_SEC/1000));
	/*
	 * Schedule no feedback timer to expire in
	 * max(t_RTO, 2 * s/X) = max(t_RTO, 2 * t_ipi)
	 */
	t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);

	ccid3_pr_debug("%s(%p), Scheduled no feedback timer to "
		       "expire in %lu jiffies (%luus)\n",
		       dccp_role(sk),
		       sk, usecs_to_jiffies(t_nfb), t_nfb);

	sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
		       jiffies + usecs_to_jiffies(t_nfb));
Maybe the TCP code could do this also (with a sysctl to turn behaviour
off and on) and then it would save system administrators having to
"tune" the TCP stack if they want this sort of behaviour.
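The CCID3 rule quoted above reduces to t_RTO = max(4 * RTT, configured floor), with the no-feedback timeout being max(t_RTO, 2 * t_ipi). A minimal sketch of that arithmetic in microseconds follows; the function names, and passing the floor as a parameter instead of reading CONFIG_IP_DCCP_CCID3_RTO, are assumptions for illustration and not the DCCP code itself.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000UL

/* t_RTO = max(4 * RTT, floor); RTT in microseconds, floor in milliseconds. */
uint32_t nofeedback_t_rto(uint32_t rtt_us, uint32_t floor_ms)
{
	uint32_t floor_us = floor_ms * (uint32_t)(USEC_PER_SEC / 1000);
	uint32_t four_rtt = 4 * rtt_us;

	return four_rtt > floor_us ? four_rtt : floor_us;
}

/* No-feedback timeout: max(t_RTO, 2 * t_ipi), in microseconds. */
uint32_t nofeedback_timeout_us(uint32_t rtt_us, uint32_t t_ipi_us,
			       uint32_t floor_ms)
{
	uint32_t t_rto = nofeedback_t_rto(rtt_us, floor_ms);
	uint32_t two_ipi = 2 * t_ipi_us;

	return t_rto > two_ipi ? t_rto : two_ipi;
}
```

With a 200 us LAN RTT and an example floor of 100 ms, the floor dominates (100000 us); with a 50 ms RTT, 4 * RTT (200000 us) does. This is the "scale with RTT, but never below a floor" shape that Ian suggests TCP could adopt.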
Ian
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:11 UTC
To: Eric Dumazet; +Cc: netdev
> I am sure you can use CTL_UNNUMBERED instead of adding yet another
> sysctl value, as advised in include/linux/sysctl.h
>
> ** For new interfaces unless you really need a binary number
> ** please use CTL_UNNUMBERED.
Fair enough. I was just repeating past behaviour :)
rick jones
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Stephen Hemminger @ 2007-08-29 22:13 UTC
To: David Miller; +Cc: ian.mcdonald, rick.jones2, netdev
On Wed, 29 Aug 2007 14:46:56 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
> So what do these cellphone network people want to do, decrease the
> minimum RTO or increase it? Exactly how does it help them?
>
> If the issue is wireless loss, algorithms like FRTO might help them,
> because FRTO tries to make a distinction between capacity losses
> (which should adjust cwnd) and radio losses (which are not capacity
> based and therefore should not affect cwnd).
The following could help with loss.
There was some discussion about implementing TCP NCR (RFC4653)
and Narasimha Reddy said he might have something that could be used.
--
Stephen Hemminger <shemminger@linux-foundation.org>
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:20 UTC
To: rick.jones2; +Cc: ian.mcdonald, netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 29 Aug 2007 15:09:58 -0700
> If nothing else, 200 ms is a "principle of least surprise" thing since
> that is the current value (in ms) for TCP_RTO_MIN.
And Solaris and MacOS-X and...
In fact this is a great example of why we don't treat RFCs as dictations
from the gods. They are often wrong, impractical, or full of fatal
flaws.
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:23 UTC
To: ian.mcdonald; +Cc: rick.jones2, netdev
From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
Date: Thu, 30 Aug 2007 10:10:37 +1200
> I understand what you are saying. That is why I questioned it, as 200 msecs
> makes no sense on a LAN with < 1 msec RTT. So if the current value is
> ridiculous and 1000 is even more so, why do we use it? Just because that
> is how TCP is written, I'm guessing.
We considered getting rid of the lower bound several times, but didn't
want to investigate it fully back then.
> I know that in DCCP CCID3 the RTO is 4 x RTT (from memory - it might
> be a slight variation) but we ended up putting a minimum on it as you
> also face a problem if it fires too frequently (i.e. link is in
> usecs).
>
> I might ask around on research lists and see why this issue has never
> been revisited.
There is also the argument that on a local LAN congestion control
stops making any sense. The problem is that you can't detect what is
a local LAN, and any config knob to indicate this is an unacceptable
hack.
Any "congestion" you see on a local high-speed LAN will be gone before
you can react to it, so it's pretty pointless to do anything.
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:28 UTC
To: shemminger; +Cc: ian.mcdonald, rick.jones2, netdev
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Wed, 29 Aug 2007 15:13:01 -0700
> There was some discussion about implementing TCP NCR (RFC4653)
> and Narasimha Reddy said he might have something that could be used.
Although this looks interesting, I'm unsure it will help these
cell folks. Actually I can't tell for sure until Rick provides
us with some more details of the exact issue at hand.
NCR seems to deal with when to trigger loss recovery, whereas
the cell phone network folks apparently want to jack up TCP_RTO_MIN
so that hard-timeout-based retransmits are deferred a lot more
than normal.
And reading NCR some more, we already have something similar in the
form of Alexey's reordering detection, in fact it handles exactly the
case NCR supposedly deals with. We do not trigger loss recovery
strictly on the 3rd duplicate ACK, and we've known about and dealt
with the reordering issue explicitly for years.
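The point that loss recovery need not start strictly on the third duplicate ACK can be illustrated with a toy model in which the duplicate-ACK threshold grows once reordering has been observed. This is a hypothetical sketch of the idea only, not Linux's tcp_input.c logic; the names and the cap of 127 are assumptions.

```c
#include <stdbool.h>

#define DUPTHRESH_DEFAULT 3   /* classic fast-retransmit threshold */
#define DUPTHRESH_MAX     127 /* arbitrary cap for this toy model */

struct reorder_state {
	unsigned int dupthresh; /* duplicate ACKs required before loss recovery */
};

void reorder_init(struct reorder_state *s)
{
	s->dupthresh = DUPTHRESH_DEFAULT;
}

/*
 * Called when a segment that looked lost was in fact overtaken by
 * 'distance' later segments. In this model that produced 'distance'
 * duplicate ACKs, so require one more than that (up to the cap).
 * The threshold never shrinks here.
 */
void reorder_observed(struct reorder_state *s, unsigned int distance)
{
	unsigned int want = distance >= DUPTHRESH_MAX ? DUPTHRESH_MAX
						      : distance + 1;

	if (want > s->dupthresh)
		s->dupthresh = want;
}

bool should_enter_recovery(const struct reorder_state *s, unsigned int dupacks)
{
	return dupacks >= s->dupthresh;
}
```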
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:29 UTC
To: David Miller; +Cc: ian.mcdonald, netdev
David Miller wrote:
> From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
>
>>So I'm suspecting that the default should be changed to 1000 to match
>>the RFC which would solve this issue. I note that the RFC is a SHOULD
>>rather than a MUST. I had a quick look around and not sure why Linux
>>overrides the RFC on this one.
>
>
> Everyone uses this value, even BSD since ancient times.
Or at least something close to it - some use 500 milliseconds for
"tcp_rto_min."
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
> So what do these cellphone network people want to do, decrease the
> minimum RTO or increase it? Exactly how does it help them?
They want to increase it. The folks who triggered this want to make it
3 seconds to avoid spurious RTOs. Their experience with the "other
platform" they wish to replace suggests that 3 seconds is a good value
for their network.
> If the issue is wireless loss, algorithms like FRTO might help them,
> because FRTO tries to make a distinction between capacity losses
> (which should adjust cwnd) and radio losses (which are not capacity
> based and therefore should not affect cwnd).
I was looking at that. FRTO seems only to affect the cwnd calculations,
and not the RTO calculation, so it seems to "deal with" spurious RTOs
rather than preclude them. There is a strong desire here to not have
spurious RTOs in the first place. Each spurious retransmission will
increase a user's charges.
rick
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:32 UTC
To: Stephen Hemminger; +Cc: David Miller, ian.mcdonald, netdev
From what I've seen thus far, the issue isn't so much actual loss, but
very variable RTTs leading to spurious RTOs.
rick jones
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Ian McDonald @ 2007-08-29 22:33 UTC
To: David Miller; +Cc: rick.jones2, netdev
On 8/30/07, David Miller <davem@davemloft.net> wrote:
> In fact this is a great example why we don't treat RFCs as dictations
> from the gods. They are often wrong, impractical, or full of fatal
> flaws.
>
Correct - they often have flaws in them, just like all documents. If
that is the case we should try and get the RFCs fixed. I've raised
this for discussion in the ICCRG group and will see if I get any sort of
response.
Ian
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:35 UTC
To: rick.jones2; +Cc: ian.mcdonald, netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 29 Aug 2007 15:29:03 -0700
> David Miller wrote:
> > None of the research folks want to commit to saying a lower value is
> > OK, even though it's quite clear that on a local 10 gigabit link a
> > minimum value of even 200 is absolutely and positively absurd.
> >
> > So what do these cellphone network people want to do, decrease the
> > minimum RTO or increase it? Exactly how does it help them?
>
> They want to increase it. The folks who triggered this want to make it
> 3 seconds to avoid spurious RTOs. Their experience with the "other
> platform" they wish to replace suggests that 3 seconds is a good value
> for their network.
>
> > If the issue is wireless loss, algorithms like FRTO might help them,
> > because FRTO tries to make a distinction between capacity losses
> > (which should adjust cwnd) and radio losses (which are not capacity
> > based and therefore should not affect cwnd).
>
> I was looking at that. FRTO seems only to affect the cwnd calculations,
> and not the RTO calculation, so it seems to "deal with" spurrious RTOs
> rather than preclude them. There is a strong desire here to not have
> spurrious RTO's in the first place. Each spurrious retransmission will
> increase a user's charges.
All of this seems to suggest that the RTO calculation is wrong.
It seems that packets in this network can be delayed several orders of
magnitude longer than the usual round trip as measured by TCP.
What exactly causes such a huge delay? What is the TCP measured RTO
in these circumstances where spurious RTOs happen and a 3 second
minimum RTO makes things better?
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:37 UTC
To: ian.mcdonald; +Cc: rick.jones2, netdev
From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
Date: Thu, 30 Aug 2007 10:33:32 +1200
> Correct - they often have flaws in them, just like all documents. If
> that is the case we should try and get the RFCs fixed.
In many cases it is not the wording, but the actual concept or idea
the RFC itself is describing which is fatally flawed.
TCP timestamps are a great example: as designed, they simply do not
work when ACKs are reordered by the network, because reordering makes
the PAWS test fail for the out-of-order ACKs.
Therefore everyone adds some extra fuzz to the PAWS test so that a small
window of "older" packets is allowed to pass the check.
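A minimal sketch of a PAWS acceptance test with such a tolerance window follows. It assumes 32-bit timestamps compared with serial (wraparound) arithmetic; the fuzz parameter and the function name are hypothetical. With fuzz = 0 it behaves like the strict test, which rejects any segment whose timestamp is older than the most recent one accepted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept an incoming timestamp (rcv_tsval) against the most recent valid
 * one (ts_recent), tolerating timestamps that lag by up to 'fuzz' ticks.
 * The subtraction is done modulo 2^32 and read as signed, so the test
 * keeps working when the sender's timestamp clock wraps around.
 */
bool paws_accept(uint32_t ts_recent, uint32_t rcv_tsval, uint32_t fuzz)
{
	/* > 0 means rcv_tsval is older than ts_recent by that many ticks */
	int32_t lag = (int32_t)(ts_recent - rcv_tsval);

	return lag <= (int32_t)fuzz;
}
```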
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: John Heffner @ 2007-08-29 22:48 UTC
To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev
David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Wed, 29 Aug 2007 15:29:03 -0700
>
>> David Miller wrote:
>>> None of the research folks want to commit to saying a lower value is
>>> OK, even though it's quite clear that on a local 10 gigabit link a
>>> minimum value of even 200 is absolutely and positively absurd.
>>>
>>> So what do these cellphone network people want to do, decrease the
>>> minimum RTO or increase it? Exactly how does it help them?
>> They want to increase it. The folks who triggered this want to make it
>> 3 seconds to avoid spurious RTOs. Their experience with the "other
>> platform" they wish to replace suggests that 3 seconds is a good value
>> for their network.
>>
>>> If the issue is wireless loss, algorithms like FRTO might help them,
>>> because FRTO tries to make a distinction between capacity losses
>>> (which should adjust cwnd) and radio losses (which are not capacity
>>> based and therefore should not affect cwnd).
>> I was looking at that. FRTO seems only to affect the cwnd calculations,
>> and not the RTO calculation, so it seems to "deal with" spurrious RTOs
>> rather than preclude them. There is a strong desire here to not have
>> spurrious RTO's in the first place. Each spurrious retransmission will
>> increase a user's charges.
>
> All of this seems to suggest that the RTO calculation is wrong.
I think there's definitely room for improving the RTO calculation.
However, this may not be the end-all fix...
> It seems that packets in this network can be delayed several orders of
> magnitude longer than the usual round trip as measured by TCP.
>
> What exactly causes such a huge delay? What is the TCP measured RTO
> in these circumstances where spurious RTOs happen and a 3 second
> minimum RTO makes things better?
I haven't done a lot of work on wireless myself, but my understanding is
that one of the biggest problems is the behavior of link-layer
retransmission schemes. They can suddenly increase the delay of packets
by a significant amount when you get a burst of radio interference.
It's hard for TCP to gracefully handle this kind of jump without some
minimum RTO, especially since WLAN RTTs can often be quite small.
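The effect described above can be put in numbers with a simplified model. It assumes the RTO is the smoothed RTT plus an RTT-variation term floored at tcp_rto_min (the patch floors rttvar with max(tp->mdev, sysctl_tcp_rto_min), and Linux's tcp_set_rto() adds rttvar to the smoothed RTT), and it ignores smoothing and per-window decay entirely. The function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Simplified model, all values in milliseconds: RTO = SRTT plus an
 * RTT-variation term that is never allowed below rto_min.
 */
unsigned int model_rto_ms(unsigned int srtt_ms, unsigned int rttvar_ms,
			  unsigned int rto_min_ms)
{
	unsigned int var = rttvar_ms > rto_min_ms ? rttvar_ms : rto_min_ms;

	return srtt_ms + var;
}

/*
 * A link-layer retransmission burst adds spike_ms to the normal round
 * trip. Nothing is actually lost, so if the delayed ACK would arrive
 * after the timer fires, the resulting RTO is spurious.
 */
bool rto_is_spurious(unsigned int srtt_ms, unsigned int rttvar_ms,
		     unsigned int rto_min_ms, unsigned int spike_ms)
{
	return srtt_ms + spike_ms > model_rto_ms(srtt_ms, rttvar_ms, rto_min_ms);
}
```

With a 20 ms SRTT, a 5 ms variation term and a 500 ms delay spike, a 200 ms floor gives an RTO of 220 ms and the timer fires spuriously; a 3000 ms floor gives 3020 ms and it does not.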
-John
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Stephen Hemminger @ 2007-08-29 22:51 UTC
To: David Miller; +Cc: ian.mcdonald, rick.jones2, netdev
On Wed, 29 Aug 2007 15:28:12 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Wed, 29 Aug 2007 15:13:01 -0700
>
> > There was some discussion about implementing TCP NCR (RFC4653)
> > and Narasimha Reddy said he might have something that could be used.
>
> Although this looks interesting, I'm unsure it will help these
> cell folks. Actually I can't tell for sure until Rick provides
> us with some more details of the exact issue at hand.
>
> NCR seems to deal with when to trigger loss recovery, whereas
> the cell phone network folks apparently want to jack up TCP_RTO_MIN
> so that hard-timeout-based retransmits are deferred a lot more
> than normal.
>
> And reading NCR some more, we already have something similar in the
> form of Alexey's reordering detection, in fact it handles exactly the
> case NCR supposedly deals with. We do not trigger loss recovery
> strictly on the 3rd duplicate ACK, and we've known about and dealt
> with the reordering issue explicitly for years.
>
Yeah, it looked like another case of BSD RFC writers reinventing
Linux algorithms, but it is worth getting the behaviour standardized
and more widely reviewed.
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: John Heffner @ 2007-08-29 22:52 UTC
To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev
John Heffner wrote:
>> What exactly causes such a huge delay? What is the TCP measured RTO
>> in these circumstances where spurious RTOs happen and a 3 second
>> minimum RTO makes things better?
>
> I haven't done a lot of work on wireless myself, but my understanding is
> that one of the biggest problems is the behavior of link-layer
> retransmission schemes. They can suddenly increase the delay of packets
> by a significant amount when you get a burst of radio interference. It's
> hard for TCP to gracefully handle this kind of jump without some minimum
> RTO, especially since WLAN RTTs can often be quite small.
(Replying to myself) Though F-RTO does often help in this case.
-John
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 22:35 ` David Miller
2007-08-29 22:48 ` John Heffner
@ 2007-08-29 22:53 ` Edgar E. Iglesias
2007-08-29 23:06 ` Rick Jones
2 siblings, 0 replies; 30+ messages in thread
From: Edgar E. Iglesias @ 2007-08-29 22:53 UTC (permalink / raw)
To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev
On Wed, Aug 29, 2007 at 03:35:03PM -0700, David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Wed, 29 Aug 2007 15:29:03 -0700
>
> > David Miller wrote:
> > > None of the research folks want to commit to saying a lower value is
> > > OK, even though it's quite clear that on a local 10 gigabit link a
> > > minimum value of even 200 is absolutely and positively absurd.
> > >
> > > So what do these cellphone network people want to do, decrease the
> > > minimum RTO or increase it? Exactly how does it help them?
> >
> > They want to increase it. The folks who triggered this want to make it
> > 3 seconds to avoid spurious RTOs. Their experience with the "other
> > platform" they wish to replace suggests that 3 seconds is a good value
> > for their network.
> >
> > > If the issue is wireless loss, algorithms like FRTO might help them,
> > > because FRTO tries to make a distinction between capacity losses
> > > (which should adjust cwnd) and radio losses (which are not capacity
> > > based and therefore should not affect cwnd).
> >
> > I was looking at that. FRTO seems only to affect the cwnd calculations,
> > and not the RTO calculation, so it seems to "deal with" spurious RTOs
> > rather than preclude them. There is a strong desire here to not have
> > spurious RTOs in the first place. Each spurious retransmission will
> > increase a user's charges.
>
> All of this seems to suggest that the RTO calculation is wrong.
>
> It seems that packets in this network can be delayed several orders of
> magnitude longer than the usual round trip as measured by TCP.
>
> What exactly causes such a huge delay? What is the TCP measured RTO
> in these circumstances where spurious RTOs happen and a 3 second
> minimum RTO makes things better?
I don't know what they are doing, but it reminds me of what happens when
you run TCP over a reliable medium. You don't see loss; instead the
RTT starts to jitter a lot.
IIRC FRTO does help avoid unnecessary retransmits (although the RTO still
hits).
Best regards
--
Programmer
Edgar E. Iglesias <edgar.iglesias@axis.com> 46.46.272.1946
* Re: NCR, was [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 22:51 ` Stephen Hemminger
@ 2007-08-29 22:58 ` John Heffner
2007-08-29 22:59 ` David Miller
0 siblings, 1 reply; 30+ messages in thread
From: John Heffner @ 2007-08-29 22:58 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, ian.mcdonald, rick.jones2, netdev
Stephen Hemminger wrote:
> On Wed, 29 Aug 2007 15:28:12 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>> And reading NCR some more, we already have something similar in the
>> form of Alexey's reordering detection, in fact it handles exactly the
>> case NCR supposedly deals with. We do not trigger loss recovery
>> strictly on the 3rd duplicate ACK, and we've known about and dealt
>> with the reordering issue explicitly for years.
>>
>
> Yeah, it looked like another case of BSD RFC writers reinventing
> Linux algorithms, but it is worth getting the behaviour standardized
> and more widely reviewed.
I don't believe this was the case. NCR is substantially different, and
came out of work at Texas A&M. The original (only) implementation was
in Linux IIRC. Its goal was to do better. Their papers say it does.
It might be worth looking at.
In my own experience with reordering, Alexey's code had some
hard-to-track-down bugs (look at all the work Ilpo's been doing), and
the relative simplicity of NCR may be one of the reasons it does well in
tests.
-John
* Re: NCR, was [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 22:58 ` NCR, was " John Heffner
@ 2007-08-29 22:59 ` David Miller
0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2007-08-29 22:59 UTC (permalink / raw)
To: jheffner; +Cc: shemminger, ian.mcdonald, rick.jones2, netdev
From: John Heffner <jheffner@psc.edu>
Date: Wed, 29 Aug 2007 18:58:12 -0400
> I don't believe this was the case. NCR is substantially different, and
> came out of work at Texas A&M. The original (only) implementation was
> in Linux IIRC. Its goal was to do better. Their papers say it does.
> It might be worth looking at.
>
> In my own experience with reordering, Alexey's code had some
> hard-to-track-down bugs (look at all the work Ilpo's been doing), and
> the relative simplicity of NCR may be one of the reasons it does well in
> tests.
Interesting, thanks for the info John.
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 22:35 ` David Miller
2007-08-29 22:48 ` John Heffner
2007-08-29 22:53 ` Edgar E. Iglesias
@ 2007-08-29 23:06 ` Rick Jones
2007-08-29 23:15 ` David Miller
2 siblings, 1 reply; 30+ messages in thread
From: Rick Jones @ 2007-08-29 23:06 UTC (permalink / raw)
To: David Miller; +Cc: ian.mcdonald, netdev
> All of this seems to suggest that the RTO calculation is wrong.
That is a possibility. Or at least it could be enhanced.
> It seems that packets in this network can be delayed several orders of
> magnitude longer than the usual round trip as measured by TCP.
>
> What exactly causes such a huge delay? What is the TCP measured RTO
> in these circumstances where spurious RTOs happen and a 3 second
> minimum RTO makes things better?
I believe the biggest component comes from link-layer retransmissions.
There can also be some short outages thanks to signal blocking,
tunnels, people with big hats and whatnot that the link-layer
retransmissions are trying to address. The three seconds seems to be a
value that gives the certainty that 99 times out of 10 the segment was
indeed lost.
The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
to ~7000 milliseconds.
rick
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 23:06 ` Rick Jones
@ 2007-08-29 23:15 ` David Miller
2007-08-29 23:31 ` Rick Jones
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: David Miller @ 2007-08-29 23:15 UTC (permalink / raw)
To: rick.jones2; +Cc: ian.mcdonald, netdev, ilpo.jarvinen
From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 29 Aug 2007 16:06:27 -0700
> I believe the biggest component comes from link-layer retransmissions.
> There can also be some short outages thanks to signal blocking,
> tunnels, people with big hats and whatnot that the link-layer
> retransmissions are trying to address. The three seconds seems to be a
> value that gives the certainty that 99 times out of 10 the segment was
> indeed lost.
>
> The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
> to ~7000 milliseconds.
Thanks for the info.
It's pretty easy to generate examples where we might have some sockets
talking over interfaces on such a network and others which are not.
Therefore, if we do this, a per-route metric is probably the best bet.
Ilpo, I'm also very interested to see what you think of all of this
:-)
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 23:15 ` David Miller
@ 2007-08-29 23:31 ` Rick Jones
2007-08-30 5:22 ` Krishna Kumar2
2007-08-29 23:44 ` John Heffner
2007-09-05 19:04 ` Ilpo Järvinen
2 siblings, 1 reply; 30+ messages in thread
From: Rick Jones @ 2007-08-29 23:31 UTC (permalink / raw)
To: David Miller; +Cc: ian.mcdonald, netdev, ilpo.jarvinen
David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
>>The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
>>to ~7000 milliseconds.
>
>
> Thanks for the info.
>
> It's pretty easy to generate examples where we might have some sockets
> talking over interfaces on such a network and others which are not.
> Therefore, if we do this, a per-route metric is probably the best bet.
FWIW, the places where I've seen this come up thus far are where we have
a sort of "gateway" or front-end system which is connected on one side
to the cellphone network with the bad delays, and on the other side is
connected to an internal network where actual losses leading to RTOs
are epsilon. Certainly something which could make a per-route decision
would work there and probably quite well, though a simple sysctl does
seem to be sufficient and would touch fewer places.
Do you think it is still worthwhile for me to rework the initial patch
to use CTL_UNNUMBERED?
rick
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 23:15 ` David Miller
2007-08-29 23:31 ` Rick Jones
@ 2007-08-29 23:44 ` John Heffner
2007-09-05 19:04 ` Ilpo Järvinen
2 siblings, 0 replies; 30+ messages in thread
From: John Heffner @ 2007-08-29 23:44 UTC (permalink / raw)
To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev, ilpo.jarvinen
David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Wed, 29 Aug 2007 16:06:27 -0700
>
>> I believe the biggest component comes from link-layer retransmissions.
>> There can also be some short outages thanks to signal blocking,
>> tunnels, people with big hats and whatnot that the link-layer
>> retransmissions are trying to address. The three seconds seems to be a
>> value that gives the certainty that 99 times out of 10 the segment was
>> indeed lost.
>>
>> The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
>> to ~7000 milliseconds.
>
> Thanks for the info.
>
> It's pretty easy to generate examples where we might have some sockets
> talking over interfaces on such a network and others which are not.
> Therefore, if we do this, a per-route metric is probably the best bet.
This is exactly what I was thinking. It might even help discourage
users from playing with this setting who should not. ;)
-John
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 23:31 ` Rick Jones
@ 2007-08-30 5:22 ` Krishna Kumar2
2007-08-30 17:10 ` Rick Jones
0 siblings, 1 reply; 30+ messages in thread
From: Krishna Kumar2 @ 2007-08-30 5:22 UTC (permalink / raw)
To: Rick Jones
Cc: David Miller, ian.mcdonald, ilpo.jarvinen, netdev, netdev-owner
Hi Rick,
> > From: Rick Jones <rick.jones2@hp.com>
> >>The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
> >>to ~7000 milliseconds.
> >
> >
> > Thanks for the info.
> >
> > It's pretty easy to generate examples where we might have some sockets
> > talking over interfaces on such a network and others which are not.
> > Therefore, if we do this, a per-route metric is probably the best bet.
>
> FWIW, the places where I've seen this come up thus far are where we have
> a sort of "gateway" or front-end system which is connected on one side
> to the cellphone network with the bad delays, and on the other side is
> connected to an internal network where actual losses leading to RTOs
> are epsilon. Certainly something which could make a per-route decision
> would work there and probably quite well, though a simple sysctl does
> seem to be sufficient and would touch fewer places.
>
> Do you think it is still worthwhile for me to rework the initial patch
> to use CTL_UNNUMBERED?
You could add following cleanup:
static int proc_tcp_rto_min(ctl_table *ctl, int write, struct file *filp,
			    void __user *buffer, size_t *lenp,
			    loff_t *ppos)
{
	int *valp = ctl->data;
	int oldval = *valp;
	int ret;

	ret = proc_dointvec_ms_jiffies(ctl, write, filp, buffer, lenp, ppos);
	if (ret)
		return ret;

	/* some bounds checking would be in order */
	if (write && *valp != oldval) {
		if (*valp < (int)TCP_RTO_MIN || *valp > (int)TCP_RTO_MAX) {
			*valp = oldval;
			ret = -EINVAL;
		}
	}
	return ret;
}
Also, isn't it enough to use u32 for valp/oldval and remove the "(int)"
typecasts?
Thanks,
- KK
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-30 5:22 ` Krishna Kumar2
@ 2007-08-30 17:10 ` Rick Jones
0 siblings, 0 replies; 30+ messages in thread
From: Rick Jones @ 2007-08-30 17:10 UTC (permalink / raw)
To: Krishna Kumar2
Cc: David Miller, ian.mcdonald, ilpo.jarvinen, netdev, netdev-owner
Krishna Kumar2 wrote:
> Hi Rick,
>
>
>>>From: Rick Jones <rick.jones2@hp.com>
>>>
>>>>The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
>>>>to ~7000 milliseconds.
>>>
>>>
>>>Thanks for the info.
>>>
>>>It's pretty easy to generate examples where we might have some sockets
>>>talking over interfaces on such a network and others which are not.
>>>Therefore, if we do this, a per-route metric is probably the best bet.
>>
>>FWIW, the places where I've seen this come up thus far are where we have
>>a sort of "gateway" or front-end system which is connected on one side
>>to the cellphone network with the bad delays, and on the other side is
>>connected to an internal network where actual losses leading to RTOs
>>are epsilon. Certainly something which could make a per-route decision
>>would work there and probably quite well, though a simple sysctl does
>>seem to be sufficient and would touch fewer places.
>>
>>Do you think it is still worthwhile for me to rework the initial patch
>>to use CTL_UNNUMBERED?
>
>
> You could add following cleanup:
>
> static int proc_tcp_rto_min(ctl_table *ctl, int write, struct file *filp,
> 			    void __user *buffer, size_t *lenp,
> 			    loff_t *ppos)
> {
> 	int *valp = ctl->data;
> 	int oldval = *valp;
> 	int ret;
>
> 	ret = proc_dointvec_ms_jiffies(ctl, write, filp, buffer, lenp, ppos);
> 	if (ret)
> 		return ret;
>
> 	/* some bounds checking would be in order */
> 	if (write && *valp != oldval) {
> 		if (*valp < (int)TCP_RTO_MIN || *valp > (int)TCP_RTO_MAX) {
> 			*valp = oldval;
> 			ret = -EINVAL;
> 		}
> 	}
> 	return ret;
> }
Sure.
> Also, isn't it enough to use u32 for valp/oldval and remove the "(int)"
> typecasts?
I suppose; that was some mimicking of code I'd seen elsewhere, but I'll
give it a shot.
rick
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-08-29 23:15 ` David Miller
2007-08-29 23:31 ` Rick Jones
2007-08-29 23:44 ` John Heffner
@ 2007-09-05 19:04 ` Ilpo Järvinen
2007-09-06 20:39 ` David Miller
2 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2007-09-05 19:04 UTC (permalink / raw)
To: David Miller; +Cc: rick.jones2, ian.mcdonald, Netdev
On Wed, 29 Aug 2007, David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Wed, 29 Aug 2007 16:06:27 -0700
>
> > I believe the biggest component comes from link-layer retransmissions.
> > There can also be some short outages thanks to signal blocking,
> > tunnels, people with big hats and whatnot that the link-layer
> > retransmissions are trying to address. The three seconds seems to be a
> > value that gives the certainty that 99 times out of 10 the segment was
> > indeed lost.
> >
> > The trace I've been sent shows clean RTTs ranging from ~200 milliseconds
> > to ~7000 milliseconds.
>
> Thanks for the info.
>
> It's pretty easy to generate examples where we might have some sockets
> talking over interfaces on such a network and others which are not.
> Therefore, if we do this, a per-route metric is probably the best bet.
>
> Ilpo, I'm also very interested to see what you think of all of this :-)
...Haven't been too actively reading mails for a while until now, so I'm a
bit late in response... I'll try to quickly summarize FRTO here.
It's true that FRTO cannot prevent the first retransmission, yet I suspect
that it won't cost that much even if you have to pay for each bit; it won't
be that high a percentage of all packets after all :-). However, usually
when you have a spurious RTO, not only is the first segment unnecessarily
retransmitted but the *whole window*. It goes like this: all cumulative
ACKs got delayed due to in-order delivery, then TCP will actually send
1.5*original cwnd worth of data in the RTO's slow-start when the delayed
ACKs arrive (basically the original cwnd worth of it unnecessarily). In
case one is interested in minimizing unnecessary retransmissions, e.g. due
to cost, those rexmissions must never see daylight. Besides, in the worst
case the generated burst overloads the bottleneck buffers, which is likely
to significantly delay the further progress of the flow. In case of ll
rexmissions, ACK compression often occurs at the same time, making the
burst very "sharp edged" (in that case TCP often loses most of the
segments above high_seq => very bad performance too).

When FRTO is enabled, those unnecessary retransmissions are fully avoided
except for the first segment, and the cwnd behavior after a detected
spurious RTO is determined by the response (one can tune that by sysctl).
The basic (non-SACK-enhanced) version of FRTO can fail to detect a
spurious RTO as spurious, in which case it falls back to conservative
behavior. ACK lossage is much less significant than reordering; usually
FRTO can detect a spurious RTO if at least 2 cumulative ACKs from the
original window are preserved (excluding the ACK that advances to
high_seq). With the SACK-enhanced version, the detection is quite robust.
Of course one could jump on the min_rto bandwagon instead, but it often
ends up being more or less black magic and can still produce unwanted
behavior unless one goes to ridiculously high minimum RTOs.

Main obstacle to FRTO use is its deployment as it has to be on the sender
side, whereas the wireless link is often the receiver's access link, but if
one can tune tcp_min_rto (or equal) on the sender side, one could enable
FRTO at will as well. Anyway, anything older than 2.6.22 is not going to
give very good results with FRTO. From the FRTO code's maturity point of
view, IMHO currently just the unconditional clearing of undo_marker (in
tcp_enter_frto_loss) stands in the way of enabling FRTO by default in
future kernels, as it basically disables DSACK undoing. I'll try to solve
that soon; it has been on my todo list for too long already (I don't
currently have much time to devote to it though, so 2.6.24-rc1 might come
too early for me :-(). After that, it might be a good move to enable it in
mainline by default if you agree... ...Uninteresting enough, even the IETF
seems to be interested in advancing FRTO from experimental [1].
Another important thing to consider in cellular besides ll rexmissions is
bandwidth allocation delay... A week ago we actually ran some measurements
in a real UMTS network to determine buffer, one-way delay, etc. behavior
(though YMMV depending on the operator's configuration etc.). Basically we
saw a 1 s delay spike when allocation delay occurs (it's very hard to
predict when that happens due to the role of other network users). One-way
propagation delay was around 50 ms, so 1500 bytes takes about 80 ms+ to
transmit, so the spike is an order of magnitude larger than the RTT, but
the queueing delay is probably large enough to prevent spurious RTOs due to
allocation delay. Besides that, we saw some long latencies, up to 8-12 s;
they could be due to ll retransmissions, but their source is not yet
verified to be the WWAN link, as we had the phone connected through
bluetooth (which could interfere). A funny sidenote about the experiment:
we found out something Linux cannot do (from userspace only). It seems to
be unable to receive the same packet it has sent out to itself: we forced
the packet out of eth0 by binding the sending dev to eth0 and received it
from ppp0 => the packet always got discarded as martian, and there seems
to be no knob for that, so we had to hack it :-).
--
i.
[1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
2007-09-05 19:04 ` Ilpo Järvinen
@ 2007-09-06 20:39 ` David Miller
0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2007-09-06 20:39 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: rick.jones2, ian.mcdonald, netdev
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 5 Sep 2007 22:04:11 +0300 (EEST)
> Main obstacle to FRTO use is its deployment as it has to be on the
> sender side, whereas the wireless link is often the receiver's access
> link but if one can tune tcp_min_rto (or equal) on the sender side,
> one could enable FRTO at will as well.
Correct.
I've currently succumbed to the realization that no matter what we
tell these folks about FRTO they aren't interested in testing it as a
usable solution mostly because they already invested the time to find
out that min_rto gives them pretty much what they want.
I'd prefer they used an FRTO based solution but this is reality :)
> Uninteresting enough, even IETF seems to
> be interested in advancing FRTO from experimental [1].
That's great :)
> A funny sidenote about the experiment, we found out what Linux
> cannot do (from userspace only): it seems to be unable to receive
> the same packet it has sent out to itself as we forced the packet
> out from eth0 by binding sending dev to eth0 and received from ppp0
> => the packet always got discarded as martian and there seems to be
> no knob for that, so we had to hack it :-).
Yes, and this is loosely related to the "send to self" patches
that come up from time to time because Linux naturally wants
to just loopback in software any packet you try to direct out
a real interface that matches a local host address.
Thread overview: 30+ messages
2007-08-29 20:52 [PATCH] make _minimum_ TCP retransmission timeout configurable Rick Jones
2007-08-29 21:13 ` Eric Dumazet
2007-08-29 22:11 ` Rick Jones
2007-08-29 21:32 ` Ian McDonald
2007-08-29 21:46 ` David Miller
2007-08-29 22:10 ` Ian McDonald
2007-08-29 22:23 ` David Miller
2007-08-29 22:13 ` Stephen Hemminger
2007-08-29 22:28 ` David Miller
2007-08-29 22:51 ` Stephen Hemminger
2007-08-29 22:58 ` NCR, was " John Heffner
2007-08-29 22:59 ` David Miller
2007-08-29 22:32 ` Rick Jones
2007-08-29 22:29 ` Rick Jones
2007-08-29 22:35 ` David Miller
2007-08-29 22:48 ` John Heffner
2007-08-29 22:52 ` John Heffner
2007-08-29 22:53 ` Edgar E. Iglesias
2007-08-29 23:06 ` Rick Jones
2007-08-29 23:15 ` David Miller
2007-08-29 23:31 ` Rick Jones
2007-08-30 5:22 ` Krishna Kumar2
2007-08-30 17:10 ` Rick Jones
2007-08-29 23:44 ` John Heffner
2007-09-05 19:04 ` Ilpo Järvinen
2007-09-06 20:39 ` David Miller
2007-08-29 22:09 ` Rick Jones
2007-08-29 22:20 ` David Miller
2007-08-29 22:33 ` Ian McDonald
2007-08-29 22:37 ` David Miller