* [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS
@ 2008-05-21 16:38 Stephane Chazelas
2008-05-21 16:56 ` Patrick McHardy
2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
0 siblings, 2 replies; 12+ messages in thread
From: Stephane Chazelas @ 2008-05-21 16:38 UTC (permalink / raw)
To: shemminger; +Cc: netdev
Hi Stephen,
there seems to be something wrong with the timer values as
processed by at least the "ss" and "ip" commands when
CONFIG_HIGH_RES_TIMERS is on.
$ zgrep -e HZ= -e HIGH_RES /proc/config.gz
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HZ=250
For instance, the "ip route rtt <time>" and
rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
look incorrect:
$ PROC_NET_TCP=/proc/net/tcp ss -i
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198 rto:5.5e-08
[...]
$ ss -i
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198
bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
$ ip route show
10.95.131.111 via 10.95.128.1 dev eth0 rtt 5ms
$ netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.95.131.111 10.95.128.1 255.255.255.255 UGH 0 0 5000000 eth0
I think this is because get_hz gets the HZ value from the fourth
field in /proc/net/psched. That's 1e9 in my case, because that's
meant to be the frequency of CLOCK_MONOTONIC and when
CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
I don't know how to get the right value, clock_getres returns 1ns
as well.
That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
head code looks the same).
Best regards,
Stéphane
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS
2008-05-21 16:38 [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephane Chazelas
@ 2008-05-21 16:56 ` Patrick McHardy
2008-05-21 17:40 ` [PATCH] net: neighbour table ABI problem Stephen Hemminger
2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
1 sibling, 1 reply; 12+ messages in thread
From: Patrick McHardy @ 2008-05-21 16:56 UTC (permalink / raw)
To: Stephane Chazelas; +Cc: shemminger, netdev
Stephane Chazelas wrote:
> Hi Stephen,
>
> there seems to be something wrong with the timer values as
> processed by at least the "ss" and "ip" commands when
> CONFIG_HIGH_RES_TIMERS is on.
>
> $ zgrep -e HZ= -e HIGH_RES /proc/config.gz
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_HZ=250
>
> For instance, the "ip route rtt <time>" and
> rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
> look incorrect:
>
> $ PROC_NET_TCP=/proc/net/tcp ss -i
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198 rto:5.5e-08
> [...]
>
> $ ss -i
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198
> bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
>
>
> $ ip route show
> 10.95.131.111 via 10.95.128.1 dev eth0 rtt 5ms
>
> $ netstat -rn
> Kernel IP routing table
> Destination Gateway Genmask Flags MSS Window irtt Iface
> 10.95.131.111 10.95.128.1 255.255.255.255 UGH 0 0 5000000 eth0
>
> I think this is because get_hz gets the HZ value from the fourth
> field in /proc/net/psched. That's 1e9 in my case, because that's
> meant to be the frequency of CLOCK_MONOTONIC and when
> CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
> 1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
>
> I don't know how to get the right value, clock_getres returns 1ns
> as well.
>
> That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
> head code looks the same).

Neither ss nor ip route should be using get_hz(). inet_diag is using
fixed units anyway, and the routing metrics should too.
/proc/net/psched is *only* for packet schedulers and they want to
know the real clock resolution.
^ permalink raw reply [flat|nested] 12+ messages in thread
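To make the diagnosis in the two messages above concrete, here is a minimal C sketch of how a HZ value can end up being read from the fourth field of /proc/net/psched. It is an illustration under stated assumptions, not the iproute2 source: the helper names psched_hz_from_line and ticks_to_seconds are hypothetical, the rule "use the fourth word only when the third word is 1000000" is an assumption about the tool's behaviour, and the sketch parses a string instead of opening the proc file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: derive a "HZ" value from one line laid out like
 * /proc/net/psched (four hexadecimal words).
 * Assumption: the fourth word is used as HZ only when the third word is
 * 1000000; otherwise 0 is returned so a caller can fall back to a default.
 */
unsigned int psched_hz_from_line(const char *line)
{
    unsigned int nom, denom;

    assert(line != NULL);
    /* Skip the first two words, keep the third and the fourth. */
    if (sscanf(line, "%*x %*x %x %x", &nom, &denom) != 2)
        return 0;
    return nom == 1000000 ? denom : 0;
}

/* Convert a raw tick count to seconds for a given tick rate. */
double ticks_to_seconds(unsigned int ticks, unsigned int hz)
{
    assert(hz != 0);
    return (double)ticks / hz;
}
```

Fed a made-up line whose fourth word is 3b9aca00 (1,000,000,000 in hex), psched_hz_from_line returns 1e9 rather than 250. The raw rto value is not given in the thread, but if it was 55 ticks (an inference from the reported numbers), then 55 / 1e9 is 5.5e-08 and 55 / 250 is 0.22 s, which lines up with the rto:5.5e-08 and rto:220 outputs quoted above.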
* [PATCH] net: neighbour table ABI problem
2008-05-21 16:56 ` Patrick McHardy
@ 2008-05-21 17:40 ` Stephen Hemminger
2008-05-21 20:35 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-21 17:40 UTC (permalink / raw)
To: Patrick McHardy, Thomas Graf; +Cc: Stephane Chazelas, netdev
The neighbor table time of last use information is returned in the incorrect
unit. Kernel to user space ABI's need to use USER_HZ (or milliseconds), otherwise
the application has to try and discover the real system HZ value which is problematic.
Linux has standardized on keeping USER_HZ consistent (100hz) even when kernel is
running internally at some other value.

This change is small, but it breaks the ABI for older version of iproute2 utilities.
But these utilities are already broken since they are looking at the psched_hz values
which are completely different. So let's just go ahead and fix both kernel and user
space. Older utilities will just print wrong values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/neighbour.c 2008-05-21 10:23:05.000000000 -0700
+++ b/net/core/neighbour.c 2008-05-21 10:28:09.000000000 -0700
@@ -2057,9 +2057,9 @@ static int neigh_fill_info(struct sk_buf
                 goto nla_put_failure;
         }
 
-        ci.ndm_used = now - neigh->used;
-        ci.ndm_confirmed = now - neigh->confirmed;
-        ci.ndm_updated = now - neigh->updated;
+        ci.ndm_used = jiffies_to_clock_t(now - neigh->used);
+        ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed);
+        ci.ndm_updated = jiffies_to_clock_t(now - neigh->updated);
         ci.ndm_refcnt = atomic_read(&neigh->refcnt) - 1;
         read_unlock_bh(&neigh->lock);
 
^ permalink raw reply [flat|nested] 12+ messages in thread
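The unit change in the patch above can be pictured with a small userspace model. This is only a sketch of the idea of rescaling a jiffies delta to USER_HZ ticks. It is not the kernel's jiffies_to_clock_t() implementation, the function name jiffies_delta_to_user_ticks is hypothetical, and USER_HZ is assumed to be 100 as stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the commit message: USER_HZ is fixed at 100. */
#define USER_HZ_ASSUMED 100u

/*
 * Simplified model, not the kernel code: rescale a time delta counted in
 * kernel jiffies (kernel_hz ticks per second) to USER_HZ ticks.
 * The result is truncated toward zero.
 */
uint64_t jiffies_delta_to_user_ticks(uint64_t delta_jiffies,
                                     unsigned int kernel_hz)
{
    assert(kernel_hz != 0);
    return delta_jiffies * USER_HZ_ASSUMED / kernel_hz;
}
```

With this model, one second of idle time is reported as 100 whether the kernel runs at HZ=250 (250 jiffies) or HZ=1000 (1000 jiffies), so user space no longer needs to discover the kernel's HZ to interpret ndm_used, ndm_confirmed and ndm_updated.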
* Re: [PATCH] net: neighbour table ABI problem
2008-05-21 17:40 ` [PATCH] net: neighbour table ABI problem Stephen Hemminger
@ 2008-05-21 20:35 ` David Miller
2008-05-22 0:20 ` Thomas Graf
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-05-21 20:35 UTC (permalink / raw)
To: shemminger; +Cc: kaber, tgraf, Stephane_Chazelas, netdev
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 21 May 2008 10:40:19 -0700
> The neighbor table time of last use information is returned in the incorrect
> unit. Kernel to user space ABI's need to use USER_HZ (or milliseconds), otherwise
> the application has to try and discover the real system HZ value which is problematic.
> Linux has standardized on keeping USER_HZ consistent (100hz) even when kernel is
> running internally at some other value.
>
> This change is small, but it breaks the ABI for older version of iproute2 utilities.
> But these utilities are already broken since they are looking at the psched_hz values
> which are completely different. So let's just go ahead and fix both kernel and user
> space. Older utilities will just print wrong values.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

In at least one sense the kernel has been providing a consistent
value :-)

I don't know what to do here, it's different from the other patch
you posted today in that I can't see any easy way to not change
behavior for old stuff.

Can we add a new attribute or something like that?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] net: neighbour table ABI problem
2008-05-21 20:35 ` David Miller
@ 2008-05-22 0:20 ` Thomas Graf
2008-06-03 23:03 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Graf @ 2008-05-22 0:20 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, kaber, Stephane_Chazelas, netdev
* David Miller <davem@davemloft.net> 2008-05-21 13:35
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 21 May 2008 10:40:19 -0700
>
> > This change is small, but it breaks the ABI for older version of iproute2 utilities.
> > But these utilities are already broken since they are looking at the psched_hz values
> > which are completely different. So let's just go ahead and fix both kernel and user
> > space. Older utilities will just print wrong values.
> >
>
> Can we add a new attribute or something like that?

We could do but I agree with Stephen to just fix it the way he
proposes. The value we're talking about is only useful in a debugging
or statistical context. We're not changing the format of the
attribute at all, not even the unit, strictly speaking.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] net: neighbour table ABI problem
2008-05-22 0:20 ` Thomas Graf
@ 2008-06-03 23:03 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2008-06-03 23:03 UTC (permalink / raw)
To: tgraf; +Cc: shemminger, kaber, Stephane_Chazelas, netdev
From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 22 May 2008 02:20:42 +0200
> * David Miller <davem@davemloft.net> 2008-05-21 13:35
> > From: Stephen Hemminger <shemminger@vyatta.com>
> > Date: Wed, 21 May 2008 10:40:19 -0700
> >
> > > This change is small, but it breaks the ABI for older version of iproute2 utilities.
> > > But these utilities are already broken since they are looking at the psched_hz values
> > > which are completely different. So let's just go ahead and fix both kernel and user
> > > space. Older utilities will just print wrong values.
> > >
> >
> > Can we add a new attribute or something like that?
>
> We could do but I agree with Stephen to just fix it the way he
> proposes. The value we're talking about is only useful in a debugging
> or statistical context. We're not changing the format of the
> attribute at all, not even the unit, strictly speaking.

Fair enough, I've applied Stephen's patch.

Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS
2008-05-21 16:38 [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephane Chazelas
2008-05-21 16:56 ` Patrick McHardy
@ 2008-05-21 17:10 ` Stephen Hemminger
2008-05-21 18:43 ` route metrics in jiffies?? Stephen Hemminger
1 sibling, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-21 17:10 UTC (permalink / raw)
To: Stephane Chazelas; +Cc: netdev
On Wed, 21 May 2008 17:38:15 +0100
Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:
> Hi Stephen,
>
> there seems to be something wrong with the timer values as
> processed by at least the "ss" and "ip" commands when
> CONFIG_HIGH_RES_TIMERS is on.
>
> $ zgrep -e HZ= -e HIGH_RES /proc/config.gz
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_HZ=250
>
> For instance, the "ip route rtt <time>" and
> rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
> look incorrect:
>
> $ PROC_NET_TCP=/proc/net/tcp ss -i
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198 rto:5.5e-08
> [...]
>
> $ ss -i
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198
> bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
>
>
> $ ip route show
> 10.95.131.111 via 10.95.128.1 dev eth0 rtt 5ms
>
> $ netstat -rn
> Kernel IP routing table
> Destination Gateway Genmask Flags MSS Window irtt Iface
> 10.95.131.111 10.95.128.1 255.255.255.255 UGH 0 0 5000000 eth0
>
> I think this is because get_hz gets the HZ value from the fourth
> field in /proc/net/psched. That's 1e9 in my case, because that's
> meant to be the frequency of CLOCK_MONOTONIC and when
> CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
> 1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
>
> I don't know how to get the right value, clock_getres returns 1ns
> as well.
>
> That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
> head code looks the same).
>
> Best regards,
> Stéphane

Several places in iproute utils confuse psched and user hz values.
I am looking into it.
^ permalink raw reply [flat|nested] 12+ messages in thread
* route metrics in jiffies??
2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
@ 2008-05-21 18:43 ` Stephen Hemminger
2008-05-21 20:31 ` David Miller
2008-05-22 10:36 ` rtt metric only for incoming connections? Stephane Chazelas
0 siblings, 2 replies; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-21 18:43 UTC (permalink / raw)
To: David Miller; +Cc: Stephane Chazelas, netdev
On Wed, 21 May 2008 10:10:53 -0700
Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Wed, 21 May 2008 17:38:15 +0100
> Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:
>
> > Hi Stephen,
> >
> > there seems to be something wrong with the timer values as
> > processed by at least the "ss" and "ip" commands when
> > CONFIG_HIGH_RES_TIMERS is on.
> >
> > $ zgrep -e HZ= -e HIGH_RES /proc/config.gz
> > CONFIG_HIGH_RES_TIMERS=y
> > CONFIG_HZ=250
> >
> > For instance, the "ip route rtt <time>" and
> > rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
> > look incorrect:
> >
> > $ PROC_NET_TCP=/proc/net/tcp ss -i
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port
> > ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198 rto:5.5e-08
> > [...]
> >
> > $ ss -i
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port
> > ESTAB 0 0 127.0.0.1:35466 127.0.0.1:10198
> > bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
> >
> >
> > $ ip route show
> > 10.95.131.111 via 10.95.128.1 dev eth0 rtt 5ms
> >
> > $ netstat -rn
> > Kernel IP routing table
> > Destination Gateway Genmask Flags MSS Window irtt Iface
> > 10.95.131.111 10.95.128.1 255.255.255.255 UGH 0 0 5000000 eth0
> >
> > I think this is because get_hz gets the HZ value from the fourth
> > field in /proc/net/psched. That's 1e9 in my case, because that's
> > meant to be the frequency of CLOCK_MONOTONIC and when
> > CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
> > 1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
> >
> > I don't know how to get the right value, clock_getres returns 1ns
> > as well.
> >
> > That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
> > head code looks the same).
> >
> > Best regards,
> > Stéphane
>
> Several places in iproute utils confuse psched and user hz values.
> I am looking into it.

There is an even bigger mess up. The API for route metrics has several
values encoded in jiffies. This is a problem because there is no good
way to find the internal kernel value of HZ. So all kernel/user ABI's
are supposed to use an absolute value (like milliseconds) or clock_t
which uses USER_HZ.

The problem is that these values are now hardcoded into people's systems
so anyone using the 'ip route' options: rttvar, rtomin, or rtt are broken.
They might be lucky now (but I doubt it).

I propose doing the right thing and fixing kernel and iproute to always
use milliseconds for these values. To maintain compatibility, the new metric
values will be renumbered, so old kernels don't misinterpret the new values.
---------------------
--- a/include/linux/rtnetlink.h 2008-05-21 11:26:47.000000000 -0700
+++ b/include/linux/rtnetlink.h 2008-05-21 11:28:48.000000000 -0700
@@ -343,10 +343,8 @@ enum
 #define RTAX_MTU RTAX_MTU
         RTAX_WINDOW,
 #define RTAX_WINDOW RTAX_WINDOW
-        RTAX_RTT,
-#define RTAX_RTT RTAX_RTT
-        RTAX_RTTVAR,
-#define RTAX_RTTVAR RTAX_RTTVAR
+        RTAX_RTT_OLD,
+        RTAX_RTTVAR_OLD,
         RTAX_SSTHRESH,
 #define RTAX_SSTHRESH RTAX_SSTHRESH
         RTAX_CWND,
@@ -361,8 +359,14 @@ enum
 #define RTAX_INITCWND RTAX_INITCWND
         RTAX_FEATURES,
 #define RTAX_FEATURES RTAX_FEATURES
+        RTAX_RTO_MIN_OLD,
+        RTAX_RTO_MIN,
 #define RTAX_RTO_MIN RTAX_RTO_MIN
+        RTAX_RTT,
+#define RTAX_RTT RTAX_RTT
+        RTAX_RTTVAR,
+#define RTAX_RTTVAR RTAX_RTTVAR
         __RTAX_MAX
 };
--- a/net/ipv4/tcp_input.c 2008-05-21 11:31:23.000000000 -0700
+++ b/net/ipv4/tcp_input.c 2008-05-21 11:40:29.000000000 -0700
@@ -730,7 +730,7 @@ void tcp_update_metrics(struct sock *sk)
         if (dst && (dst->flags & DST_HOST)) {
                 const struct inet_connection_sock *icsk = inet_csk(sk);
-                int m;
+                long m;
                 if (icsk->icsk_backoff || !tp->srtt) {
                         /* This session failed to estimate rtt. Why?
@@ -742,7 +742,7 @@ void tcp_update_metrics(struct sock *sk)
                         return;
                 }
-                m = dst_metric(dst, RTAX_RTT) - tp->srtt;
+                m = msecs_to_jiffies(dst_metric(dst, RTAX_RTT)) - tp->srtt;
                 /* If newly calculated rtt larger than stored one,
                  * store new one. Otherwise, use EWMA. Remember,
@@ -750,9 +750,9 @@ void tcp_update_metrics(struct sock *sk)
                  */
                 if (!(dst_metric_locked(dst, RTAX_RTT))) {
                         if (m <= 0)
-                                dst->metrics[RTAX_RTT - 1] = tp->srtt;
+                                dst->metrics[RTAX_RTT - 1] = jiffies_to_msecs(tp->srtt);
                         else
-                                dst->metrics[RTAX_RTT - 1] -= (m >> 3);
+                                dst->metrics[RTAX_RTT - 1] -= jiffies_to_msecs(m >> 3);
                 }
                 if (!(dst_metric_locked(dst, RTAX_RTTVAR))) {
@@ -765,10 +765,11 @@ void tcp_update_metrics(struct sock *sk)
                                 m = tp->mdev;
                         if (m >= dst_metric(dst, RTAX_RTTVAR))
-                                dst->metrics[RTAX_RTTVAR - 1] = m;
+                                dst->metrics[RTAX_RTTVAR - 1] = jiffies_to_msecs(m);
                         else
                                 dst->metrics[RTAX_RTTVAR-1] -=
-                                        (dst_metric(dst, RTAX_RTTVAR) - m)>>2;
+                                        jiffies_to_msecs((dst_metric(dst, RTAX_RTTVAR)
+                                                          - m) >> 2);
                 }
                 if (tp->snd_ssthresh >= 0xFFFF) {
@@ -899,7 +900,7 @@ static void tcp_init_metrics(struct sock
         if (dst_metric(dst, RTAX_RTT) == 0)
                 goto reset;
-        if (!tp->srtt && dst_metric(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
+        if (!tp->srtt && dst_metric(dst, RTAX_RTT) < jiffies_to_msecs(TCP_TIMEOUT_INIT << 3))
                 goto reset;
         /* Initial rtt is determined from SYN,SYN-ACK.
@@ -916,12 +917,12 @@ static void tcp_init_metrics(struct sock
          * to low value, and then abruptly stops to do it and starts to delay
          * ACKs, wait for troubles.
          */
-        if (dst_metric(dst, RTAX_RTT) > tp->srtt) {
-                tp->srtt = dst_metric(dst, RTAX_RTT);
+        if (dst_metric(dst, RTAX_RTT) > jiffies_to_msecs(tp->srtt)) {
+                tp->srtt = msecs_to_jiffies(dst_metric(dst, RTAX_RTT));
                 tp->rtt_seq = tp->snd_nxt;
         }
-        if (dst_metric(dst, RTAX_RTTVAR) > tp->mdev) {
-                tp->mdev = dst_metric(dst, RTAX_RTTVAR);
+        if (dst_metric(dst, RTAX_RTTVAR) > jiffies_to_msecs(tp->mdev)) {
+                tp->mdev = msecs_to_jiffies(dst_metric(dst, RTAX_RTTVAR));
                 tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
         }
         tcp_set_rto(sk);
^ permalink raw reply [flat|nested] 12+ messages in thread
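The proposal above stores route metrics in milliseconds and converts to and from jiffies at the point of use. The following C sketch models that round trip in user space to show what the conversion does to values at a given HZ. The helper names ms_to_ticks_round_up and ticks_to_ms are hypothetical, rounding up in one direction is a choice made for this sketch, and the code is not the kernel's msecs_to_jiffies() or jiffies_to_msecs().

```c
#include <assert.h>

/*
 * Hypothetical model of a milliseconds -> ticks conversion that rounds up,
 * so that a non-zero duration never becomes zero ticks.
 * Very large inputs could overflow the multiplication; this sketch does
 * not guard against that.
 */
unsigned long ms_to_ticks_round_up(unsigned long ms, unsigned int hz)
{
    assert(hz != 0);
    return (ms * hz + 999) / 1000;
}

/* Hypothetical model of a ticks -> milliseconds conversion (truncating). */
unsigned long ticks_to_ms(unsigned long ticks, unsigned int hz)
{
    assert(hz != 0);
    return ticks * 1000 / hz;
}
```

At HZ=250 one tick is 4 ms, so in this model a stored metric of 5 ms becomes 2 ticks and converts back to 8 ms, while a value such as 200 ms (a multiple of 4 ms) survives the round trip unchanged. The point of the millisecond encoding is that the stored number means the same thing regardless of the kernel's HZ, at the cost of this tick granularity when the value is used.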
* Re: route metrics in jiffies??
2008-05-21 18:43 ` route metrics in jiffies?? Stephen Hemminger
@ 2008-05-21 20:31 ` David Miller
2008-05-22 10:36 ` rtt metric only for incoming connections? Stephane Chazelas
1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2008-05-21 20:31 UTC (permalink / raw)
To: shemminger; +Cc: Stephane_Chazelas, netdev
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 21 May 2008 11:43:54 -0700
> There is an even bigger mess up. The API for route metrics has several
> values encoded in jiffies. This is a problem because there is no good
> way to find the internal kernel value of HZ. So all kernel/user ABI's
> are supposed to use an absolute value (like milliseconds) or clock_t
> which uses USER_HZ.
>
> The problem is that these values are now hardcoded into people's systems
> so anyone using the 'ip route' options: rttvar, rtomin, or rtt are broken.
> They might be lucky now (but I doubt it).
>
> I propose doing the right thing and fixing kernel and iproute to always
> use milliseconds for these values. To maintain compatibility, the new metric
> values will be renumbered, so old kernels don't misinterpret the new values.

That is one way to solve the problem.

But we could be adding surprises on a source level for people
with this approach.

Just use new names and leave the old ones alone, with a _MS or
similar postfix to them. This is how we've handled this kind
of situation in the past.
^ permalink raw reply [flat|nested] 12+ messages in thread
* rtt metric only for incoming connections?
2008-05-21 18:43 ` route metrics in jiffies?? Stephen Hemminger
2008-05-21 20:31 ` David Miller
@ 2008-05-22 10:36 ` Stephane Chazelas
2008-05-27 18:43 ` Stephen Hemminger
1 sibling, 1 reply; 12+ messages in thread
From: Stephane Chazelas @ 2008-05-22 10:36 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
On Wed, May 21, 2008 at 11:43:54AM -0700, Stephen Hemminger wrote:
[...]
> The problem is that these values are now hardcoded into people's systems
> so anyone using the 'ip route' options: rttvar, rtomin, or rtt are broken.
> They might be lucky now (but I doubt it).
[...]

Hi Stephen, all

a slightly related question:

it seems that the "rtt" parameter provided in "ip route ... rtt
<value>" is not taken into account for the retransmission of
SYNs while it is for the retransmissions of SYN+ACKs, why would
that be (2.6.24.2)?

Also, it seems we can't lower the initial RTO below the RFC 1122
default of 3 seconds. 3 seconds may be appropriate for a host
for which we don't know how many hops, links, satellites are
needed to reach it, but what about local/corporate networks
where it's possible to administratively know the rtt so that it
can be hardcoded in the routing table.

For instance, on the office wireless network, I know the average
rtt is below the ms. Some SYNs may be lost, but they can't be
delayed more than a few hundred ms. So, I may want to specify in
the route to that network, the initial and maximum rto, so that
a down host can be detected in less than a second.

The delay before the first retransmission is 3 seconds at the
moment. That value is often more than what some applications are
ready to wait for (applications that are meant to be run locally
for instance). So, it's a shame, because the application will
timeout on the connect even before the first retransmission, so
the SYN retransmission mechanism is useless in that case.

Or is it because there's a risk of congesting the internet if
people misuse that? Note that applications can always reattempt
a connect to work around that (for SYNs to be sent more often).

It would be nice if what the "rtt" exactly is could be
clarified. For instance, if I understand correctly, by default,
the initial rtt is 0 and the rttvar 3s, which results in a rto
of 3s. That "rtt" is the smoothed rtt, right? (I think the
"route" man page from net-tools is incorrect about that, BTW.),
but then when setting those variables per route, it's the RTT
that can't be made lower than 3s, while rttvar can be as low as
rto_min (200ms by default). It's all very confusing (well, I'm
very confused ;).

regards,
Stéphane
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: rtt metric only for incoming connections?
2008-05-22 10:36 ` rtt metric only for incoming connections? Stephane Chazelas
@ 2008-05-27 18:43 ` Stephen Hemminger
2008-05-27 18:53 ` Rick Jones
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-27 18:43 UTC (permalink / raw)
To: Stephane Chazelas; +Cc: Stephen Hemminger, David Miller, netdev
On Thu, 22 May 2008 11:36:10 +0100
Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:
> On Wed, May 21, 2008 at 11:43:54AM -0700, Stephen Hemminger wrote:
> [...]
> > The problem is that these values are now hardcoded into people's systems
> > so anyone using the 'ip route' options: rttvar, rtomin, or rtt are broken.
> > They might be lucky now (but I doubt it).
> [...]
>
> Hi Stephen, all
>
> a slightly related question:
>
> it seems that the "rtt" parameter provided in "ip route ... rtt
> <value>" is not taken into account for the retransmission of
> SYNs while it is for the retransmissions of SYN+ACKs, why would
> that be (2.6.24.2)?
>
> Also, it seems we can't lower the initial RTO below the RFC 1122
> default of 3 seconds. 3 seconds may be appropriate for a host
> for which we don't know how many hops, links, satellites are
> needed to reach it, but what about local/corporate networks
> where it's possible to administratively know the rtt so that it
> can be hardcoded in the routing table.

Violating RFC's is not really that useful.
If you have a network dropping SYN packets regularly then there are
worse problems.

> For instance, on the office wireless network, I know the average
> rtt is below the ms. Some SYNs may be lost, but they can't be
> delayed more than a few hundred ms. So, I may want to specify in
> the route to that network, the initial and maximum rto, so that
> a down host can be detected in less than a second.
>
> The delay before the first retransmission is 3 seconds at the
> moment. That value is often more than what some applications are
> ready to wait for (applications that are meant to be run locally
> for instance). So, it's a shame, because the application will
> timeout on the connect even before the first retransmission, so
> the SYN retransmission mechanism is useless in that case.

Relying on TCP to overcome wireless network problems is not
a good idea.

> Or is it because there's a risk of congesting the internet if
> people misuse that? Note that applications can always reattempt
> a connect to work around that (for SYNs to be sent more often).

The problem is that distributions can't even get the settings right
now. It would be too easy for some distribution to ship with a default
small value.

> It would be nice if what the "rtt" exactly is could be
> clarified. For instance, if I understand correctly, by default,
> the initial rtt is 0 and the rttvar 3s, which results in a rto
> of 3s. That "rtt" is the smoothed rtt, right? (I think the
> "route" man page from net-tools is incorrect about that, BTW.),
> but then when setting those variables per route, it's the RTT
> that can't be made lower than 3s, while rttvar can be as low as
> rto_min (200ms by default). It's all very confusing (well, I'm
> very confused ;).
>
> regards,
> Stéphane

RTT is used as a starting point for the smoothed round trip time.
As soon as the first ack comes back the value starts to change.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: rtt metric only for incoming connections?
2008-05-27 18:43 ` Stephen Hemminger
@ 2008-05-27 18:53 ` Rick Jones
0 siblings, 0 replies; 12+ messages in thread
From: Rick Jones @ 2008-05-27 18:53 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Stephane Chazelas, Stephen Hemminger, David Miller, netdev
>>Also, it seems we can't lower the initial RTO below the RFC 1122
>>default of 3 seconds. 3 seconds may be appropriate for a host
>>for which we don't know how many hops, links, satellites are
>>needed to reach it, but what about local/corporate networks
>>where it's possible to administratively know the rtt so that it
>>can be hardcoded in the routing table.
>
>
> Violating RFC's is not really that useful.

Yet the RFC's are not stone tablets, and they often represent a
"compromise" between things desirable for the great big internet and
those someone with a bounded network might have.

> If you have a network dropping SYN packets regularly then there are
> worse problems.

That is entirely plausible.

> Relying on TCP to overcome wireless network problems is not
> a good idea.

How is it any worse than relying on TCP to overcome network congestion
problems?-)

rick jones
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-06-03 23:03 UTC | newest]
Thread overview: 12+ messages
2008-05-21 16:38 [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephane Chazelas
2008-05-21 16:56 ` Patrick McHardy
2008-05-21 17:40 ` [PATCH] net: neighbour table ABI problem Stephen Hemminger
2008-05-21 20:35 ` David Miller
2008-05-22 0:20 ` Thomas Graf
2008-06-03 23:03 ` David Miller
2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
2008-05-21 18:43 ` route metrics in jiffies?? Stephen Hemminger
2008-05-21 20:31 ` David Miller
2008-05-22 10:36 ` rtt metric only for incoming connections? Stephane Chazelas
2008-05-27 18:43 ` Stephen Hemminger
2008-05-27 18:53 ` Rick Jones