netdev.vger.kernel.org archive mirror
* [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS
@ 2008-05-21 16:38 Stephane Chazelas
  2008-05-21 16:56 ` Patrick McHardy
  2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
  0 siblings, 2 replies; 12+ messages in thread
From: Stephane Chazelas @ 2008-05-21 16:38 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

Hi Stephen,

there seems to be something wrong with the timer values as
processed by at least the "ss" and "ip" commands when
CONFIG_HIGH_RES_TIMERS is on.

$ zgrep -e HZ= -e HIGH_RES /proc/config.gz
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HZ=250

For instance, the "ip route rtt <time>" and 
rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
look incorrect:

$ PROC_NET_TCP=/proc/net/tcp ss -i
State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198    rto:5.5e-08
[...]

$ ss -i
State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198
        bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792


$ ip route show
10.95.131.111 via 10.95.128.1 dev eth0  rtt 5ms

$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.95.131.111   10.95.128.1     255.255.255.255 UGH       0 0     5000000 eth0

I think this is because get_hz gets the HZ value from the fourth
field in /proc/net/psched. That's 1e9 in my case, because that's
meant to be the frequency of CLOCK_MONOTONIC and when
CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
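
The arithmetic can be reproduced with a small sketch. It assumes (the
report does not confirm this) that ss divides a raw jiffies count by
whatever get_hz() returned. The sample psched line and both helper names
are hypothetical, chosen only so that the fourth field decodes to 1e9.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the fourth hex field of a psched-style line,
 * or 0 if the line does not contain four hex fields. */
static unsigned int psched_fourth_field(const char *line)
{
	unsigned int a, b, c, d;

	if (sscanf(line, "%x %x %x %x", &a, &b, &c, &d) != 4)
		return 0;
	return d;
}

/* Convert a tick count to seconds for a given tick rate. */
static double ticks_to_seconds(unsigned long ticks, unsigned int hz)
{
	assert(hz > 0);
	return (double)ticks / hz;
}
```

With CONFIG_HZ=250, the 220 ms rto reported over netlink corresponds to
55 jiffies. ticks_to_seconds(55, 250) gives 0.22 s, while dividing the
same 55 by the 1000000000 decoded from an illustrative line such as
"000003e8 00000040 000f4240 3b9aca00" gives about 5.5e-08, which matches
the bogus value printed above.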

I don't know how to get the right value; clock_getres returns 1ns
as well.

That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
head code looks the same).

Best regards,
Stéphane


* Re: [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS
  2008-05-21 16:38 [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephane Chazelas
@ 2008-05-21 16:56 ` Patrick McHardy
  2008-05-21 17:40   ` [PATCH] net: neighbour table ABI problem Stephen Hemminger
  2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
  1 sibling, 1 reply; 12+ messages in thread
From: Patrick McHardy @ 2008-05-21 16:56 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: shemminger, netdev

Stephane Chazelas wrote:
> Hi Stephen,
> 
> there seems to be something wrong with the timer values as
> processed by at least the "ss" and "ip" commands when
> CONFIG_HIGH_RES_TIMERS is on.
> 
> $ zgrep -e HZ= -e HIGH_RES /proc/config.gz
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_HZ=250
> 
> For instance, the "ip route rtt <time>" and 
> rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
> look incorrect:
> 
> $ PROC_NET_TCP=/proc/net/tcp ss -i
> State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
> ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198    rto:5.5e-08
> [...]
> 
> $ ss -i
> State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
> ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198
>         bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
> 
> 
> $ ip route show
> 10.95.131.111 via 10.95.128.1 dev eth0  rtt 5ms
> 
> $ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 10.95.131.111   10.95.128.1     255.255.255.255 UGH       0 0     5000000 eth0
> 
> I think this is because get_hz gets the HZ value from the fourth
> field in /proc/net/psched. That's 1e9 in my case, because that's
> meant to be the frequency of CLOCK_MONOTONIC and when
> CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
> 1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
> 
> I don't know how to get the right value; clock_getres returns 1ns
> as well.
> 
> That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
> head code looks the same).


Neither ss nor ip route should be using get_hz(). inet_diag
uses fixed units anyway; the routing metrics should too.

/proc/net/psched is *only* for packet schedulers and they
want to know the real clock resolution.


* Re: [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS
  2008-05-21 16:38 [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephane Chazelas
  2008-05-21 16:56 ` Patrick McHardy
@ 2008-05-21 17:10 ` Stephen Hemminger
  2008-05-21 18:43   ` route metrics in jiffies?? Stephen Hemminger
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-21 17:10 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: netdev

On Wed, 21 May 2008 17:38:15 +0100
Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:

> Hi Stephen,
> 
> there seems to be something wrong with the timer values as
> processed by at least the "ss" and "ip" commands when
> CONFIG_HIGH_RES_TIMERS is on.
> 
> $ zgrep -e HZ= -e HIGH_RES /proc/config.gz
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_HZ=250
> 
> For instance, the "ip route rtt <time>" and 
> rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
> look incorrect:
> 
> $ PROC_NET_TCP=/proc/net/tcp ss -i
> State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
> ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198    rto:5.5e-08
> [...]
> 
> $ ss -i
> State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
> ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198
>         bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
> 
> 
> $ ip route show
> 10.95.131.111 via 10.95.128.1 dev eth0  rtt 5ms
> 
> $ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 10.95.131.111   10.95.128.1     255.255.255.255 UGH       0 0     5000000 eth0
> 
> I think this is because get_hz gets the HZ value from the fourth
> field in /proc/net/psched. That's 1e9 in my case, because that's
> meant to be the frequency of CLOCK_MONOTONIC and when
> CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
> 1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
> 
> I don't know how to get the right value; clock_getres returns 1ns
> as well.
> 
> That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
> head code looks the same).
> 
> Best regards,
> Stéphane

Several places in iproute utils confuse psched and user hz values.
I am looking into it.



* [PATCH] net: neighbour table ABI problem
  2008-05-21 16:56 ` Patrick McHardy
@ 2008-05-21 17:40   ` Stephen Hemminger
  2008-05-21 20:35     ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-21 17:40 UTC (permalink / raw)
  To: Patrick McHardy, Thomas Graf; +Cc: Stephane Chazelas, netdev

The neighbor table time-of-last-use information is returned in the incorrect
unit. Kernel to user space ABIs need to use USER_HZ (or milliseconds); otherwise
the application has to try to discover the real system HZ value, which is problematic.
Linux has standardized on keeping USER_HZ consistent (100 Hz) even when the kernel is
running internally at some other value.

This change is small, but it breaks the ABI for older versions of the iproute2 utilities.
But these utilities are already broken, since they are looking at the psched_hz values,
which are completely different. So let's just go ahead and fix both kernel and user
space. Older utilities will just print wrong values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
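
As a rough illustration of the unit change, here is a user-space sketch
of the rescaling. It is a simplified stand-in, not the kernel's
jiffies_to_clock_t(), which is selected at compile time and rounds
differently; the sketch_ names and the SKETCH_USER_HZ constant are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_USER_HZ 100U /* the USER_HZ value named in the description */

/* Simplified stand-in: rescale a jiffies delta from the kernel tick
 * rate (hz) to USER_HZ ticks. */
static uint64_t sketch_jiffies_to_clock_t(uint64_t jiffies, unsigned int hz)
{
	assert(hz > 0);
	return jiffies * SKETCH_USER_HZ / hz;
}

/* User-space side: a clock_t-style count becomes milliseconds without
 * any knowledge of the kernel's internal HZ. */
static uint64_t sketch_clock_t_to_msecs(uint64_t ticks)
{
	return ticks * 1000U / SKETCH_USER_HZ;
}
```

One second of idle time is 250 jiffies at HZ=250 and 1000 jiffies at
HZ=1000, but both rescale to 100 ticks, so the reader only needs USER_HZ.
A real tool would normally obtain that divisor from
sysconf(_SC_CLK_TCK) rather than from a constant.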

--- a/net/core/neighbour.c	2008-05-21 10:23:05.000000000 -0700
+++ b/net/core/neighbour.c	2008-05-21 10:28:09.000000000 -0700
@@ -2057,9 +2057,9 @@ static int neigh_fill_info(struct sk_buf
 		goto nla_put_failure;
 	}
 
-	ci.ndm_used	 = now - neigh->used;
-	ci.ndm_confirmed = now - neigh->confirmed;
-	ci.ndm_updated	 = now - neigh->updated;
+	ci.ndm_used	 = jiffies_to_clock_t(now - neigh->used);
+	ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed);
+	ci.ndm_updated	 = jiffies_to_clock_t(now - neigh->updated);
 	ci.ndm_refcnt	 = atomic_read(&neigh->refcnt) - 1;
 	read_unlock_bh(&neigh->lock);
 


* route metrics in jiffies??
  2008-05-21 17:10 ` [iproute2] get_hz() with CONFIG_HIGH_RES_TIMERS Stephen Hemminger
@ 2008-05-21 18:43   ` Stephen Hemminger
  2008-05-21 20:31     ` David Miller
  2008-05-22 10:36     ` rtt metric only for incoming connections? Stephane Chazelas
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-21 18:43 UTC (permalink / raw)
  To: David Miller; +Cc: Stephane Chazelas, netdev

On Wed, 21 May 2008 10:10:53 -0700
Stephen Hemminger <shemminger@vyatta.com> wrote:

> On Wed, 21 May 2008 17:38:15 +0100
> Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:
> 
> > Hi Stephen,
> > 
> > there seems to be something wrong with the timer values as
> > processed by at least the "ss" and "ip" commands when
> > CONFIG_HIGH_RES_TIMERS is on.
> > 
> > $ zgrep -e HZ= -e HIGH_RES /proc/config.gz
> > CONFIG_HIGH_RES_TIMERS=y
> > CONFIG_HZ=250
> > 
> > For instance, the "ip route rtt <time>" and 
> > rto values returned by PROC_NET_TCP=/proc/net/tcp ss -i
> > look incorrect:
> > 
> > $ PROC_NET_TCP=/proc/net/tcp ss -i
> > State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
> > ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198    rto:5.5e-08
> > [...]
> > 
> > $ ss -i
> > State       Recv-Q Send-Q  Local Address:Port      Peer Address:Port
> > ESTAB       0      0           127.0.0.1:35466        127.0.0.1:10198
> >         bic wscale:7,7 rto:220 rtt:23.5/19 send 11.2Mbps rcv_space:32792
> > 
> > 
> > $ ip route show
> > 10.95.131.111 via 10.95.128.1 dev eth0  rtt 5ms
> > 
> > $ netstat -rn
> > Kernel IP routing table
> > Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> > 10.95.131.111   10.95.128.1     255.255.255.255 UGH       0 0     5000000 eth0
> > 
> > I think this is because get_hz gets the HZ value from the fourth
> > field in /proc/net/psched. That's 1e9 in my case, because that's
> > meant to be the frequency of CLOCK_MONOTONIC and when
> > CONFIG_HIGH_RES_TIMERS is on, the CLOCK_MONOTONIC resolution is
> > 1ns. So get_hz returns 1,000,000,000 instead of 250 in my case.
> > 
> > I don't know how to get the right value; clock_getres returns 1ns
> > as well.
> > 
> > That's with 2.6.24.2, iproute-20080108 from debian (but the GIT
> > head code looks the same).
> > 
> > Best regards,
> > Stéphane
> 
> Several places in iproute utils confuse psched and user hz values.
> I am looking into it.


There is an even bigger mess-up. The API for route metrics has several
values encoded in jiffies. This is a problem because there is no good
way to find the internal kernel value of HZ. So all kernel/user ABIs
are supposed to use an absolute unit (like milliseconds) or clock_t,
which uses USER_HZ.

The problem is that these values are now hardcoded into people's systems,
so anyone using the 'ip route' options rttvar, rtomin, or rtt is broken.
They might be lucky now (but I doubt it).

I propose doing the right thing and fixing kernel and iproute to always
use milliseconds for these values. To maintain compatibility, the new metric
values will be renumbered so that old kernels don't misinterpret the new values.

---------------------
--- a/include/linux/rtnetlink.h	2008-05-21 11:26:47.000000000 -0700
+++ b/include/linux/rtnetlink.h	2008-05-21 11:28:48.000000000 -0700
@@ -343,10 +343,8 @@ enum
 #define RTAX_MTU RTAX_MTU
 	RTAX_WINDOW,
 #define RTAX_WINDOW RTAX_WINDOW
-	RTAX_RTT,
-#define RTAX_RTT RTAX_RTT
-	RTAX_RTTVAR,
-#define RTAX_RTTVAR RTAX_RTTVAR
+	RTAX_RTT_OLD,
+	RTAX_RTTVAR_OLD,
 	RTAX_SSTHRESH,
 #define RTAX_SSTHRESH RTAX_SSTHRESH
 	RTAX_CWND,
@@ -361,8 +359,14 @@ enum
 #define RTAX_INITCWND RTAX_INITCWND
 	RTAX_FEATURES,
 #define RTAX_FEATURES RTAX_FEATURES
+	RTAX_RTO_MIN_OLD,
+
 	RTAX_RTO_MIN,
 #define RTAX_RTO_MIN RTAX_RTO_MIN
+	RTAX_RTT,
+#define RTAX_RTT RTAX_RTT
+	RTAX_RTTVAR,
+#define RTAX_RTTVAR RTAX_RTTVAR
 	__RTAX_MAX
 };
 
--- a/net/ipv4/tcp_input.c	2008-05-21 11:31:23.000000000 -0700
+++ b/net/ipv4/tcp_input.c	2008-05-21 11:40:29.000000000 -0700
@@ -730,7 +730,7 @@ void tcp_update_metrics(struct sock *sk)
 
 	if (dst && (dst->flags & DST_HOST)) {
 		const struct inet_connection_sock *icsk = inet_csk(sk);
-		int m;
+		long m;
 
 		if (icsk->icsk_backoff || !tp->srtt) {
 			/* This session failed to estimate rtt. Why?
@@ -742,7 +742,7 @@ void tcp_update_metrics(struct sock *sk)
 			return;
 		}
 
-		m = dst_metric(dst, RTAX_RTT) - tp->srtt;
+		m = msecs_to_jiffies(dst_metric(dst, RTAX_RTT)) - tp->srtt;
 
 		/* If newly calculated rtt larger than stored one,
 		 * store new one. Otherwise, use EWMA. Remember,
@@ -750,9 +750,9 @@ void tcp_update_metrics(struct sock *sk)
 		 */
 		if (!(dst_metric_locked(dst, RTAX_RTT))) {
 			if (m <= 0)
-				dst->metrics[RTAX_RTT - 1] = tp->srtt;
+				dst->metrics[RTAX_RTT - 1] = jiffies_to_msecs(tp->srtt);
 			else
-				dst->metrics[RTAX_RTT - 1] -= (m >> 3);
+				dst->metrics[RTAX_RTT - 1] -= jiffies_to_msecs(m >> 3);
 		}
 
 		if (!(dst_metric_locked(dst, RTAX_RTTVAR))) {
@@ -765,10 +765,11 @@ void tcp_update_metrics(struct sock *sk)
 				m = tp->mdev;
 
 			if (m >= dst_metric(dst, RTAX_RTTVAR))
-				dst->metrics[RTAX_RTTVAR - 1] = m;
+				dst->metrics[RTAX_RTTVAR - 1] = jiffies_to_msecs(m);
 			else
 				dst->metrics[RTAX_RTTVAR-1] -=
-					(dst_metric(dst, RTAX_RTTVAR) - m)>>2;
+					jiffies_to_msecs((dst_metric(dst, RTAX_RTTVAR)
+							  - m) >> 2);
 		}
 
 		if (tp->snd_ssthresh >= 0xFFFF) {
@@ -899,7 +900,7 @@ static void tcp_init_metrics(struct sock
 	if (dst_metric(dst, RTAX_RTT) == 0)
 		goto reset;
 
-	if (!tp->srtt && dst_metric(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
+	if (!tp->srtt && dst_metric(dst, RTAX_RTT) < jiffies_to_msecs(TCP_TIMEOUT_INIT << 3))
 		goto reset;
 
 	/* Initial rtt is determined from SYN,SYN-ACK.
@@ -916,12 +917,12 @@ static void tcp_init_metrics(struct sock
 	 * to low value, and then abruptly stops to do it and starts to delay
 	 * ACKs, wait for troubles.
 	 */
-	if (dst_metric(dst, RTAX_RTT) > tp->srtt) {
-		tp->srtt = dst_metric(dst, RTAX_RTT);
+	if (dst_metric(dst, RTAX_RTT) > jiffies_to_msecs(tp->srtt)) {
+		tp->srtt = msecs_to_jiffies(dst_metric(dst, RTAX_RTT));
 		tp->rtt_seq = tp->snd_nxt;
 	}
-	if (dst_metric(dst, RTAX_RTTVAR) > tp->mdev) {
-		tp->mdev = dst_metric(dst, RTAX_RTTVAR);
+	if (dst_metric(dst, RTAX_RTTVAR) > jiffies_to_msecs(tp->mdev)) {
+		tp->mdev = msecs_to_jiffies(dst_metric(dst, RTAX_RTTVAR));
 		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
 	}
 	tcp_set_rto(sk);
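
The conversions used in the patch above can be sketched in user space as
follows. These are simplified stand-ins for jiffies_to_msecs() and
msecs_to_jiffies() that assume HZ divides 1000 evenly; the sketch_ names
are hypothetical and the kernel helpers cover more cases.

```c
#include <assert.h>

/* Simplified stand-in for jiffies_to_msecs(), valid when hz divides 1000. */
static unsigned int sketch_jiffies_to_msecs(unsigned long j, unsigned int hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return (unsigned int)(j * (1000 / hz));
}

/* Simplified stand-in for msecs_to_jiffies(), valid when hz divides 1000. */
static unsigned long sketch_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_jiffy;

	assert(hz > 0 && 1000 % hz == 0);
	ms_per_jiffy = 1000 / hz;
	/* Round up so a non-zero time never collapses to zero jiffies. */
	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}
```

A metric stored as 220 ms maps to 55 jiffies at HZ=250 and 220 jiffies
at HZ=1000, so the stored number means the same thing on both kernels.
The same value stored as raw jiffies would not.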





* Re: route metrics in jiffies??
  2008-05-21 18:43   ` route metrics in jiffies?? Stephen Hemminger
@ 2008-05-21 20:31     ` David Miller
  2008-05-22 10:36     ` rtt metric only for incoming connections? Stephane Chazelas
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2008-05-21 20:31 UTC (permalink / raw)
  To: shemminger; +Cc: Stephane_Chazelas, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 21 May 2008 11:43:54 -0700

> There is an even bigger mess-up. The API for route metrics has several
> values encoded in jiffies. This is a problem because there is no good
> way to find the internal kernel value of HZ. So all kernel/user ABIs
> are supposed to use an absolute unit (like milliseconds) or clock_t,
> which uses USER_HZ.
> 
> The problem is that these values are now hardcoded into people's systems,
> so anyone using the 'ip route' options rttvar, rtomin, or rtt is broken.
> They might be lucky now (but I doubt it).
> 
> I propose doing the right thing and fixing kernel and iproute to always
> use milliseconds for these values. To maintain compatibility, the new metric
> values will be renumbered so that old kernels don't misinterpret the new values.

That is one way to solve the problem.  But we could be adding
surprises on a source level for people with this approach.

Just add new names with a _MS or similar suffix, and leave the old
ones alone.

This is how we've handled this kind of situation in the past.
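
A minimal sketch of that approach, using a shortened and entirely
hypothetical attribute list (these are not the real rtnetlink.h enums,
and the _MS names are only examples of the suggested suffix):

```c
#include <assert.h>

/* Hypothetical "before" layout: time values are in jiffies. */
enum sketch_rtax_old {
	OLD_RTAX_UNSPEC,
	OLD_RTAX_RTT,
	OLD_RTAX_RTTVAR,
	OLD_RTAX_RTO_MIN,
	__OLD_RTAX_MAX
};

/* Hypothetical "after" layout: old entries keep their numbers and
 * meaning, millisecond variants are appended at the end. */
enum sketch_rtax_new {
	NEW_RTAX_UNSPEC,
	NEW_RTAX_RTT,
	NEW_RTAX_RTTVAR,
	NEW_RTAX_RTO_MIN,
	NEW_RTAX_RTT_MS,
	NEW_RTAX_RTTVAR_MS,
	__NEW_RTAX_MAX
};

/* Returns 1 if every pre-existing attribute kept its number. */
static int sketch_old_numbers_preserved(void)
{
	assert((int)__NEW_RTAX_MAX > (int)__OLD_RTAX_MAX);
	return (int)NEW_RTAX_RTT == (int)OLD_RTAX_RTT &&
	       (int)NEW_RTAX_RTTVAR == (int)OLD_RTAX_RTTVAR &&
	       (int)NEW_RTAX_RTO_MIN == (int)OLD_RTAX_RTO_MIN;
}
```

Because nothing is renamed or renumbered, existing source keeps
compiling with the same meaning, and only code that opts into the new
names sees millisecond values.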


* Re: [PATCH] net: neighbour table ABI problem
  2008-05-21 17:40   ` [PATCH] net: neighbour table ABI problem Stephen Hemminger
@ 2008-05-21 20:35     ` David Miller
  2008-05-22  0:20       ` Thomas Graf
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-05-21 20:35 UTC (permalink / raw)
  To: shemminger; +Cc: kaber, tgraf, Stephane_Chazelas, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 21 May 2008 10:40:19 -0700

> The neighbor table time-of-last-use information is returned in the incorrect
> unit. Kernel to user space ABIs need to use USER_HZ (or milliseconds); otherwise
> the application has to try to discover the real system HZ value, which is problematic.
> Linux has standardized on keeping USER_HZ consistent (100 Hz) even when the kernel is
> running internally at some other value.
> 
> This change is small, but it breaks the ABI for older versions of the iproute2 utilities.
> But these utilities are already broken, since they are looking at the psched_hz values,
> which are completely different. So let's just go ahead and fix both kernel and user
> space. Older utilities will just print wrong values.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

In at least one sense the kernel has been providing a consistent
value :-)

I don't know what to do here, it's different from the other patch
you posted today in that I can't see any easy way to not change
behavior for old stuff.

Can we add a new attribute or something like that?


* Re: [PATCH] net: neighbour table ABI problem
  2008-05-21 20:35     ` David Miller
@ 2008-05-22  0:20       ` Thomas Graf
  2008-06-03 23:03         ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Graf @ 2008-05-22  0:20 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, kaber, Stephane_Chazelas, netdev

* David Miller <davem@davemloft.net> 2008-05-21 13:35
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 21 May 2008 10:40:19 -0700
> 
> > This change is small, but it breaks the ABI for older versions of the iproute2 utilities.
> > But these utilities are already broken, since they are looking at the psched_hz values,
> > which are completely different. So let's just go ahead and fix both kernel and user
> > space. Older utilities will just print wrong values.
> > 
> 
> Can we add a new attribute or something like that?

We could, but I agree with Stephen that we should just fix it the way he
proposes. The value we're talking about is only useful in a debugging
or statistical context. We're not changing the format of the
attribute at all, not even the unit, strictly speaking.


* rtt metric only for incoming connections?
  2008-05-21 18:43   ` route metrics in jiffies?? Stephen Hemminger
  2008-05-21 20:31     ` David Miller
@ 2008-05-22 10:36     ` Stephane Chazelas
  2008-05-27 18:43       ` Stephen Hemminger
  1 sibling, 1 reply; 12+ messages in thread
From: Stephane Chazelas @ 2008-05-22 10:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Wed, May 21, 2008 at 11:43:54AM -0700, Stephen Hemminger wrote:
[...]
> The problem is that these values are now hardcoded into people's systems,
> so anyone using the 'ip route' options rttvar, rtomin, or rtt is broken.
> They might be lucky now (but I doubt it).
[...]

Hi Stephen, all

a slightly related question:

it seems that the "rtt" parameter provided in "ip route ... rtt
<value>" is not taken into account for the retransmission of
SYNs while it is for the retransmissions of SYN+ACKs, why would
that be (2.6.24.2)?

Also, it seems we can't lower the initial RTO below the RFC 1122
default of 3 seconds. 3 seconds may be appropriate for a host
for which we don't know how many hops, links, satellites are
needed to reach it, but what about local/corporate networks
where it's possible to administratively know the rtt so that it
can be hardcoded in the routing table.

For instance, on the office wireless network, I know the average
rtt is below the ms. Some SYNs may be lost, but they can't be
delayed more than a few hundred ms. So, I may want to specify in
the route to that network, the initial and maximum rto, so that
a down host can be detected in less than a second.

The delay before the first retransmission is 3 seconds at the
moment. That value is often more than what some applications are
ready to wait for (applications that are meant to be run locally
for instance). So, it's a shame, because the application will
timeout on the connect even before the first retransmission, so
the SYN retransmission mechanism is useless in that case.

Or is it because there's a risk of congesting the internet if
people misuse that? Note that applications can always reattempt
a connect to work around that (for SYNs to be sent more often).

It would be nice if what the "rtt" exactly is could be
clarified. For instance, if I understand correctly, by default,
the initial rtt is 0 and the rttvar 3s, which results in a rto
of 3s. That "rtt" is the smoothed rtt, right? (I think the
"route" man page from net-tools is incorrect about that, BTW.),
but then when setting those variables per route, it's the RTT
that can't be made lower than 3s, while rttvar can be as low as
rto_min (200ms by default). It's all very confusing (well, I'm
very confused ;).

regards,
Stéphane


* Re: rtt metric only for incoming connections?
  2008-05-22 10:36     ` rtt metric only for incoming connections? Stephane Chazelas
@ 2008-05-27 18:43       ` Stephen Hemminger
  2008-05-27 18:53         ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-27 18:43 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: Stephen Hemminger, David Miller, netdev

On Thu, 22 May 2008 11:36:10 +0100
Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:

> On Wed, May 21, 2008 at 11:43:54AM -0700, Stephen Hemminger wrote:
> [...]
> > The problem is that these values are now hardcoded into people's systems,
> > so anyone using the 'ip route' options rttvar, rtomin, or rtt is broken.
> > They might be lucky now (but I doubt it).
> [...]
> 
> Hi Stephen, all
> 
> a slightly related question:
> 
> it seems that the "rtt" parameter provided in "ip route ... rtt
> <value>" is not taken into account for the retransmission of
> SYNs while it is for the retransmissions of SYN+ACKs, why would
> that be (2.6.24.2)?
> 
> Also, it seems we can't lower the initial RTO below the RFC 1122
> default of 3 seconds. 3 seconds may be appropriate for a host
> for which we don't know how many hops, links, satellites are
> needed to reach it, but what about local/corporate networks
> where it's possible to administratively know the rtt so that it
> can be hardcoded in the routing table.

Violating RFCs is not really that useful. If you have a network
dropping SYN packets regularly, then there are worse problems.

> For instance, on the office wireless network, I know the average
> rtt is below the ms. Some SYNs may be lost, but they can't be
> delayed more than a few hundred ms. So, I may want to specify in
> the route to that network, the initial and maximum rto, so that
> a down host can be detected in less than a second.
> 
> The delay before the first retransmission is 3 seconds at the
> moment. That value is often more than what some applications are
> ready to wait for (applications that are meant to be run locally
> for instance). So, it's a shame, because the application will
> timeout on the connect even before the first retransmission, so
> the SYN retransmission mechanism is useless in that case.

Relying on TCP to overcome wireless network problems is not
a good idea. 

> Or is it because there's a risk of congesting the internet if
> people misuse that? Note that applications can always reattempt
> a connect to work around that (for SYNs to be sent more often).

The problem is that distributions can't even get the settings right now.
It would be too easy for some distribution to ship with a default small
value.

> It would be nice if what the "rtt" exactly is could be
> clarified. For instance, if I understand correctly, by default,
> the initial rtt is 0 and the rttvar 3s, which results in a rto
> of 3s. That "rtt" is the smoothed rtt, right? (I think the
> "route" man page from net-tools is incorrect about that, BTW.),
> but then when setting those variables per route, it's the RTT
> that can't be made lower than 3s, while rttvar can be as low as
> rto_min (200ms by default). It's all very confusing (well, I'm
> very confused ;).
> 
> regards,
> Stéphane

RTT is used as a starting point for the smoothed round trip time.
As soon as the first ack comes back the value starts to change.
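
To illustrate how a seeded value drifts once samples arrive, here is a
sketch of the textbook estimator from RFC 6298 in floating point. It is
not Linux's fixed-point implementation; treating the route metrics as a
pre-seeded state is an assumption drawn from the description above, and
all sketch_ names are hypothetical.

```c
#include <assert.h>

struct sketch_rtt_state {
	double srtt_ms;   /* smoothed round trip time */
	double rttvar_ms; /* round trip time variation */
	int have_sample;  /* 0 until seeded or first measured */
};

/* RTO = SRTT + 4 * RTTVAR, clamped to a minimum. */
static double sketch_rto_ms(const struct sketch_rtt_state *s, double rto_min_ms)
{
	double rto = s->srtt_ms + 4.0 * s->rttvar_ms;

	return rto < rto_min_ms ? rto_min_ms : rto;
}

/* Fold one measured round trip time (r_ms) into the state. */
static void sketch_rtt_sample(struct sketch_rtt_state *s, double r_ms)
{
	double err;

	assert(r_ms >= 0.0);
	if (!s->have_sample) {
		/* First measurement: SRTT = R, RTTVAR = R / 2. */
		s->srtt_ms = r_ms;
		s->rttvar_ms = r_ms / 2.0;
		s->have_sample = 1;
		return;
	}
	err = s->srtt_ms - r_ms;
	if (err < 0.0)
		err = -err;
	/* RTTVAR is updated first, using the old SRTT. */
	s->rttvar_ms = 0.75 * s->rttvar_ms + 0.25 * err;
	s->srtt_ms = 0.875 * s->srtt_ms + 0.125 * r_ms;
}
```

Seeding the state with srtt 100 ms and rttvar 50 ms gives an RTO of
300 ms. After a single 20 ms sample the state moves to srtt 90 ms and
rttvar 57.5 ms, so the seed is only a starting point.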


* Re: rtt metric only for incoming connections?
  2008-05-27 18:43       ` Stephen Hemminger
@ 2008-05-27 18:53         ` Rick Jones
  0 siblings, 0 replies; 12+ messages in thread
From: Rick Jones @ 2008-05-27 18:53 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Stephane Chazelas, Stephen Hemminger, David Miller, netdev

>>Also, it seems we can't lower the initial RTO below the RFC 1122
>>default of 3 seconds. 3 seconds may be appropriate for a host
>>for which we don't know how many hops, links, satellites are
>>needed to reach it, but what about local/corporate networks
>>where it's possible to administratively know the rtt so that it
>>can be hardcoded in the routing table.
> 
> 
> Violating RFCs is not really that useful.

Yet the RFCs are not stone tablets, and they often represent a
"compromise" between things desirable for the great big internet and
those someone with a bounded network might have.

> If you have a network dropping SYN packets regularly, then there are
> worse problems.

That is entirely plausible.

> Relying on TCP to overcome wireless network problems is not
> a good idea.

How is it any worse than relying on TCP to overcome network congestion 
problems?-)

rick jones


* Re: [PATCH] net: neighbour table ABI problem
  2008-05-22  0:20       ` Thomas Graf
@ 2008-06-03 23:03         ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2008-06-03 23:03 UTC (permalink / raw)
  To: tgraf; +Cc: shemminger, kaber, Stephane_Chazelas, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 22 May 2008 02:20:42 +0200

> * David Miller <davem@davemloft.net> 2008-05-21 13:35
> > From: Stephen Hemminger <shemminger@vyatta.com>
> > Date: Wed, 21 May 2008 10:40:19 -0700
> > 
> > > This change is small, but it breaks the ABI for older versions of the iproute2 utilities.
> > > But these utilities are already broken, since they are looking at the psched_hz values,
> > > which are completely different. So let's just go ahead and fix both kernel and user
> > > space. Older utilities will just print wrong values.
> > > 
> > 
> > Can we add a new attribute or something like that?
> 
> We could, but I agree with Stephen that we should just fix it the way he
> proposes. The value we're talking about is only useful in a debugging
> or statistical context. We're not changing the format of the
> attribute at all, not even the unit, strictly speaking.

Fair enough, I've applied Stephen's patch.

Thanks.

