netdev.vger.kernel.org archive mirror
* [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
@ 2007-08-31  0:09 Rick Jones
  2007-08-31  0:39 ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31  0:09 UTC (permalink / raw)
  To: netdev

Enable configuration of the minimum TCP Retransmission Timeout via
a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
have quite variable RTTs avoid spurious RTOs.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Lamont Jones <lamont@hp.com>
---

diff -r 06d7322848a3 Documentation/networking/ip-sysctl.txt
--- a/Documentation/networking/ip-sysctl.txt	Mon Aug 27 18:32:35 2007 -0700
+++ b/Documentation/networking/ip-sysctl.txt	Thu Aug 30 17:06:16 2007 -0700
@@ -339,6 +339,13 @@ tcp_rmem - vector of 3 INTEGERs: min, de
 	selected receiver buffers for TCP socket. This value does not override
 	net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
 	Default: 87380*2 bytes.
+
+tcp_rto_min - INTEGER
+	The minimum value for the TCP Retransmission Timeout, expressed
+	in milliseconds for the convenience of the user.
+	This is bounded at the low-end by TCP_RTO_MIN and by TCP_RTO_MAX at
+	the high-end.	
+	Default: 200.
 
 tcp_sack - BOOLEAN
 	Enable select acknowledgments (SACKS).
diff -r 06d7322848a3 include/net/tcp.h
--- a/include/net/tcp.h	Mon Aug 27 18:32:35 2007 -0700
+++ b/include/net/tcp.h	Thu Aug 30 17:06:16 2007 -0700
@@ -232,6 +232,7 @@ extern int sysctl_tcp_workaround_signed_
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
 extern int sysctl_tcp_max_ssthresh;
+extern unsigned int sysctl_tcp_rto_min;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff -r 06d7322848a3 net/ipv4/sysctl_net_ipv4.c
--- a/net/ipv4/sysctl_net_ipv4.c	Mon Aug 27 18:32:35 2007 -0700
+++ b/net/ipv4/sysctl_net_ipv4.c	Thu Aug 30 17:06:16 2007 -0700
@@ -186,6 +186,32 @@ static int strategy_allowed_congestion_c
 
 }
 
+/* if there is ever a proc_dointvec_ms_jiffies_minmax we can get rid
+   of this routine */
+
+static int proc_tcp_rto_min(ctl_table *ctl, int write, struct file *filp,
+			    void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	u32 *valp = ctl->data;
+	u32 oldval = *valp;
+	int ret;
+
+	ret = proc_dointvec_ms_jiffies(ctl, write, filp, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	/* some bounds checking would be in order */   
+	if (write && *valp != oldval) {
+		if (*valp < TCP_RTO_MIN || *valp > TCP_RTO_MAX) {
+			*valp = oldval;
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+
 ctl_table ipv4_table[] = {
 	{
 		.ctl_name	= NET_IPV4_TCP_TIMESTAMPS,
@@ -819,6 +845,14 @@ ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "tcp_rto_min",
+		.data		= &sysctl_tcp_rto_min,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_tcp_rto_min
+	},
 	{ .ctl_name = 0 }
 };
 
diff -r 06d7322848a3 net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c	Mon Aug 27 18:32:35 2007 -0700
+++ b/net/ipv4/tcp_input.c	Thu Aug 30 17:06:16 2007 -0700
@@ -91,6 +91,8 @@ int sysctl_tcp_nometrics_save __read_mos
 
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_abc __read_mostly;
+
+unsigned int sysctl_tcp_rto_min __read_mostly = TCP_RTO_MIN;
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
@@ -616,13 +618,13 @@ static void tcp_rtt_estimator(struct soc
 			if (tp->mdev_max < tp->rttvar)
 				tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
 			tp->rtt_seq = tp->snd_nxt;
-			tp->mdev_max = TCP_RTO_MIN;
+			tp->mdev_max = sysctl_tcp_rto_min;
 		}
 	} else {
 		/* no previous measure. */
 		tp->srtt = m<<3;	/* take the measured time to be rtt */
 		tp->mdev = m<<1;	/* make sure rto = 3*rtt */
-		tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+		tp->mdev_max = tp->rttvar = max(tp->mdev, sysctl_tcp_rto_min);
 		tp->rtt_seq = tp->snd_nxt;
 	}
 }
@@ -851,7 +853,7 @@ static void tcp_init_metrics(struct sock
 	}
 	if (dst_metric(dst, RTAX_RTTVAR) > tp->mdev) {
 		tp->mdev = dst_metric(dst, RTAX_RTTVAR);
-		tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+		tp->mdev_max = tp->rttvar = max(tp->mdev, sysctl_tcp_rto_min);
 	}
 	tcp_set_rto(sk);
 	tcp_bound_rto(sk);
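
The handler in this patch accepts a written value only when it lies
between TCP_RTO_MIN and TCP_RTO_MAX, and otherwise puts the old value
back and returns -EINVAL.  A minimal user-space sketch of that check
follows.  The name set_rto_min_ms and the two RTO_*_MS constants are
hypothetical; the limits are written in milliseconds (200 ms and 120 s,
corresponding to the kernel's HZ/5 and 120*HZ), whereas the patch
compares jiffies.  The sketch validates before storing instead of
storing and rolling back, which gives the same visible result in
single-threaded code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for TCP_RTO_MIN and TCP_RTO_MAX, in milliseconds. */
#define RTO_FLOOR_MS   200u
#define RTO_CEILING_MS 120000u

/* Store new_ms in *valp only if it lies inside the allowed range.
 * On rejection *valp keeps its previous value and -EINVAL is returned. */
static int set_rto_min_ms(unsigned int *valp, unsigned int new_ms)
{
	assert(valp != NULL);
	if (new_ms < RTO_FLOOR_MS || new_ms > RTO_CEILING_MS)
		return -EINVAL;
	*valp = new_ms;
	return 0;
}
```

Calling set_rto_min_ms(&v, 100) on a variable holding 300 leaves it at
300 and returns -EINVAL, the behaviour a rejected write to the sysctl
is meant to have.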


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31  0:09 [PATCH] make _minimum_ TCP retransmission timeout configurable take 2 Rick Jones
@ 2007-08-31  0:39 ` David Miller
  2007-08-31  1:07   ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31  0:39 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 30 Aug 2007 17:09:04 -0700 (PDT)

> Enable configuration of the minimum TCP Retransmission Timeout via
> a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
> have quite variable RTTs avoid spurious RTOs.
> 
> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> Signed-off-by: Lamont Jones <lamont@hp.com>

Thanks for doing this work Rick.

But as John Heffner and I both mentioned, it's pretty clear we should
do this as a routing metric.  Both for handling realistic scenarios
where the sysctl doesn't work, and to help prevent misuse (example:
someone decides that it would be _totally_ _awesome_ for "Carrier
Grade Linux" to set this to 3 seconds by default in /etc/sysctl.conf
and crap like that).


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31  0:39 ` David Miller
@ 2007-08-31  1:07   ` Rick Jones
  2007-08-31  4:52     ` John Heffner
  2007-08-31  5:09     ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Rick Jones @ 2007-08-31  1:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Thu, 30 Aug 2007 17:09:04 -0700 (PDT)
> 
> 
>>Enable configuration of the minimum TCP Retransmission Timeout via
>>a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
>>have quite variable RTTs avoid spurious RTOs.
>>
>>Signed-off-by: Rick Jones <rick.jones2@hp.com>
>>Signed-off-by: Lamont Jones <lamont@hp.com>
> 
> 
> Thanks for doing this work Rick.
> 
> But as John Heffner and I both mentioned, it's pretty clear we should
> do this as a routing metric.  Both for handling realistic scenarios
> where the sysctl doesn't work, and to help prevent misuse (example:
> someone decides that it would be _totally_ _awesome_ for "Carrier
> Grade Linux" to set this to 3 seconds by default in /etc/sysctl.conf
> and crap like that).

If nothing else it was worth the practice :)  I'll be happy with either 
mechanism, just wasn't sure if the jury was still out on whether making 
it a routing metric was really necessary.  I can see where it would be 
goodness if one had separate paths out of a system, one with the highly 
variable RTT and one with non-trivial loss rates, just that thus far I've
not come across any :)  I've only seen one path with high RTT 
variability and the other path with trivial loss rates.

Also, not surprisingly, the folks for whom I'm doing this are a trifle
"anxious" so I figured that simplicity was worthwhile.  Particularly if 
it was going to be the case those folks were going to be asking for 
back-ports.

Anyhow, I'll try grubbing around the source code (already doing that to 
see about writing a pet tcp cong module) but if pointers to the likely 
relevant files were available I could try to help thrash-out the routing 
metric version.  Like I said, the consumers of this are a trifle, well,
"anxious" :)

rick


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31  1:07   ` Rick Jones
@ 2007-08-31  4:52     ` John Heffner
  2007-08-31 17:19       ` Rick Jones
  2007-08-31  5:09     ` David Miller
  1 sibling, 1 reply; 12+ messages in thread
From: John Heffner @ 2007-08-31  4:52 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

Rick Jones wrote:
> Like I said, the consumers of this are a trifle, well,
> "anxious" :)

Just curious, did you or this customer try with F-RTO enabled?  Or is 
this case you're dealing with truly hopeless?

   -John


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31  1:07   ` Rick Jones
  2007-08-31  4:52     ` John Heffner
@ 2007-08-31  5:09     ` David Miller
  2007-08-31 18:11       ` Rick Jones
  1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31  5:09 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 30 Aug 2007 18:07:13 -0700

> Anyhow, I'll try grubbing around the source code (already doing that to 
> see about writing a pet tcp cong module) but if pointers to the likely 
> relevant files were available I could try to help thrash-out the routing 
> metric version.  Like I said, the consumers of this are a trifle, well,
> "anxious" :)

The change is actually a lot simpler than the sysctl version.

In fact it borders on trivial :-)

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c91476c..dff3192 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -351,6 +351,8 @@ enum
 #define RTAX_INITCWND RTAX_INITCWND
 	RTAX_FEATURES,
 #define RTAX_FEATURES RTAX_FEATURES
+	RTAX_RTO_MIN,
+#define RTAX_RTO_MIN RTAX_RTO_MIN
 	__RTAX_MAX
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9785df3..1ee7212 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -555,6 +555,16 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
 		tcp_grow_window(sk, skb);
 }
 
+static u32 tcp_rto_min(struct sock *sk)
+{
+	struct dst_entry *dst = __sk_dst_get(sk);
+	u32 rto_min = TCP_RTO_MIN;
+
+	if (dst_metric_locked(dst, RTAX_RTO_MIN))
+		rto_min = dst->metrics[RTAX_RTO_MIN-1];
+	return rto_min;
+}
+
 /* Called to compute a smoothed rtt estimate. The data fed to this
  * routine either comes from timestamps, or from segments that were
  * known _not_ to have been retransmitted [see Karn/Partridge
@@ -616,13 +626,13 @@ static void tcp_rtt_estimator(struct sock *sk, const __u32 mrtt)
 			if (tp->mdev_max < tp->rttvar)
 				tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
 			tp->rtt_seq = tp->snd_nxt;
-			tp->mdev_max = TCP_RTO_MIN;
+			tp->mdev_max = tcp_rto_min(sk);
 		}
 	} else {
 		/* no previous measure. */
 		tp->srtt = m<<3;	/* take the measured time to be rtt */
 		tp->mdev = m<<1;	/* make sure rto = 3*rtt */
-		tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
 		tp->rtt_seq = tp->snd_nxt;
 	}
 }
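
The "no previous measure" branch above can be worked through with
numbers.  The sketch below is a user-space illustration only: the
struct and function names are hypothetical, all values are in jiffies,
and the final formula rto = (srtt >> 3) + rttvar is an assumption about
how the kernel of this era derives the RTO from these fields; it is not
shown anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-socket RTT state touched by the patch. */
struct rtt_state {
	uint32_t srtt;    /* smoothed RTT, stored scaled by 8 */
	uint32_t mdev;    /* mean deviation, stored scaled by 4 */
	uint32_t rttvar;  /* variance term actually used for the RTO */
};

/* First RTT sample m: same three assignments as the else branch,
 * with rto_min standing in for whatever tcp_rto_min(sk) returns. */
static void first_rtt_sample(struct rtt_state *st, uint32_t m, uint32_t rto_min)
{
	assert(st != NULL);
	st->srtt = m << 3;
	st->mdev = m << 1;
	st->rttvar = st->mdev > rto_min ? st->mdev : rto_min;
}

/* Assumed RTO derivation: unscaled srtt plus rttvar. */
static uint32_t rto_from_state(const struct rtt_state *st)
{
	assert(st != NULL);
	return (st->srtt >> 3) + st->rttvar;
}
```

Under that assumption, a first sample of 100 jiffies with a floor of 50
yields 300, the "rto = 3*rtt" of the code comment, while a first sample
of 10 jiffies yields 60 because the floor dominates the variance term.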




* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31  4:52     ` John Heffner
@ 2007-08-31 17:19       ` Rick Jones
  0 siblings, 0 replies; 12+ messages in thread
From: Rick Jones @ 2007-08-31 17:19 UTC (permalink / raw)
  To: John Heffner; +Cc: netdev

John Heffner wrote:
> Rick Jones wrote:
> 
>> Like I said, the consumers of this are a trifle, well, "anxious" :)
> 
> 
> Just curious, did you or this customer try with F-RTO enabled?  Or is 
> this case you're dealing with truly hopeless?

F-RTO was mentioned to the customer and I'm awaiting their response as 
to its efficacy in their situation.  Everything I've seen thus far is
leading me to believe that we'll still need a higher than 200
millisecond minimum RTO though.

rick


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31  5:09     ` David Miller
@ 2007-08-31 18:11       ` Rick Jones
  2007-08-31 18:57         ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 18:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Thu, 30 Aug 2007 18:07:13 -0700
> 
> 
>>Anyhow, I'll try grubbing around the source code (already doing that to 
>>see about writing a pet tcp cong module) but if pointers to the likely 
>>relevant files were available I could try to help thrash-out the routing 
>>metric version.  Like I said, the consumers of this are a trifle, well,
>>"anxious" :)
> 
> 
> The change is actually a lot simpler than the sysctl version.
> 
> In fact it borders on trivial :-)
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index c91476c..dff3192 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -351,6 +351,8 @@ enum
>  #define RTAX_INITCWND RTAX_INITCWND
>  	RTAX_FEATURES,
>  #define RTAX_FEATURES RTAX_FEATURES
> +	RTAX_RTO_MIN,
> +#define RTAX_RTO_MIN RTAX_RTO_MIN
>  	__RTAX_MAX
>  };
>  
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9785df3..1ee7212 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -555,6 +555,16 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
>  		tcp_grow_window(sk, skb);
>  }
>  
> +static u32 tcp_rto_min(struct sock *sk)
> +{
> +	struct dst_entry *dst = __sk_dst_get(sk);
> +	u32 rto_min = TCP_RTO_MIN;
> +
> +	if (dst_metric_locked(dst, RTAX_RTO_MIN))
> +		rto_min = dst->metrics[RTAX_RTO_MIN-1];
> +	return rto_min;
> +}
> +
>  /* Called to compute a smoothed rtt estimate. The data fed to this
>   * routine either comes from timestamps, or from segments that were
>   * known _not_ to have been retransmitted [see Karn/Partridge
> @@ -616,13 +626,13 @@ static void tcp_rtt_estimator(struct sock *sk, const __u32 mrtt)
>  			if (tp->mdev_max < tp->rttvar)
>  				tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
>  			tp->rtt_seq = tp->snd_nxt;
> -			tp->mdev_max = TCP_RTO_MIN;
> +			tp->mdev_max = tcp_rto_min(sk);
>  		}
>  	} else {
>  		/* no previous measure. */
>  		tp->srtt = m<<3;	/* take the measured time to be rtt */
>  		tp->mdev = m<<1;	/* make sure rto = 3*rtt */
> -		tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
> +		tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
>  		tp->rtt_seq = tp->snd_nxt;
>  	}
>  }

At the risk of showing my ignorance (what me worry about that?-) I 
presume this is then an interface expecting to take-in jiffies?  That 
means the user has to know the value of HZ which can be (IIRC) one of 
three different values?
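
To make the units question concrete, here is a small sketch of the
conversion a user would have to do by hand if the metric is taken in
jiffies.  The function names are hypothetical, rounding up on the way
in is a choice made for this sketch, and the HZ values of 100, 250 and
1000 used in the note after the code are assumed to be the three
settings meant above.

```c
#include <assert.h>

/* Milliseconds to jiffies for a given tick rate, rounding up so that
 * a non-zero timeout never becomes zero ticks. */
static unsigned long ms_to_jiffies(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}

/* Jiffies back to whole milliseconds for a given tick rate. */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000 / hz;
}
```

A 300 ms floor would have to be entered as 30, 75 or 300 depending on
whether HZ is 100, 250 or 1000, which is the portability problem being
raised.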

rick jones


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31 18:11       ` Rick Jones
@ 2007-08-31 18:57         ` David Miller
  2007-08-31 20:59           ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31 18:57 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 31 Aug 2007 11:11:37 -0700

> At the risk of showing my ignorance (what me worry about that?-) I 
> presume this is then an interface expecting to take-in jiffies?  That 
> means the user has to know the value of HZ which can be (IIRC) one of 
> three different values?

The iproute2 changes might look something like this:

--- ./include/linux/rtnetlink.h.orig	2007-08-31 11:55:30.000000000 -0700
+++ ./include/linux/rtnetlink.h	2007-08-31 11:52:22.000000000 -0700
@@ -351,6 +351,8 @@ enum
 #define RTAX_INITCWND RTAX_INITCWND
 	RTAX_FEATURES,
 #define RTAX_FEATURES RTAX_FEATURES
+	RTAX_RTO_MIN,
+#define RTAX_RTO_MIN RTAX_RTO_MIN
 	__RTAX_MAX
 };
 
--- ./ip/iproute.c.orig	2007-08-31 11:55:30.000000000 -0700
+++ ./ip/iproute.c	2007-08-31 11:53:29.000000000 -0700
@@ -51,6 +51,7 @@ static const char *mx_names[RTAX_MAX+1] 
 	[RTAX_HOPLIMIT] = "hoplimit",
 	[RTAX_INITCWND] = "initcwnd",
 	[RTAX_FEATURES] = "features",
+	[RTAX_RTO_MIN]	= "rto_min",
 };
 static void usage(void) __attribute__((noreturn));
 
@@ -74,6 +75,7 @@ static void usage(void)
 	fprintf(stderr, "           [ rtt NUMBER ] [ rttvar NUMBER ]\n");
 	fprintf(stderr, "           [ window NUMBER] [ cwnd NUMBER ] [ initcwnd NUMBER ]\n");
 	fprintf(stderr, "           [ ssthresh NUMBER ] [ realms REALM ]\n");
+	fprintf(stderr, "           [ rto_min NUMBER ]\n");
 	fprintf(stderr, "TYPE := [ unicast | local | broadcast | multicast | throw |\n");
 	fprintf(stderr, "          unreachable | prohibit | blackhole | nat ]\n");
 	fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
@@ -520,7 +522,8 @@ int print_route(const struct sockaddr_nl
 			if (mxlock & (1<<i))
 				fprintf(fp, " lock");
 
-			if (i != RTAX_RTT && i != RTAX_RTTVAR)
+			if (i != RTAX_RTT && i != RTAX_RTTVAR &&
+			    i != RTAX_RTO_MIN)
 				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
 			else {
 				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
@@ -528,7 +531,7 @@ int print_route(const struct sockaddr_nl
 				val *= 1000;
 				if (i == RTAX_RTT)
 					val /= 8;
-				else
+				else if (i == RTAX_RTTVAR)
 					val /= 4;
 				if (val >= hz)
 					fprintf(fp, " %ums", val/hz);
@@ -803,6 +806,15 @@ int iproute_modify(int cmd, unsigned fla
 			if (get_unsigned(&rtt, *argv, 0))
 				invarg("\"rtt\" value is invalid\n", *argv);
 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_RTT, rtt);
+		} else if (strcmp(*argv, "rto_min") == 0) {
+			unsigned rto_min;
+			NEXT_ARG();
+			mxlock |= (1<<RTAX_RTO_MIN);
+			if (get_unsigned(&rto_min, *argv, 0))
+				invarg("\"rto_min\" value is invalid\n",
+				       *argv);
+			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_RTO_MIN,
+				      rto_min);
 		} else if (matches(*argv, "window") == 0) {
 			unsigned win;
 			NEXT_ARG();


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31 18:57         ` David Miller
@ 2007-08-31 20:59           ` Rick Jones
  2007-08-31 21:38             ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 20:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

I managed to find iproute2 sources (they were debian lenny/testing 
2.6.20-1) and applied the patch, and figured-out how to add a host route 
back to one of my systems.  I then did a change to set rto_min to 300. 
I started a tcpdump and then a netperf, and then forced some
retransmissions the old fashioned way - by pulling a cable :)  I then 
Ctrl-C'd netperf and at that point got this:

Unable to handle kernel NULL pointer dereference (address 0000000000000038)
swapper[0]: Oops 8813272891392 [1]
Modules linked in: ipv6 sg sr_mod cdrom dm_snapshot dm_mirror dm_mod 
loop button shpchp pci_hotplug joydev evdev ext3 jbd mbcache usb_storage 
usbhid hid ide_core mptspi mptscsih ehci_hcd cciss ohci_hcd mptbase 
scsi_transport_spi scsi_mod usbcore e1000 thermal processor fan

Pid: 0, CPU 3, comm:              swapper
psr : 0000101008026038 ifs : 8000000000000001 ip  : [<a000000100477600>] 
    Not tainted
ip is at tcp_rto_min+0x20/0x40
unat: 0000000000000000 pfs : 0000000000000307 rsc : 0000000000000003
rnat: e0000100f32917e0 bsps: e0000100f32917e8 pr  : 00000000000166a5
ldrs: 0000000000000000 ccv : 0000000000010001 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001004777f0 b6  : a000000100230700 b7  : a000000100452e40
f6  : 000000000000000000000 f7  : 1003e000000000002813f
f8  : 1003e00000000001c06e2 f9  : 1003e000000000000043f
f10 : 1003e00000000000fd24b f11 : 1003e0044b82fa09b5a53
r1  : a0000001009b2710 r2  : 0000000000000000 r3  : e0000100fd317b40
r8  : 0000000000000032 r9  : 000000000000012c r10 : 0000000000000003
r11 : e000000ff07b63a0 r12 : e0000100fd317b40 r13 : e0000100fd310000
r14 : 0000000000000000 r15 : 0000000000000038 r16 : 0000000000000068
r17 : e000000ff07b6380 r18 : fffffffffffffa58 r19 : 000000008d769671
r20 : 000000008d769c19 r21 : e000000ff07b62f0 r22 : 000000008d7712e1
r23 : e000000ff07b62ec r24 : e000000ff07b637c r25 : 000000000000012c
r26 : 0000000000000032 r27 : e000000ff07b6438 r28 : e000000ff07b5fc8
r29 : e000000ff215bd80 r30 : e000000ff07b60d0 r31 : e000000ff07b6250

Call Trace:
  [<a000000100013700>] show_stack+0x40/0xa0
                                 sp=e0000100fd317710 bsp=e0000100fd311310
  [<a000000100014000>] show_regs+0x840/0x880
                                 sp=e0000100fd3178e0 bsp=e0000100fd3112b8
  [<a0000001000351a0>] die+0x1a0/0x2a0
                                 sp=e0000100fd3178e0 bsp=e0000100fd311270
  [<a0000001000579d0>] ia64_do_page_fault+0x8d0/0xa00
                                 sp=e0000100fd3178e0 bsp=e0000100fd311220
  [<a00000010000b0c0>] ia64_leave_kernel+0x0/0x270
                                 sp=e0000100fd317970 bsp=e0000100fd311220
  [<a000000100477600>] tcp_rto_min+0x20/0x40
                                 sp=e0000100fd317b40 bsp=e0000100fd311218
  [<a0000001004777f0>] tcp_rtt_estimator+0x1d0/0x280
                                 sp=e0000100fd317b40 bsp=e0000100fd3111e0
  [<a000000100479110>] tcp_ack_saw_tstamp+0x50/0xc0
                                 sp=e0000100fd317b40 bsp=e0000100fd3111c0
  [<a00000010047d7a0>] tcp_ack+0x13c0/0x4380
                                 sp=e0000100fd317b40 bsp=e0000100fd311120
  [<a000000100486fa0>] tcp_rcv_state_process+0x1420/0x2100
                                 sp=e0000100fd317b60 bsp=e0000100fd3110d8
  [<a000000100498760>] tcp_v4_do_rcv+0x960/0xa80
                                 sp=e0000100fd317b60 bsp=e0000100fd311078
  [<a00000010049e830>] tcp_v4_rcv+0x19d0/0x1b20
                                 sp=e0000100fd317b70 bsp=e0000100fd311008
  [<a0000001004542f0>] ip_local_deliver+0x530/0x7c0
                                 sp=e0000100fd317b70 bsp=e0000100fd310fd0
  [<a000000100453cb0>] ip_rcv+0xe70/0xf80
                                 sp=e0000100fd317b80 bsp=e0000100fd310f98
  [<a0000001003fcb40>] netif_receive_skb+0xa20/0xb80
                                 sp=e0000100fd317ba0 bsp=e0000100fd310f50
  [<a000000213211c20>] e1000_clean_rx_irq+0x9e0/0xc00 [e1000]
                                 sp=e0000100fd317ba0 bsp=e0000100fd310e90
  [<a00000021320ce50>] e1000_clean+0x130/0x6e0 [e1000]
                                 sp=e0000100fd317ba0 bsp=e0000100fd310e38
  [<a0000001004034c0>] net_rx_action+0x1c0/0x540
                                 sp=e0000100fd317bb0 bsp=e0000100fd310df0
  [<a0000001000908f0>] __do_softirq+0xf0/0x240
                                 sp=e0000100fd317bc0 bsp=e0000100fd310d78
  [<a000000100090ab0>] do_softirq+0x70/0xc0
                                 sp=e0000100fd317bc0 bsp=e0000100fd310d18
  [<a000000100090ca0>] irq_exit+0x80/0xa0
                                 sp=e0000100fd317bc0 bsp=e0000100fd310d00
  [<a000000100010cc0>] ia64_handle_irq+0x2a0/0x2e0
                                 sp=e0000100fd317bc0 bsp=e0000100fd310cd0
  [<a00000010000b0c0>] ia64_leave_kernel+0x0/0x270
                                 sp=e0000100fd317bc0 bsp=e0000100fd310cd0
  [<a0000001000142d0>] default_idle+0x110/0x180
                                 sp=e0000100fd317d90 bsp=e0000100fd310c90
  [<a0000001000131b0>] cpu_idle+0x210/0x2e0
                                 sp=e0000100fd317e30 bsp=e0000100fd310c60
  [<a00000010063cf50>] start_secondary+0x4b0/0x4e0
                                 sp=e0000100fd317e30 bsp=e0000100fd310c20
  [<a00000010050ae00>] __kprobes_text_end+0x340/0x370
                                 sp=e0000100fd317e30 bsp=e0000100fd310c20
Kernel panic - not syncing: Aiee, killing interrupt handler!

Of course there isn't all that much code to tcp_rto_min:


+static u32 tcp_rto_min(struct sock *sk)
+{
+	struct dst_entry *dst = __sk_dst_get(sk);
+	u32 rto_min = TCP_RTO_MIN;
+
+	if (dst_metric_locked(dst, RTAX_RTO_MIN))
+		rto_min = dst->metrics[RTAX_RTO_MIN-1];
+	return rto_min;
+}
+

So, I went ahead and rebooted and started again:

hpcpc103:~# ./ip route add dev eth0 15.244.56.217
hpcpc103:~# ./ip route show
15.244.56.217 dev eth0  scope link
16.89.84.0/25 dev eth0  proto kernel  scope link  src 16.89.84.103
default via 16.89.84.1 dev eth0
hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET

hpcpc103:~# ./ip route change dev eth0 15.244.56.217 min_rto 300
Error: either "to" is duplicate, or "min_rto" is a garbage.
hpcpc103:~# ./ip route change dev eth0 15.244.56.217 rto_min 300
hpcpc103:~# ./ip route show
15.244.56.217 dev eth0  scope link  rto_min lock 1200ms
16.89.84.0/25 dev eth0  proto kernel  scope link  src 16.89.84.103
default via 16.89.84.1 dev eth0

300 became 1200
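
The scaling is what the patched print_route() arithmetic would produce
if the value is stored as raw jiffies and hz is 250.  That hz value is
an inference from the output, not something stated in this thread.  The
sketch below mirrors only the val >= hz branch of the display code; the
enum and function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical tags for the three metrics the display code special-cases. */
enum metric { METRIC_RTT, METRIC_RTTVAR, METRIC_RTO_MIN };

/* Mirror of the display arithmetic: scale to milliseconds, undo the
 * fixed-point scaling of rtt (8) and rttvar (4), leave rto_min alone,
 * then divide by the tick rate. */
static unsigned int displayed_ms(enum metric which, unsigned int raw,
				 unsigned int hz)
{
	unsigned int val = raw;

	assert(hz > 0);
	val *= 1000;
	if (which == METRIC_RTT)
		val /= 8;
	else if (which == METRIC_RTTVAR)
		val /= 4;
	return val / hz;
}
```

With hz assumed to be 250, a stored rto_min of 300 is shown as 1200ms:
the input is taken as jiffies while the output is converted to
milliseconds.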

hpcpc103:~# ./ip route change dev eth0 15.244.56.217 rto_min 600
hpcpc103:~# ./ip route show
15.244.56.217 dev eth0  scope link  rto_min lock 2400ms
16.89.84.0/25 dev eth0  proto kernel  scope link  src 16.89.84.103
default via 16.89.84.1 dev eth0

600 became 2400 ms but that is window dressing at the moment.  Go ahead 
and run netperf and let it finish on its own:

hpcpc103:~# netperf -H tardy
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  32768  16384  16384    10.02      72.81
hpcpc103:~#

now try it and abort netperf in the middle:

hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET

hpcpc103:~#

hmm, didn't happen again.  Now try with some RTO's forced:

hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  32768  16384  16384    30.02      32.69

still happy.  forced RTO's and abort:

hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET

hpcpc103:~#

ok still happy.  So the failure is at best intermittent.  One final
thing - try adding the tcpdump again:

hpcpc103:~# device eth0 entered promiscuous mode
audit(1188593799.036:2): dev=eth0 prom=256 old_prom=0 auid=4294967295
netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET

hpcpc103:~#

still happy.  Sigh, so much for easily reproducing the panic :(

rick

that second tcpdump does show the rto_min being honored:

16:56:46.543603 IP tardy.cup.hp.com.52620 > hpcpc103.cup.hp.com.51691: . 
ack 29438433 win 32768 <nop,nop,timestamp 7997540 54018>
16:56:46.543608 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: . 
29438433:29454361(15928) ack 1 win 46 <nop,nop,timestamp 54018 7997540>
16:56:46.543613 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: P 
29454361:29470289(15928) ack 1 win 46 <nop,nop,timestamp 54018 7997540>
16:56:48.956342 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: . 
29438433:29439881(1448) ack 1 win 46 <nop,nop,timestamp 54622 7997540>
16:56:53.788276 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: . 
29438433:29439881(1448) ack 1 win 46 <nop,nop,timestamp 55830 7997540>
16:56:53.855520 IP tardy.cup.hp.com.52620 > hpcpc103.cup.hp.com.51691: . 
ack 29439881 win 32768 <nop,nop,timestamp 7998272 55830>
16:56:53.855526 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: . 
29439881:29441329(1448) ack 1 win 46 <nop,nop,timestamp 55846 7998272>
16:56:53.855530 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: . 
29441329:29442777(1448) ack 1 win 46 <nop,nop,timestamp 55846 7998272>
16:56:53.925505 IP tardy.cup.hp.com.52620 > hpcpc103.cup.hp.com.51691: . 
ack 29442777 win 32768 <nop,nop,timestamp 7998279 55846>
16:56:53.925511 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: . 
29442777:29444225(1448) ack 1 win 46 <nop,nop,timestamp 55864 7998279>
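
The spacing in this trace can be checked by hand.  The sketch below
takes the wall-clock times and TCP timestamp values from the capture
and computes the gap between the original send (16:56:46.543608) and
the two retransmissions, plus the implied timestamp tick rate.  It
assumes the sender's timestamp value advances once per jiffy, which is
not stated in this thread; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time of day in microseconds from the h:m:s.us fields tcpdump prints. */
static int64_t tod_usec(int h, int m, int s, int us)
{
	return ((int64_t)h * 3600 + (int64_t)m * 60 + s) * 1000000 + us;
}

/* Whole timestamp ticks per second implied by a tick delta observed
 * over a wall-clock delta given in microseconds. */
static int64_t ticks_per_second(int64_t tick_delta, int64_t usec_delta)
{
	assert(usec_delta > 0);
	return tick_delta * 1000000 / usec_delta;
}
```

The first gap works out to 2412734 microseconds, just over the 2400ms
that ip route show reports for rto_min 600, and the second gap of
4831934 microseconds is roughly double that, as expected from timer
backoff.  The timestamp deltas of 604 and 1208 over those gaps both
imply about 250 ticks per second.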


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31 20:59           ` Rick Jones
@ 2007-08-31 21:38             ` David Miller
  2007-08-31 22:20               ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31 21:38 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 31 Aug 2007 13:59:50 -0700

> ip is at tcp_rto_min+0x20/0x40

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1ee7212..bbad2cd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -560,7 +560,7 @@ static u32 tcp_rto_min(struct sock *sk)
 	struct dst_entry *dst = __sk_dst_get(sk);
 	u32 rto_min = TCP_RTO_MIN;
 
-	if (dst_metric_locked(dst, RTAX_RTO_MIN))
+	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
 		rto_min = dst->metrics[RTAX_RTO_MIN-1];
 	return rto_min;
 }
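
The effect of the added dst check can be shown outside the kernel.
This is a sketch with mock types, not the kernel's struct dst_entry:
struct mock_dst, MOCK_LOCK_RTO_MIN and rto_min_for are all hypothetical
names, and the lock bit stands in for dst_metric_locked().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_LOCK_RTO_MIN 0x1u  /* hypothetical "metric is locked" bit */

/* Hypothetical stand-in for the route entry cached on a socket. */
struct mock_dst {
	uint32_t locked_mask;  /* which metrics are locked */
	uint32_t rto_min;      /* per-route minimum RTO */
};

/* Return the per-route minimum only when a route is attached and the
 * metric is locked on it; otherwise fall back to the default.  The
 * dst test comes first so a socket with no cached route is safe. */
static uint32_t rto_min_for(const struct mock_dst *dst, uint32_t default_min)
{
	if (dst && (dst->locked_mask & MOCK_LOCK_RTO_MIN))
		return dst->rto_min;
	return default_min;
}

/* Usage example covering the three cases. */
static void rto_min_example(void)
{
	struct mock_dst route = { MOCK_LOCK_RTO_MIN, 600 };
	struct mock_dst plain = { 0, 600 };

	assert(rto_min_for(NULL, 50) == 50);     /* no cached route */
	assert(rto_min_for(&route, 50) == 600);  /* locked metric wins */
	assert(rto_min_for(&plain, 50) == 50);   /* unlocked: default */
}
```

The first case is the one that matters here: a socket without a cached
route now gets the default instead of dereferencing a NULL pointer.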


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31 21:38             ` David Miller
@ 2007-08-31 22:20               ` Rick Jones
  2007-08-31 22:24                 ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 22:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Fri, 31 Aug 2007 13:59:50 -0700
> 
> 
>>ip is at tcp_rto_min+0x20/0x40
> 
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 1ee7212..bbad2cd 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -560,7 +560,7 @@ static u32 tcp_rto_min(struct sock *sk)
>  	struct dst_entry *dst = __sk_dst_get(sk);
>  	u32 rto_min = TCP_RTO_MIN;
>  
> -	if (dst_metric_locked(dst, RTAX_RTO_MIN))
> +	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
>  		rto_min = dst->metrics[RTAX_RTO_MIN-1];
>  	return rto_min;
>  }

Applied and beating on it with a while loop doing a bunch of ip route
del add change stuff while netperf TCP_CRR tests are running.  Thus far
things seem OK wrt the system staying alive, but since I only saw the
failure once I'm not sure how much that is really saying.

I'm going to go ahead and take a look at input vs output units and 
differences between those with rto_min vs rtt.

rick jones


* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
  2007-08-31 22:20               ` Rick Jones
@ 2007-08-31 22:24                 ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-08-31 22:24 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 31 Aug 2007 15:20:52 -0700

> I'm going to go ahead and take a look at input vs output units and 
> differences between those with rto_min vs rtt.

You better because that's one of the last non-trivial emails you'll
get from me over the next few days while I'm travelling to kernel
summit :-)

