* [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
@ 2007-08-31 0:09 Rick Jones
2007-08-31 0:39 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 0:09 UTC
To: netdev
Enable configuration of the minimum TCP Retransmission Timeout via
a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
have quite variable RTTs avoid spurious RTOs.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Lamont Jones <lamont@hp.com>
---
diff -r 06d7322848a3 Documentation/networking/ip-sysctl.txt
--- a/Documentation/networking/ip-sysctl.txt Mon Aug 27 18:32:35 2007 -0700
+++ b/Documentation/networking/ip-sysctl.txt Thu Aug 30 17:06:16 2007 -0700
@@ -339,6 +339,13 @@ tcp_rmem - vector of 3 INTEGERs: min, de
selected receiver buffers for TCP socket. This value does not override
net.core.rmem_max, "static" selection via SO_RCVBUF does not use this.
Default: 87380*2 bytes.
+
+tcp_rto_min - INTEGER
+ The minimum value for the TCP Retransmission Timeout, expressed
+ in milliseconds for the convenience of the user.
+ This is bounded at the low-end by TCP_RTO_MIN and by TCP_RTO_MAX at
+ the high-end.
+ Default: 200.
tcp_sack - BOOLEAN
Enable select acknowledgments (SACKS).
diff -r 06d7322848a3 include/net/tcp.h
--- a/include/net/tcp.h Mon Aug 27 18:32:35 2007 -0700
+++ b/include/net/tcp.h Thu Aug 30 17:06:16 2007 -0700
@@ -232,6 +232,7 @@ extern int sysctl_tcp_workaround_signed_
extern int sysctl_tcp_workaround_signed_windows;
extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
+extern unsigned int sysctl_tcp_rto_min;
extern atomic_t tcp_memory_allocated;
extern atomic_t tcp_sockets_allocated;
diff -r 06d7322848a3 net/ipv4/sysctl_net_ipv4.c
--- a/net/ipv4/sysctl_net_ipv4.c Mon Aug 27 18:32:35 2007 -0700
+++ b/net/ipv4/sysctl_net_ipv4.c Thu Aug 30 17:06:16 2007 -0700
@@ -186,6 +186,32 @@ static int strategy_allowed_congestion_c
}
+/* if there is ever a proc_dointvec_ms_jiffies_minmax we can get rid
+ of this routine */
+
+static int proc_tcp_rto_min(ctl_table *ctl, int write, struct file *filp,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ u32 *valp = ctl->data;
+ u32 oldval = *valp;
+ int ret;
+
+ ret = proc_dointvec_ms_jiffies(ctl, write, filp, buffer, lenp, ppos);
+ if (ret)
+ return ret;
+
+ /* some bounds checking would be in order */
+ if (write && *valp != oldval) {
+ if (*valp < TCP_RTO_MIN || *valp > TCP_RTO_MAX) {
+ *valp = oldval;
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+
ctl_table ipv4_table[] = {
{
.ctl_name = NET_IPV4_TCP_TIMESTAMPS,
@@ -819,6 +845,14 @@ ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "tcp_rto_min",
+ .data = &sysctl_tcp_rto_min,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_tcp_rto_min
+ },
{ .ctl_name = 0 }
};
diff -r 06d7322848a3 net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c Mon Aug 27 18:32:35 2007 -0700
+++ b/net/ipv4/tcp_input.c Thu Aug 30 17:06:16 2007 -0700
@@ -91,6 +91,8 @@ int sysctl_tcp_nometrics_save __read_mos
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_abc __read_mostly;
+
+unsigned int sysctl_tcp_rto_min __read_mostly = TCP_RTO_MIN;
#define FLAG_DATA 0x01 /* Incoming frame contained data. */
#define FLAG_WIN_UPDATE 0x02 /* Incoming ACK was a window update. */
@@ -616,13 +618,13 @@ static void tcp_rtt_estimator(struct soc
if (tp->mdev_max < tp->rttvar)
tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
tp->rtt_seq = tp->snd_nxt;
- tp->mdev_max = TCP_RTO_MIN;
+ tp->mdev_max = sysctl_tcp_rto_min;
}
} else {
/* no previous measure. */
tp->srtt = m<<3; /* take the measured time to be rtt */
tp->mdev = m<<1; /* make sure rto = 3*rtt */
- tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+ tp->mdev_max = tp->rttvar = max(tp->mdev, sysctl_tcp_rto_min);
tp->rtt_seq = tp->snd_nxt;
}
}
@@ -851,7 +853,7 @@ static void tcp_init_metrics(struct sock
}
if (dst_metric(dst, RTAX_RTTVAR) > tp->mdev) {
tp->mdev = dst_metric(dst, RTAX_RTTVAR);
- tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+ tp->mdev_max = tp->rttvar = max(tp->mdev, sysctl_tcp_rto_min);
}
tcp_set_rto(sk);
tcp_bound_rto(sk);
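For readers following the unit handling: the handler in this patch lets proc_dointvec_ms_jiffies() convert the millisecond input to jiffies, then restores the old value and returns -EINVAL if the result falls outside [TCP_RTO_MIN, TCP_RTO_MAX]. Below is a minimal userspace model of that logic, not kernel code. It assumes TCP_RTO_MIN = HZ/5 and TCP_RTO_MAX = 120*HZ (my understanding of include/net/tcp.h in this era) and rounds milliseconds up to whole jiffies; all model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumed kernel bounds: TCP_RTO_MIN = HZ/5 (200 ms), TCP_RTO_MAX = 120*HZ. */
static unsigned int model_rto_floor(unsigned int hz) { return hz / 5; }
static unsigned int model_rto_ceiling(unsigned int hz) { return 120 * hz; }

/* Milliseconds to jiffies, rounding up (assumption: similar in spirit to
 * msecs_to_jiffies() for common HZ values). */
static unsigned int model_ms_to_jiffies(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return (unsigned int)(((unsigned long long)ms * hz + 999) / 1000);
}

/* Models the write path of proc_tcp_rto_min(): store the converted value
 * first, then roll back and fail if it changed and is out of bounds. */
static int model_set_rto_min(unsigned int *valp, unsigned int new_ms,
			     unsigned int hz)
{
	unsigned int oldval = *valp;

	*valp = model_ms_to_jiffies(new_ms, hz);
	if (*valp != oldval &&
	    (*valp < model_rto_floor(hz) || *valp > model_rto_ceiling(hz))) {
		*valp = oldval;
		return -EINVAL;
	}
	return 0;
}
```

With hz = 250 the default is 50 jiffies; writing 600 stores 150 jiffies, while writing 100 (25 jiffies, below the floor) is rejected and leaves the previous value in place.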
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 0:09 [PATCH] make _minimum_ TCP retransmission timeout configurable take 2 Rick Jones
@ 2007-08-31 0:39 ` David Miller
2007-08-31 1:07 ` Rick Jones
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31 0:39 UTC
To: rick.jones2; +Cc: netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 30 Aug 2007 17:09:04 -0700 (PDT)
> Enable configuration of the minimum TCP Retransmission Timeout via
> a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
> have quite variable RTTs avoid spurious RTOs.
>
> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> Signed-off-by: Lamont Jones <lamont@hp.com>
Thanks for doing this work Rick.
But as John Heffner and I both mentioned, it's pretty clear we should
do this as a routing metric. Both for handling realistic scenarios
where the sysctl doesn't work, and to help prevent misuse (example:
someone decides that it would be _totally_ _awesome_ for "Carrier
Grade Linux" to set this to 3 seconds by default in /etc/sysctl.conf
and crap like that).
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 0:39 ` David Miller
@ 2007-08-31 1:07 ` Rick Jones
2007-08-31 4:52 ` John Heffner
2007-08-31 5:09 ` David Miller
0 siblings, 2 replies; 12+ messages in thread
From: Rick Jones @ 2007-08-31 1:07 UTC
To: David Miller; +Cc: netdev
David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Thu, 30 Aug 2007 17:09:04 -0700 (PDT)
>
>
>>Enable configuration of the minimum TCP Retransmission Timeout via
>>a new sysctl "tcp_rto_min" to help those whose networks (e.g. cellular)
>>have quite variable RTTs avoid spurious RTOs.
>>
>>Signed-off-by: Rick Jones <rick.jones2@hp.com>
>>Signed-off-by: Lamont Jones <lamont@hp.com>
>
>
> Thanks for doing this work Rick.
>
> But as John Heffner and I both mentioned, it's pretty clear we should
> do this as a routing metric. Both for handling realistic scenarios
> where the sysctl doesn't work, and to help prevent misuse (example:
> someone decides that it would be _totally_ _awesome_ for "Carrier
> Grade Linux" to set this to 3 seconds by default in /etc/sysctl.conf
> and crap like that).
If nothing else it was worth the practice :) I'll be happy with either
mechanism, just wasn't sure if the jury was still out on whether making
it a routing metric was really necessary. I can see where it would be
goodness if one had separate paths out of a system, one with the highly
variable RTT and one with non-trivial loss rates, just that thus far I've
not come across any :) I've only seen one path with high RTT
variability and the other path with trivial loss rates.
Also, not surprisingly, the folks for whom I'm doing this are a trifle
"anxious", so I figured that simplicity was worthwhile, particularly if
those folks were going to be asking for back-ports.
Anyhow, I'll try grubbing around the source code (already doing that to
see about writing a pet tcp cong module) but if pointers to the likely
relevant files were available I could try to help thrash-out the routing
metric version. Like I said the consumers of this are a trifle well,
"anxious" :)
rick
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 1:07 ` Rick Jones
@ 2007-08-31 4:52 ` John Heffner
2007-08-31 17:19 ` Rick Jones
2007-08-31 5:09 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: John Heffner @ 2007-08-31 4:52 UTC
To: Rick Jones; +Cc: netdev
Rick Jones wrote:
> Like I said the consumers of this are a trifle well,
> "anxious" :)
Just curious, did you or this customer try with F-RTO enabled? Or is
this case you're dealing with truly hopeless?
-John
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 4:52 ` John Heffner
@ 2007-08-31 17:19 ` Rick Jones
0 siblings, 0 replies; 12+ messages in thread
From: Rick Jones @ 2007-08-31 17:19 UTC
To: John Heffner; +Cc: netdev
John Heffner wrote:
> Rick Jones wrote:
>
>> Like I said the consumers of this are a trifle well, "anxious" :)
>
>
> Just curious, did you or this customer try with F-RTO enabled? Or is
> this case you're dealing with truly hopeless?
F-RTO was mentioned to the customer and I'm awaiting their response as
to its efficacy in their situation. Everything I've seen thus far
leads me to believe that we'll still need a minimum RTO higher than
200 milliseconds, though.
rick
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 1:07 ` Rick Jones
2007-08-31 4:52 ` John Heffner
@ 2007-08-31 5:09 ` David Miller
2007-08-31 18:11 ` Rick Jones
1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31 5:09 UTC
To: rick.jones2; +Cc: netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 30 Aug 2007 18:07:13 -0700
> Anyhow, I'll try grubbing around the source code (already doing that to
> see about writing a pet tcp cong module) but if pointers to the likely
> relevant files were available I could try to help thrash-out the routing
> metric version. Like I said the consumers of this are a trifle well,
> "anxious" :)
The change is actually a lot simpler than the sysctl version.
In fact it borders on trivial :-)
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c91476c..dff3192 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -351,6 +351,8 @@ enum
#define RTAX_INITCWND RTAX_INITCWND
RTAX_FEATURES,
#define RTAX_FEATURES RTAX_FEATURES
+ RTAX_RTO_MIN,
+#define RTAX_RTO_MIN RTAX_RTO_MIN
__RTAX_MAX
};
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9785df3..1ee7212 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -555,6 +555,16 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
tcp_grow_window(sk, skb);
}
+static u32 tcp_rto_min(struct sock *sk)
+{
+ struct dst_entry *dst = __sk_dst_get(sk);
+ u32 rto_min = TCP_RTO_MIN;
+
+ if (dst_metric_locked(dst, RTAX_RTO_MIN))
+ rto_min = dst->metrics[RTAX_RTO_MIN-1];
+ return rto_min;
+}
+
/* Called to compute a smoothed rtt estimate. The data fed to this
* routine either comes from timestamps, or from segments that were
* known _not_ to have been retransmitted [see Karn/Partridge
@@ -616,13 +626,13 @@ static void tcp_rtt_estimator(struct sock *sk, const __u32 mrtt)
if (tp->mdev_max < tp->rttvar)
tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
tp->rtt_seq = tp->snd_nxt;
- tp->mdev_max = TCP_RTO_MIN;
+ tp->mdev_max = tcp_rto_min(sk);
}
} else {
/* no previous measure. */
tp->srtt = m<<3; /* take the measured time to be rtt */
tp->mdev = m<<1; /* make sure rto = 3*rtt */
- tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
+ tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
tp->rtt_seq = tp->snd_nxt;
}
}
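To see what the floor actually changes, consider the first-sample branch of tcp_rtt_estimator() in the patch above. The sketch below models it in userspace, with the per-route floor passed in as a parameter rather than looked up through tcp_rto_min(sk). It assumes the RTO is then set to (srtt >> 3) + rttvar, which is my reading of tcp_set_rto() at the time; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical mirror of the tcp_sock fields touched by the estimator. */
struct model_rtt_state {
	unsigned int srtt;	/* smoothed RTT, scaled by 8 (as tp->srtt) */
	unsigned int mdev;	/* mean deviation, scaled by 4 (as tp->mdev) */
	unsigned int mdev_max;
	unsigned int rttvar;
};

/* First-sample ("no previous measure") branch, floor supplied by caller. */
static void model_first_sample(struct model_rtt_state *s, unsigned int m,
			       unsigned int rto_min)
{
	assert(m > 0);
	s->srtt = m << 3;	/* take the measured time to be rtt */
	s->mdev = m << 1;	/* make sure rto = 3*rtt */
	s->mdev_max = s->rttvar = s->mdev > rto_min ? s->mdev : rto_min;
}

/* Assumed RTO computation: unscaled SRTT plus the variance term. */
static unsigned int model_rto(const struct model_rtt_state *s)
{
	return (s->srtt >> 3) + s->rttvar;
}
```

For a 10-jiffy first sample, a floor of 50 jiffies (200 ms at an assumed HZ of 250) gives an RTO of 60 jiffies, and a route floor of 150 jiffies gives 160. For a 100-jiffy sample the 3*RTT term dominates and the floor has no effect.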
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 5:09 ` David Miller
@ 2007-08-31 18:11 ` Rick Jones
2007-08-31 18:57 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 18:11 UTC
To: David Miller; +Cc: netdev
David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Thu, 30 Aug 2007 18:07:13 -0700
>
>
>>Anyhow, I'll try grubbing around the source code (already doing that to
>>see about writing a pet tcp cong module) but if pointers to the likely
>>relevant files were available I could try to help thrash-out the routing
>>metric version. Like I said the consumers of this are a trifle well,
>>"anxious" :)
>
>
> The change is actually a lot simpler than the sysctl version.
>
> In fact it borders on trivial :-)
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index c91476c..dff3192 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -351,6 +351,8 @@ enum
> #define RTAX_INITCWND RTAX_INITCWND
> RTAX_FEATURES,
> #define RTAX_FEATURES RTAX_FEATURES
> + RTAX_RTO_MIN,
> +#define RTAX_RTO_MIN RTAX_RTO_MIN
> __RTAX_MAX
> };
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9785df3..1ee7212 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -555,6 +555,16 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
> tcp_grow_window(sk, skb);
> }
>
> +static u32 tcp_rto_min(struct sock *sk)
> +{
> + struct dst_entry *dst = __sk_dst_get(sk);
> + u32 rto_min = TCP_RTO_MIN;
> +
> + if (dst_metric_locked(dst, RTAX_RTO_MIN))
> + rto_min = dst->metrics[RTAX_RTO_MIN-1];
> + return rto_min;
> +}
> +
> /* Called to compute a smoothed rtt estimate. The data fed to this
> * routine either comes from timestamps, or from segments that were
> * known _not_ to have been retransmitted [see Karn/Partridge
> @@ -616,13 +626,13 @@ static void tcp_rtt_estimator(struct sock *sk, const __u32 mrtt)
> if (tp->mdev_max < tp->rttvar)
> tp->rttvar -= (tp->rttvar-tp->mdev_max)>>2;
> tp->rtt_seq = tp->snd_nxt;
> - tp->mdev_max = TCP_RTO_MIN;
> + tp->mdev_max = tcp_rto_min(sk);
> }
> } else {
> /* no previous measure. */
> tp->srtt = m<<3; /* take the measured time to be rtt */
> tp->mdev = m<<1; /* make sure rto = 3*rtt */
> - tp->mdev_max = tp->rttvar = max(tp->mdev, TCP_RTO_MIN);
> + tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
> tp->rtt_seq = tp->snd_nxt;
> }
> }
At the risk of showing my ignorance (what me worry about that?-) I
presume this is then an interface expecting to take in jiffies? That
means the user has to know the value of HZ which can be (IIRC) one of
three different values?
rick jones
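Rick's concern can be made concrete: if the metric is a raw jiffies count, the same number means a different wall-clock time on kernels built with different HZ values. A small helper (hypothetical name) that converts a jiffies count to milliseconds:

```c
#include <assert.h>

/* Wall-clock milliseconds represented by a raw jiffies count on a kernel
 * built with the given HZ (integer division, truncating). */
static unsigned long model_jiffies_to_ms(unsigned long jiffies,
					 unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000UL / hz;
}
```

A raw value of 300 corresponds to 3000 ms at HZ=100, 1200 ms at HZ=250 and 300 ms at HZ=1000, so a user typing "300" without knowing HZ could get any of these.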
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 18:11 ` Rick Jones
@ 2007-08-31 18:57 ` David Miller
2007-08-31 20:59 ` Rick Jones
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31 18:57 UTC
To: rick.jones2; +Cc: netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 31 Aug 2007 11:11:37 -0700
> At the risk of showing my ignorance (what me worry about that?-) I
> presume this is then an interface expecting to take in jiffies? That
> means the user has to know the value of HZ which can be (IIRC) one of
> three different values?
The iproute2 changes might look something like this:
--- ./include/linux/rtnetlink.h.orig 2007-08-31 11:55:30.000000000 -0700
+++ ./include/linux/rtnetlink.h 2007-08-31 11:52:22.000000000 -0700
@@ -351,6 +351,8 @@ enum
#define RTAX_INITCWND RTAX_INITCWND
RTAX_FEATURES,
#define RTAX_FEATURES RTAX_FEATURES
+ RTAX_RTO_MIN,
+#define RTAX_RTO_MIN RTAX_RTO_MIN
__RTAX_MAX
};
--- ./ip/iproute.c.orig 2007-08-31 11:55:30.000000000 -0700
+++ ./ip/iproute.c 2007-08-31 11:53:29.000000000 -0700
@@ -51,6 +51,7 @@ static const char *mx_names[RTAX_MAX+1]
[RTAX_HOPLIMIT] = "hoplimit",
[RTAX_INITCWND] = "initcwnd",
[RTAX_FEATURES] = "features",
+ [RTAX_RTO_MIN] = "rto_min",
};
static void usage(void) __attribute__((noreturn));
@@ -74,6 +75,7 @@ static void usage(void)
fprintf(stderr, " [ rtt NUMBER ] [ rttvar NUMBER ]\n");
fprintf(stderr, " [ window NUMBER] [ cwnd NUMBER ] [ initcwnd NUMBER ]\n");
fprintf(stderr, " [ ssthresh NUMBER ] [ realms REALM ]\n");
+ fprintf(stderr, " [ rto_min NUMBER ]\n");
fprintf(stderr, "TYPE := [ unicast | local | broadcast | multicast | throw |\n");
fprintf(stderr, " unreachable | prohibit | blackhole | nat ]\n");
fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
@@ -520,7 +522,8 @@ int print_route(const struct sockaddr_nl
if (mxlock & (1<<i))
fprintf(fp, " lock");
- if (i != RTAX_RTT && i != RTAX_RTTVAR)
+ if (i != RTAX_RTT && i != RTAX_RTTVAR &&
+ i != RTAX_RTO_MIN)
fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
else {
unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
@@ -528,7 +531,7 @@ int print_route(const struct sockaddr_nl
val *= 1000;
if (i == RTAX_RTT)
val /= 8;
- else
+ else if (i == RTAX_RTTVAR)
val /= 4;
if (val >= hz)
fprintf(fp, " %ums", val/hz);
@@ -803,6 +806,15 @@ int iproute_modify(int cmd, unsigned fla
if (get_unsigned(&rtt, *argv, 0))
invarg("\"rtt\" value is invalid\n", *argv);
rta_addattr32(mxrta, sizeof(mxbuf), RTAX_RTT, rtt);
+ } else if (strcmp(*argv, "rto_min") == 0) {
+ unsigned rto_min;
+ NEXT_ARG();
+ mxlock |= (1<<RTAX_RTO_MIN);
+ if (get_unsigned(&rto_min, *argv, 0))
+ invarg("\"rto_min\" value is invalid\n",
+ *argv);
+ rta_addattr32(mxrta, sizeof(mxbuf), RTAX_RTO_MIN,
+ rto_min);
} else if (matches(*argv, "window") == 0) {
unsigned win;
NEXT_ARG();
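The display side of this iproute2 patch can be modelled the same way. The sketch below mirrors the patched print_route() branch: the raw value is multiplied by 1000 and divided by hz, with RTT and RTTVAR additionally divided by 8 and 4. Before the patch, the bare `else` would also have divided rto_min by 4, which is why it becomes `else if (i == RTAX_RTTVAR)`. Only the `val >= hz` branch visible in the hunk is modelled, and the enum and function names are hypothetical.

```c
#include <assert.h>

enum model_metric { MODEL_RTT, MODEL_RTTVAR, MODEL_RTO_MIN };

/* Milliseconds printed for a raw metric value, following the patched
 * branch: scale to ms, undo the x8 / x4 scaling of RTT / RTTVAR only.
 * Models only the "val >= hz" (whole milliseconds) output path. */
static unsigned int model_display_ms(enum model_metric which,
				     unsigned int raw, unsigned int hz)
{
	unsigned int val = raw;

	assert(hz > 0);
	val *= 1000;
	if (which == MODEL_RTT)
		val /= 8;
	else if (which == MODEL_RTTVAR)
		val /= 4;
	return val / hz;
}
```

With hz = 250, a stored rto_min of 300 displays as 1200 ms and 600 displays as 2400 ms, matching the "300 became 1200" and "600 became 2400 ms" observations reported later in this thread when the input is passed through unconverted.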
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 18:57 ` David Miller
@ 2007-08-31 20:59 ` Rick Jones
2007-08-31 21:38 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 20:59 UTC
To: David Miller; +Cc: netdev
I managed to find iproute2 sources (they were debian lenny/testing
2.6.20-1), applied the patch, and figured out how to add a host route
back to one of my systems. I then did a change to set rto_min to 300.
I started a tcpdump and then a netperf, and then forced some
retransmissions the old-fashioned way: by pulling a cable :) I then
Ctrl-C'd netperf and at that point got this:
Unable to handle kernel NULL pointer dereference (address 0000000000000038)
swapper[0]: Oops 8813272891392 [1]
Modules linked in: ipv6 sg sr_mod cdrom dm_snapshot dm_mirror dm_mod
loop button shpchp pci_hotplug joydev evdev ext3 jbd mbcache usb_storage
usbhid hid ide_core mptspi mptscsih ehci_hcd cciss ohci_hcd mptbase
scsi_transport_spi scsi_mod usbcore e1000 thermal processor fan
Pid: 0, CPU 3, comm: swapper
psr : 0000101008026038 ifs : 8000000000000001 ip : [<a000000100477600>]
Not tainted
ip is at tcp_rto_min+0x20/0x40
unat: 0000000000000000 pfs : 0000000000000307 rsc : 0000000000000003
rnat: e0000100f32917e0 bsps: e0000100f32917e8 pr : 00000000000166a5
ldrs: 0000000000000000 ccv : 0000000000010001 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001004777f0 b6 : a000000100230700 b7 : a000000100452e40
f6 : 000000000000000000000 f7 : 1003e000000000002813f
f8 : 1003e00000000001c06e2 f9 : 1003e000000000000043f
f10 : 1003e00000000000fd24b f11 : 1003e0044b82fa09b5a53
r1 : a0000001009b2710 r2 : 0000000000000000 r3 : e0000100fd317b40
r8 : 0000000000000032 r9 : 000000000000012c r10 : 0000000000000003
r11 : e000000ff07b63a0 r12 : e0000100fd317b40 r13 : e0000100fd310000
r14 : 0000000000000000 r15 : 0000000000000038 r16 : 0000000000000068
r17 : e000000ff07b6380 r18 : fffffffffffffa58 r19 : 000000008d769671
r20 : 000000008d769c19 r21 : e000000ff07b62f0 r22 : 000000008d7712e1
r23 : e000000ff07b62ec r24 : e000000ff07b637c r25 : 000000000000012c
r26 : 0000000000000032 r27 : e000000ff07b6438 r28 : e000000ff07b5fc8
r29 : e000000ff215bd80 r30 : e000000ff07b60d0 r31 : e000000ff07b6250
Call Trace:
[<a000000100013700>] show_stack+0x40/0xa0
sp=e0000100fd317710 bsp=e0000100fd311310
[<a000000100014000>] show_regs+0x840/0x880
sp=e0000100fd3178e0 bsp=e0000100fd3112b8
[<a0000001000351a0>] die+0x1a0/0x2a0
sp=e0000100fd3178e0 bsp=e0000100fd311270
[<a0000001000579d0>] ia64_do_page_fault+0x8d0/0xa00
sp=e0000100fd3178e0 bsp=e0000100fd311220
[<a00000010000b0c0>] ia64_leave_kernel+0x0/0x270
sp=e0000100fd317970 bsp=e0000100fd311220
[<a000000100477600>] tcp_rto_min+0x20/0x40
sp=e0000100fd317b40 bsp=e0000100fd311218
[<a0000001004777f0>] tcp_rtt_estimator+0x1d0/0x280
sp=e0000100fd317b40 bsp=e0000100fd3111e0
[<a000000100479110>] tcp_ack_saw_tstamp+0x50/0xc0
sp=e0000100fd317b40 bsp=e0000100fd3111c0
[<a00000010047d7a0>] tcp_ack+0x13c0/0x4380
sp=e0000100fd317b40 bsp=e0000100fd311120
[<a000000100486fa0>] tcp_rcv_state_process+0x1420/0x2100
sp=e0000100fd317b60 bsp=e0000100fd3110d8
[<a000000100498760>] tcp_v4_do_rcv+0x960/0xa80
sp=e0000100fd317b60 bsp=e0000100fd311078
[<a00000010049e830>] tcp_v4_rcv+0x19d0/0x1b20
sp=e0000100fd317b70 bsp=e0000100fd311008
[<a0000001004542f0>] ip_local_deliver+0x530/0x7c0
sp=e0000100fd317b70 bsp=e0000100fd310fd0
[<a000000100453cb0>] ip_rcv+0xe70/0xf80
sp=e0000100fd317b80 bsp=e0000100fd310f98
[<a0000001003fcb40>] netif_receive_skb+0xa20/0xb80
sp=e0000100fd317ba0 bsp=e0000100fd310f50
[<a000000213211c20>] e1000_clean_rx_irq+0x9e0/0xc00 [e1000]
sp=e0000100fd317ba0 bsp=e0000100fd310e90
[<a00000021320ce50>] e1000_clean+0x130/0x6e0 [e1000]
sp=e0000100fd317ba0 bsp=e0000100fd310e38
[<a0000001004034c0>] net_rx_action+0x1c0/0x540
sp=e0000100fd317bb0 bsp=e0000100fd310df0
[<a0000001000908f0>] __do_softirq+0xf0/0x240
sp=e0000100fd317bc0 bsp=e0000100fd310d78
[<a000000100090ab0>] do_softirq+0x70/0xc0
sp=e0000100fd317bc0 bsp=e0000100fd310d18
[<a000000100090ca0>] irq_exit+0x80/0xa0
sp=e0000100fd317bc0 bsp=e0000100fd310d00
[<a000000100010cc0>] ia64_handle_irq+0x2a0/0x2e0
sp=e0000100fd317bc0 bsp=e0000100fd310cd0
[<a00000010000b0c0>] ia64_leave_kernel+0x0/0x270
sp=e0000100fd317bc0 bsp=e0000100fd310cd0
[<a0000001000142d0>] default_idle+0x110/0x180
sp=e0000100fd317d90 bsp=e0000100fd310c90
[<a0000001000131b0>] cpu_idle+0x210/0x2e0
sp=e0000100fd317e30 bsp=e0000100fd310c60
[<a00000010063cf50>] start_secondary+0x4b0/0x4e0
sp=e0000100fd317e30 bsp=e0000100fd310c20
[<a00000010050ae00>] __kprobes_text_end+0x340/0x370
sp=e0000100fd317e30 bsp=e0000100fd310c20
Kernel panic - not syncing: Aiee, killing interrupt handler!
Of course there isn't all that much code to tcp_rto_min:
+static u32 tcp_rto_min(struct sock *sk)
+{
+ struct dst_entry *dst = __sk_dst_get(sk);
+ u32 rto_min = TCP_RTO_MIN;
+
+ if (dst_metric_locked(dst, RTAX_RTO_MIN))
+ rto_min = dst->metrics[RTAX_RTO_MIN-1];
+ return rto_min;
+}
+
So, I went ahead and rebooted and started again:
hpcpc103:~# ./ip route add dev eth0 15.244.56.217
hpcpc103:~# ./ip route show
15.244.56.217 dev eth0 scope link
16.89.84.0/25 dev eth0 proto kernel scope link src 16.89.84.103
default via 16.89.84.1 dev eth0
hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
hpcpc103:~# ./ip route change dev eth0 15.244.56.217 min_rto 300
Error: either "to" is duplicate, or "min_rto" is a garbage.
hpcpc103:~# ./ip route change dev eth0 15.244.56.217 rto_min 300
hpcpc103:~# ./ip route show
15.244.56.217 dev eth0 scope link rto_min lock 1200ms
16.89.84.0/25 dev eth0 proto kernel scope link src 16.89.84.103
default via 16.89.84.1 dev eth0
300 became 1200
hpcpc103:~# ./ip route change dev eth0 15.244.56.217 rto_min 600
hpcpc103:~# ./ip route show
15.244.56.217 dev eth0 scope link rto_min lock 2400ms
16.89.84.0/25 dev eth0 proto kernel scope link src 16.89.84.103
default via 16.89.84.1 dev eth0
600 became 2400 ms but that is window dressing at the moment. Go ahead
and run netperf and let it finish on its own:
hpcpc103:~# netperf -H tardy
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
32768 16384 16384 10.02 72.81
hpcpc103:~#
now try it and abort netperf in the middle:
hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
hpcpc103:~#
hmm, didn't happen again. Now try with some RTOs forced:
hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
32768 16384 16384 30.02 32.69
still happy. Forced RTOs and abort:
hpcpc103:~# netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
hpcpc103:~#
OK, still happy. So the failure is intermittent at best. One final
thing: try adding the tcpdump again:
hpcpc103:~# device eth0 entered promiscuous mode
audit(1188593799.036:2): dev=eth0 prom=256 old_prom=0 auid=4294967295
netperf -H tardy -l 30
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
hpcpc103:~#
still happy. Sigh, so much for easily reproducing the panic :(
rick
that second tcpdump does show the rto_min being honored:
16:56:46.543603 IP tardy.cup.hp.com.52620 > hpcpc103.cup.hp.com.51691: .
ack 29438433 win 32768 <nop,nop,timestamp 7997540 54018>
16:56:46.543608 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: .
29438433:29454361(15928) ack 1 win 46 <nop,nop,timestamp 54018 7997540>
16:56:46.543613 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: P
29454361:29470289(15928) ack 1 win 46 <nop,nop,timestamp 54018 7997540>
16:56:48.956342 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: .
29438433:29439881(1448) ack 1 win 46 <nop,nop,timestamp 54622 7997540>
16:56:53.788276 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: .
29438433:29439881(1448) ack 1 win 46 <nop,nop,timestamp 55830 7997540>
16:56:53.855520 IP tardy.cup.hp.com.52620 > hpcpc103.cup.hp.com.51691: .
ack 29439881 win 32768 <nop,nop,timestamp 7998272 55830>
16:56:53.855526 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: .
29439881:29441329(1448) ack 1 win 46 <nop,nop,timestamp 55846 7998272>
16:56:53.855530 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: .
29441329:29442777(1448) ack 1 win 46 <nop,nop,timestamp 55846 7998272>
16:56:53.925505 IP tardy.cup.hp.com.52620 > hpcpc103.cup.hp.com.51691: .
ack 29442777 win 32768 <nop,nop,timestamp 7998279 55846>
16:56:53.925511 IP hpcpc103.cup.hp.com.51691 > tardy.cup.hp.com.52620: .
29442777:29444225(1448) ack 1 win 46 <nop,nop,timestamp 55864 7998279>
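The timing in that tcpdump can be checked with a little arithmetic. The first transmission of 29438433 is at 46.543608 (TSval 54018), the first retransmission at 48.956342 (TSval 54622) and the second at 53.788276 (TSval 55830). The sketch below computes the gaps and the timestamp tick rate. It assumes the TCP timestamp clock ticks once per jiffy, which is my understanding of Linux of this vintage, so the tick rate doubles as an estimate of HZ; the function names are hypothetical.

```c
#include <assert.h>

/* Seconds between two tcpdump timestamps taken within the same minute
 * (all packets above fall inside 16:56). */
static double model_gap_secs(double t0, double t1)
{
	assert(t1 >= t0);
	return t1 - t0;
}

/* TCP timestamp (TSval) ticks per second over an interval. */
static double model_tick_rate(unsigned long ts0, unsigned long ts1,
			      double secs)
{
	assert(ts1 >= ts0 && secs > 0.0);
	return (double)(ts1 - ts0) / secs;
}
```

The first gap is about 2.41 s and the timestamp clock advances about 250 ticks per second, so the first retransmission fired roughly 600 ticks after the original send, consistent with rto_min 600 if that value is taken as jiffies at HZ=250. The second gap is almost exactly twice the first, as expected from exponential backoff.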
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 20:59 ` Rick Jones
@ 2007-08-31 21:38 ` David Miller
2007-08-31 22:20 ` Rick Jones
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-08-31 21:38 UTC
To: rick.jones2; +Cc: netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 31 Aug 2007 13:59:50 -0700
> ip is at tcp_rto_min+0x20/0x40
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1ee7212..bbad2cd 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -560,7 +560,7 @@ static u32 tcp_rto_min(struct sock *sk)
struct dst_entry *dst = __sk_dst_get(sk);
u32 rto_min = TCP_RTO_MIN;
- if (dst_metric_locked(dst, RTAX_RTO_MIN))
+ if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
rto_min = dst->metrics[RTAX_RTO_MIN-1];
return rto_min;
}
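The one-line fix matters because, as the oops shows, a socket can evidently reach tcp_rtt_estimator() with no cached route, in which case __sk_dst_get(sk) returns NULL; the faulting address 0x38 is consistent with a load at a small offset from a NULL dst. The userspace model below assumes the RTAX_* numbering (RTAX_LOCK = 1, and RTAX_RTO_MIN = 13 once appended after RTAX_FEATURES) and that metric i lives at metrics[i-1], with the lock bitmask in metrics[RTAX_LOCK-1], which is how I understand dst_metric_locked() to work. All model_* names are hypothetical. Note that the iproute2 patch always sets the lock bit for rto_min, which is why the lookup tests the lock and why `ip route show` prints "rto_min lock".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed metric indices (see lead-in); not taken from a real header. */
#define MODEL_RTAX_LOCK		1
#define MODEL_RTAX_RTO_MIN	13
#define MODEL_RTAX_MAX		13

/* Stand-in for struct dst_entry: metrics[i - 1] holds metric i. */
struct model_dst {
	unsigned int metrics[MODEL_RTAX_MAX];
};

/* Is bit 'metric' set in the lock bitmask? */
static int model_metric_locked(const struct model_dst *dst, int metric)
{
	return (dst->metrics[MODEL_RTAX_LOCK - 1] >> metric) & 1u;
}

/* tcp_rto_min() with the fix: no route means use the default floor. */
static unsigned int model_tcp_rto_min(const struct model_dst *dst,
				      unsigned int default_rto_min)
{
	unsigned int rto_min = default_rto_min;

	if (dst && model_metric_locked(dst, MODEL_RTAX_RTO_MIN))
		rto_min = dst->metrics[MODEL_RTAX_RTO_MIN - 1];
	return rto_min;
}
```

A NULL route and an unlocked metric both fall back to the default; only a locked rto_min overrides it.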
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 21:38 ` David Miller
@ 2007-08-31 22:20 ` Rick Jones
2007-08-31 22:24 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2007-08-31 22:20 UTC
To: David Miller; +Cc: netdev
David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Fri, 31 Aug 2007 13:59:50 -0700
>
>
>>ip is at tcp_rto_min+0x20/0x40
>
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 1ee7212..bbad2cd 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -560,7 +560,7 @@ static u32 tcp_rto_min(struct sock *sk)
> struct dst_entry *dst = __sk_dst_get(sk);
> u32 rto_min = TCP_RTO_MIN;
>
> - if (dst_metric_locked(dst, RTAX_RTO_MIN))
> + if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
> rto_min = dst->metrics[RTAX_RTO_MIN-1];
> return rto_min;
> }
Applied and beating on it with a while loop doing a bunch of ip route
del add change stuff while netperf TCP_CRR tests are running. Thus far
things seem OK wrt the system staying alive, but since I only saw the
failure once I'm not sure how much that is really saying.
I'm going to go ahead and take a look at input vs output units and
differences between those with rto_min vs rtt.
rick jones
* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable take 2
2007-08-31 22:20 ` Rick Jones
@ 2007-08-31 22:24 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-08-31 22:24 UTC
To: rick.jones2; +Cc: netdev
From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 31 Aug 2007 15:20:52 -0700
> I'm going to go ahead and take a look at input vs output units and
> differences between those with rto_min vs rtt.
You'd better, because that's one of the last non-trivial emails you'll
get from me over the next few days while I'm travelling to the kernel
summit :-)
end of thread
Thread overview: 12+ messages
2007-08-31 0:09 [PATCH] make _minimum_ TCP retransmission timeout configurable take 2 Rick Jones
2007-08-31 0:39 ` David Miller
2007-08-31 1:07 ` Rick Jones
2007-08-31 4:52 ` John Heffner
2007-08-31 17:19 ` Rick Jones
2007-08-31 5:09 ` David Miller
2007-08-31 18:11 ` Rick Jones
2007-08-31 18:57 ` David Miller
2007-08-31 20:59 ` Rick Jones
2007-08-31 21:38 ` David Miller
2007-08-31 22:20 ` Rick Jones
2007-08-31 22:24 ` David Miller