* Re: fix TCP roundtrip time update code
       [not found] <200306031552.h53FqknC023999@napali.hpl.hp.com>
From: Martin Josefsson @ 2003-06-03 17:41 UTC
To: davidm; +Cc: kuznet, linux-kernel, linux-ia64, netdev

(trimmed CC line and added netdev)

On Tue, 2003-06-03 at 17:52, David Mosberger wrote:
> One of those very-hard-to-track-down, trivial-to-fix kind of problems:
> without this patch, TCP roundtrip time measurements will corrupt the
> routing cache's RTT estimates under heavy network load (the bug causes
> RTAX_RTT to go negative, but since its type is u32, you end up with a
> huge positive value...). From there on, later TCP connections quickly
> will go south.
>
> The typo was introduced 8 months ago in v1.29 of the file by the patch
> entitled "Cleanup DST metrics and abstrct MSS/PMTU further".

I tested this patch and it looks like it has cured my mysterious TCP
stalls.

without patch:
cache mtu 1500 rtt 479411ms rttvar 953813ms cwnd 46 advmss 1460

I see that before and during the stall if not using this patch.
(rtt is never above 20ms according to ping)

With the patch I see normal rtt and rttvar times. Haven't seen a stall
yet (~30 kernel compiles with distcc over a sometimes congested link),
will continue testing.

> ===== net/ipv4/tcp_input.c 1.36 vs edited =====
> --- 1.36/net/ipv4/tcp_input.c Mon Apr 28 09:27:57 2003
> +++ edited/net/ipv4/tcp_input.c Tue Jun 3 08:19:36 2003
> @@ -556,8 +556,8 @@
>  			if (m >= dst_metric(dst, RTAX_RTTVAR))
>  				dst->metrics[RTAX_RTTVAR-1] = m;
>  			else
> -				dst->metrics[RTAX_RTT-1] -=
> -					(dst->metrics[RTAX_RTT-1] - m)>>2;
> +				dst->metrics[RTAX_RTTVAR-1] -=
> +					(dst->metrics[RTAX_RTTVAR-1] - m)>>2;
>  		}
>
>  		if (tp->snd_ssthresh >= 0xFFFF) {

--
/Martin
* Re: fix TCP roundtrip time update code
From: David Mosberger @ 2003-06-03 18:45 UTC
To: Martin Josefsson; +Cc: davidm, kuznet, linux-kernel, linux-ia64, netdev

>>>>> On 03 Jun 2003 19:41:11 +0200, Martin Josefsson <gandalf@wlug.westbo.se> said:

  Martin> (trimmed CC line and added netdev) On Tue, 2003-06-03 at
  Martin> 17:52, David Mosberger wrote:

  >> One of those very-hard-to-track-down, trivial-to-fix kind of
  >> problems: without this patch, TCP roundtrip time measurements
  >> will corrupt the routing cache's RTT estimates under heavy
  >> network load (the bug causes RTAX_RTT to go negative, but since
  >> its type is u32, you end up with a huge positive value...). From
  >> there on, later TCP connections quickly will go south.

  >> The typo was introduced 8 months ago in v1.29 of the file by the
  >> patch entitled "Cleanup DST metrics and abstrct MSS/PMTU
  >> further".

  Martin> I tested this patch and it looks like it has cured my
  Martin> mysterious TCP stalls.

Yes, this sounds reasonable. I wasn't very clear on this point, but
"by going south" I meant that TCP is starting to misbehave. In
particular, you'll likely end up with the kernel aborting ESTABLISHED
TCP connections with extreme prejudice (and in violation of the TCP
protocol), because it thought that it had been unable to communicate
with the remote end for a _very_ long time. The net effect typically
is that you end up with one end having a connection that's in the
ESTABLISHED state and the other end having no trace of that
connection.

	--david
* Re: fix TCP roundtrip time update code
From: James Morris @ 2003-06-04 0:24 UTC
To: davidm
Cc: Martin Josefsson, kuznet, linux-kernel, linux-ia64, netdev, David S. Miller

On Tue, 3 Jun 2003, David Mosberger wrote:

>   Martin> I tested this patch and it looks like it has cured my
>   Martin> mysterious TCP stalls.
>
> Yes, this sounds reasonable. I wasn't very clear on this point, but
> "by going south" I meant that TCP is starting to misbehave. In
> particular, you'll likely end up with the kernel aborting ESTABLISHED
> TCP connections with extreme prejudice (and in violation of the TCP
> protocol), because it thought that it had been unable to communicate
> with the remote end for a _very_ long time. The net effect typically
> is that you end up with one end having a connection that's in the
> ESTABLISHED state and the other end having no trace of that
> connection.

David,

This might be the solution to one of the 'must-fix' bugs for the
networking, which nobody so far was quite able to track down.

- James
--
James Morris <jmorris@intercode.com.au>
* Re: fix TCP roundtrip time update code
From: kuznet @ 2003-06-04 0:43 UTC
To: James Morris
Cc: davidm, gandalf, linux-kernel, linux-ia64, netdev, davem, akpm

Hello!

> This might be the solution to one of the 'must-fix' bugs for the
> networking, which nobody so far was quite able to track down.

No doubts. All the symptoms are explained by this. I hope Andrew
will confirm that the problem has gone.

Alexey
* Re: fix TCP roundtrip time update code
From: Nivedita Singhvi @ 2003-06-04 2:01 UTC
To: kuznet
Cc: James Morris, davidm, gandalf, linux-kernel, linux-ia64, netdev, davem, akpm

kuznet@ms2.inr.ac.ru wrote:

> No doubts. All the symptoms are explained by this. I hope Andrew
> will confirm that the problem has gone.

Yep, great catch!

But, FYI, DaveM and Alexey, we tried reproducing the stalls
we (Dave Hansen, Troy Wilson) had seen during SpecWeb99
runs and couldn't reproduce them on 2.5.69. (Same config, etc).
So it's possible our hang/stalls were some other issue that
got silently fixed (or more likely, possibly the same thing
but other changes minimized us running into the problem).

thanks,
Nivedita
* Re: fix TCP roundtrip time update code
From: David S. Miller @ 2003-06-04 3:23 UTC
To: niv
Cc: kuznet, jmorris, davidm, gandalf, linux-kernel, linux-ia64, netdev, akpm

   From: Nivedita Singhvi <niv@us.ibm.com>
   Date: Tue, 03 Jun 2003 19:01:25 -0700

   But, FYI, DaveM and Alexey, we tried reproducing the stalls
   we (Dave Hansen, Troy Wilson) had seen during SpecWeb99
   runs and couldn't reproduce them on 2.5.69. (Same config, etc).
   So it's possible our hang/stalls were some other issue that
   got silently fixed (or more likely, possibly the same thing
   but other changes minimized us running into the problem).

I think this means nothing, and that you can infer nothing
from such results.

My understanding is that the problem case triggers only when a timeout
based retransmit occurs. On LAN this tends to be extremely rare.
Although under enough traffic load it can occur.

So if your old SpecWEB99 lab tended more to trigger timeout based
retransmits on LAN, and your new test network does not, then your new
test network will tend to not reproduce the bug regardless of whether
the bug is present in the kernel or not :-)
* Re: fix TCP roundtrip time update code
From: David Mosberger @ 2003-06-04 4:35 UTC
To: David S. Miller
Cc: niv, kuznet, jmorris, davidm, gandalf, linux-kernel, linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 20:23:20 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  DaveM> From: Nivedita Singhvi <niv@us.ibm.com> Date: Tue, 03 Jun
  DaveM> 2003 19:01:25 -0700

  DaveM> But, FYI, DaveM and Alexey, we tried reproducing the
  DaveM> stalls we (Dave Hansen, Troy Wilson) had seen during
  DaveM> SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same
  DaveM> config, etc). So it's possible our hang/stalls were some other
  DaveM> issue that got silently fixed (or more likely, possibly the
  DaveM> same thing but other changes minimized us running into the
  DaveM> problem).

  DaveM> I think this means nothing, and that you can infer nothing
  DaveM> from such results.

  DaveM> My understanding is that the problem case triggers only when
  DaveM> a timeout based retransmit occurs. On LAN this tends to be
  DaveM> extremely rare. Although under enough traffic load it can
  DaveM> occur.

  DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
  DaveM> based retransmits on LAN, and your new test network does not,
  DaveM> then your new test network will tend to not reproduce the bug
  DaveM> regardless of whether the bug is present in the kernel or not
  DaveM> :-)

Is this where I get to plug httperf? It triggered the bug reliably in
less than 10 secs. ;-)

	--david
* Re: fix TCP roundtrip time update code
From: Nivedita Singhvi @ 2003-06-04 4:40 UTC
To: davidm
Cc: David S. Miller, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

David Mosberger wrote:

>   DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
>   DaveM> based retransmits on LAN, and your new test network does not,
>   DaveM> then your new test network will tend to not reproduce the bug
>   DaveM> regardless of whether the bug is present in the kernel or not
>   DaveM> :-)
>
> Is this where I get to plug httperf? It triggered the bug reliably in
> less than 10 secs. ;-)

Tarnation!! Ran httperf! Didn't hit it! :(. What were your
settings?

I extracted an old debug patch to implement dropping of
packets - have a sysctl that controls the rate at which I
can drop IP packets, so can also generate any kind of
packet loss. So thought I would bang away with netperf
using sendfile()/TCP_CORK. Thought it was in that code
path.

Will be running tests tomorrow and the rest of this week
on 2.5.70 +- patch. Will see if I can provoke any
further hangs, stalls, wackiness of any flavor...

thanks,
Nivedita
* Re: fix TCP roundtrip time update code
From: David Mosberger @ 2003-06-04 5:34 UTC
To: Nivedita Singhvi
Cc: davidm, David S. Miller, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 21:40:18 -0700, Nivedita Singhvi <niv@us.ibm.com> said:

  Nivedita> David Mosberger wrote:

  DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
  DaveM> based retransmits on LAN, and your new test network does not,
  DaveM> then your new test network will tend to not reproduce the bug
  DaveM> regardless of whether the bug is present in the kernel or not
  DaveM> :-)

  >> Is this where I get to plug httperf? It triggered the bug
  >> reliably in less than 10 secs. ;-)

  Nivedita> Tarnation!! Ran httperf! Didnt hit it! :(. What were your
  Nivedita> settings?

I used:

  $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \
      --uri pathto30KBfile

on 3 clients (for a total of 3000 conns/sec). You can't go higher
than 1000 conn/sec per client (IP address) because otherwise you run
out of port space (due to TIME_WAIT).

This load worked well for a machine with a single GigE card. All
network tunables were on the default setting (in particular, the tx
queue len was 300, which is where the losses came from).

With this load, I saw bad RTT values in the route cache within a
couple of seconds after starting the third httperf generator. It then
took a bit longer (on the order of 1-2 minutes) until the first
TCPAbortFailed errors started to pop up.

	--david
* Re: fix TCP roundtrip time update code
From: David S. Miller @ 2003-06-04 5:52 UTC
To: davidm, davidm
Cc: niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Tue, 3 Jun 2003 22:34:30 -0700

   You can't go higher than 1000 conn/sec per client (IP address)
   because otherwise you run out of port space (due to TIME_WAIT).

echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle

It should eliminate this limit. Unfortunately we can't
enable this by default because of NAT :(
* Re: fix TCP roundtrip time update code
From: David Mosberger @ 2003-06-04 6:12 UTC
To: David S. Miller
Cc: davidm, davidm, niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 22:52:45 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  David> From: David Mosberger <davidm@napali.hpl.hp.com> Date:
  David> Tue, 3 Jun 2003 22:34:30 -0700

  David> You can't go higher than 1000 conn/sec per client (IP
  David> address) because otherwise you run out of port space (due to
  David> TIME_WAIT).

  DaveM> echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle

  DaveM> It should eliminate this limit. Unfortunately we can't
  DaveM> enable this by default because of NAT :(

Ah, yes, provided PAWS is enabled, this would give you a time_wait
timeout of 3.5*RTO. Nice.

	--david
* Re: fix TCP roundtrip time update code
From: Nivedita Singhvi @ 2003-06-04 6:04 UTC
To: davidm
Cc: David S. Miller, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

David Mosberger wrote:

> $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \
>     --uri pathto30KBfile

Hmm, ditto, except I was way down at --rate 300 (was seeing
client errors of fd-unavail). Have ulimited upwards but
am still seeing them..

> on 3 clients (for a total of 3000 conns/sec). You can't go higher
> than 1000 conn/sec per client (IP address) because otherwise you run
> out of port space (due to TIME_WAIT).

You can hike /proc/sys/net/ipv4/tcp_tw_recycle for that.

> This load worked well for a machine with a single GigE card. All
> network tunables were on the default setting (in particular, the tx
> queue len was 300, which is were the losses came from).
>
> With this load, I saw bad RTT values in the route cache within a
> couple of seconds after starting the third httperf generator. It then
> took a bit longer (on the order of 1-2 minutes) until the first
> TCPAbortFailed errors started to pop up

I saw a few AbortOnTimeouts, but no AbortFailed counts.

Those should be TCPAbortOnTimeout counts, rather than
TCPAbortFailed errors, I would expect? Why AbortFailed?
Coming from IP via tcp_transmit_skb()?

thanks,
Nivedita
* Re: fix TCP roundtrip time update code
From: David Mosberger @ 2003-06-04 6:19 UTC
To: Nivedita Singhvi
Cc: davidm, David S. Miller, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 23:04:02 -0700, Nivedita Singhvi <niv@us.ibm.com> said:

  Nivedita> Those should be TCPAbortOnTimeout counts, rather than
  Nivedita> TCPAbortFailed errors, I would expect? Why AbortFailed?
  Nivedita> Coming from IP via tcp_transmit_skb()?

Yes, the "connection hangs/disappearances" were triggered by
TCPAbortOnTimeout; the TCPAbortFailed errors were indicating that
tcp_transmit_skb() had failed, i.e., the tx queue was overrun (that's
where the losses came from).

	--david
* Re: fix TCP roundtrip time update code
From: David S. Miller @ 2003-06-04 7:51 UTC
To: davidm, davidm
Cc: niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Tue, 3 Jun 2003 23:19:31 -0700

   Yes, the "connection hangs/disappearances" were triggered by
   TCPAbortOnTimeout;

This is correct. And it is the reason the connection dies silently.
Because such write timeouts invoke tcp_done() which closes the
connection off silently.

This is correct behavior (sans the RTT bug David fixed of course :))
because a host which hasn't responded at all from so many repeated
retransmission attempts isn't likely to get any reset we send either :)
* Re: fix TCP roundtrip time update code
From: David S. Miller @ 2003-06-04 4:47 UTC
To: davidm, davidm
Cc: niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev, akpm

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Tue, 3 Jun 2003 21:35:55 -0700

   Is this where I get to plug httperf? It triggered the bug reliably in
   less than 10 secs. ;-)

distcc was a reliable test case too...
Thread overview: 15+ messages
[not found] <200306031552.h53FqknC023999@napali.hpl.hp.com>
2003-06-03 17:41 ` fix TCP roundtrip time update code Martin Josefsson
2003-06-03 18:45   ` David Mosberger
2003-06-04  0:24     ` James Morris
2003-06-04  0:43       ` kuznet
2003-06-04  2:01         ` Nivedita Singhvi
2003-06-04  3:23           ` David S. Miller
2003-06-04  4:35             ` David Mosberger
2003-06-04  4:40               ` Nivedita Singhvi
2003-06-04  5:34                 ` David Mosberger
2003-06-04  5:52                   ` David S. Miller
2003-06-04  6:12                     ` David Mosberger
2003-06-04  6:04                   ` Nivedita Singhvi
2003-06-04  6:19                     ` David Mosberger
2003-06-04  7:51                       ` David S. Miller
2003-06-04  4:47               ` David S. Miller