Re: fix TCP roundtrip time update code

netdev.vger.kernel.org archive mirror

* Re: fix TCP roundtrip time update code
       [not found] <200306031552.h53FqknC023999@napali.hpl.hp.com>
@ 2003-06-03 17:41 ` Martin Josefsson
  2003-06-03 18:45   ` David Mosberger
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Josefsson @ 2003-06-03 17:41 UTC (permalink / raw)
  To: davidm; +Cc: kuznet, linux-kernel, linux-ia64, netdev

(trimmed CC line and added netdev)

On Tue, 2003-06-03 at 17:52, David Mosberger wrote:
> One of those very-hard-to-track-down, trivial-to-fix kind of problems:
> without this patch, TCP roundtrip time measurements will corrupt the
> routing cache's RTT estimates under heavy network load (the bug causes
> RTAX_RTT to go negative, but since its type is u32, you end up with a
> huge positive value...).  From there on, later TCP connections quickly
> will go south.
> 
> The typo was introduced 8 months ago in v1.29 of the file by the patch
> entitled "Cleanup DST metrics and abstrct MSS/PMTU further".

I tested this patch and it looks like it has cured my mysterious TCP
stalls.

without patch:

    cache  mtu 1500 rtt 479411ms rttvar 953813ms cwnd 46 advmss 1460

I see that before and during the stall if not using this patch.
(rtt is never above 20ms according to ping)

With the patch I see normal rtt and rttvar times.
Haven't seen a stall yet (~30 kernel compiles with distcc over a sometimes
congested link), will continue testing.
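
For reference, a minimal user-space sketch of how an unsigned cached RTT
can wrap to a huge value like the one shown above. The struct and function
names are hypothetical stand-ins for the dst metrics, not kernel code; only
the two update expressions follow the quoted patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for dst->metrics[RTAX_RTT-1] and [RTAX_RTTVAR-1]. */
struct cached_metrics {
	uint32_t rtt;
	uint32_t rttvar;
};

/* Update as written before the patch: the else branch touches rtt. */
static void update_buggy(struct cached_metrics *c, uint32_t m)
{
	if (m >= c->rttvar)
		c->rttvar = m;
	else
		/* If m > rtt, (rtt - m) wraps around in u32 arithmetic. */
		c->rtt -= (c->rtt - m) >> 2;
}

/* Update as written after the patch: the else branch decays rttvar. */
static void update_fixed(struct cached_metrics *c, uint32_t m)
{
	if (m >= c->rttvar)
		c->rttvar = m;
	else
		/* Safe: this branch is only taken when m < rttvar. */
		c->rttvar -= (c->rttvar - m) >> 2;
}
```

Starting from rtt=8 and rttvar=100 with a sample m=50, the buggy variant
leaves rtt above 3 billion, while the fixed variant leaves rtt at 8 and
decays rttvar to 88.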

> ===== net/ipv4/tcp_input.c 1.36 vs edited =====
> --- 1.36/net/ipv4/tcp_input.c	Mon Apr 28 09:27:57 2003
> +++ edited/net/ipv4/tcp_input.c	Tue Jun  3 08:19:36 2003
> @@ -556,8 +556,8 @@
>  			if (m >= dst_metric(dst, RTAX_RTTVAR))
>  				dst->metrics[RTAX_RTTVAR-1] = m;
>  			else
> -				dst->metrics[RTAX_RTT-1] -=
> -					(dst->metrics[RTAX_RTT-1] - m)>>2;
> +				dst->metrics[RTAX_RTTVAR-1] -=
> +					(dst->metrics[RTAX_RTTVAR-1] - m)>>2;
>  		}
>  
>  		if (tp->snd_ssthresh >= 0xFFFF) {

-- 
/Martin


* Re: fix TCP roundtrip time update code
  2003-06-03 17:41 ` fix TCP roundtrip time update code Martin Josefsson
@ 2003-06-03 18:45   ` David Mosberger
  2003-06-04  0:24     ` James Morris
  0 siblings, 1 reply; 15+ messages in thread
From: David Mosberger @ 2003-06-03 18:45 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: davidm, kuznet, linux-kernel, linux-ia64, netdev

>>>>> On 03 Jun 2003 19:41:11 +0200, Martin Josefsson <gandalf@wlug.westbo.se> said:

  Martin> (trimmed CC line and added netdev) On Tue, 2003-06-03 at
  Martin> 17:52, David Mosberger wrote:
  >> One of those very-hard-to-track-down, trivial-to-fix kind of
  >> problems: without this patch, TCP roundtrip time measurements
  >> will corrupt the routing cache's RTT estimates under heavy
  >> network load (the bug causes RTAX_RTT to go negative, but since
  >> its type is u32, you end up with a huge positive value...).  From
  >> there on, later TCP connections quickly will go south.

  >> The typo was introduced 8 months ago in v1.29 of the file by the
  >> patch entitled "Cleanup DST metrics and abstrct MSS/PMTU
  >> further".

  Martin> I tested this patch and it looks like it has cured my
  Martin> mysterious TCP stalls.

Yes, this sounds reasonable.  I wasn't very clear on this point, but
by "going south" I meant that TCP starts to misbehave.  In
particular, you'll likely end up with the kernel aborting ESTABLISHED
TCP connections with extreme prejudice (and in violation of the TCP
protocol), because it thought that it had been unable to communicate
with the remote end for a _very_ long time.  The net effect typically
is that you end up with one end having a connection that's in the
ESTABLISHED state and the other end having no trace of that
connection.

	--david


* Re: fix TCP roundtrip time update code
  2003-06-03 18:45   ` David Mosberger
@ 2003-06-04  0:24     ` James Morris
  2003-06-04  0:43       ` kuznet
  0 siblings, 1 reply; 15+ messages in thread
From: James Morris @ 2003-06-04  0:24 UTC (permalink / raw)
  To: davidm
  Cc: Martin Josefsson, kuznet, linux-kernel, linux-ia64, netdev,
	David S. Miller

On Tue, 3 Jun 2003, David Mosberger wrote:

>   Martin> I tested this patch and it looks like it has cured my
>   Martin> mysterious TCP stalls.
> 
> Yes, this sounds reasonable.  I wasn't very clear on this point, but
> "by going south" I meant that TCP is starting to misbehave.  In
> particular, you'll likely end up with the kernel aborting ESTABLISHED
> TCP connections with extreme prejudice (and in violation of the TCP
> protocol), because it thought that it had been unable to communicate
> with the remote end for a _very_ long time.  The net effect typically
> is that you end up with one end having a connection that's in the
> ESTABLISHED state and the other end having no trace of that
> connection.

David,

This might be the solution to one of the 'must-fix' bugs for the
networking, which nobody so far was quite able to track down.


- James
-- 
James Morris
<jmorris@intercode.com.au>


* Re: fix TCP roundtrip time update code
  2003-06-04  0:24     ` James Morris
@ 2003-06-04  0:43       ` kuznet
  2003-06-04  2:01         ` Nivedita Singhvi
  0 siblings, 1 reply; 15+ messages in thread
From: kuznet @ 2003-06-04  0:43 UTC (permalink / raw)
  To: James Morris
  Cc: davidm, gandalf, linux-kernel, linux-ia64, netdev, davem, akpm

Hello!

> This might be the solution to one of the 'must-fix' bugs for the
> networking, which nobody so far was quite able to track down.

No doubt. All the symptoms are explained by this. I hope Andrew
will confirm that the problem has gone.

Alexey


* Re: fix TCP roundtrip time update code
  2003-06-04  0:43       ` kuznet
@ 2003-06-04  2:01         ` Nivedita Singhvi
  2003-06-04  3:23           ` David S. Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Nivedita Singhvi @ 2003-06-04  2:01 UTC (permalink / raw)
  To: kuznet
  Cc: James Morris, davidm, gandalf, linux-kernel, linux-ia64, netdev,
	davem, akpm

kuznet@ms2.inr.ac.ru wrote:

> No doubts. All the symptoms are explained by this. I hope Andrew
> will confirm that the problem has gone.

Yep, great catch! But, FYI, DaveM and Alexey, we tried
reproducing the stalls we (Dave Hansen, Troy Wilson) had
seen during SpecWeb99 runs and couldn't reproduce them on
2.5.69. (Same config, etc). So it's possible our hang/stalls
were some other issue that got silently fixed (or more
likely, possibly the same thing but other changes minimized
us running into the problem).

thanks,
Nivedita


* Re: fix TCP roundtrip time update code
  2003-06-04  2:01         ` Nivedita Singhvi
@ 2003-06-04  3:23           ` David S. Miller
  2003-06-04  4:35             ` David Mosberger
  0 siblings, 1 reply; 15+ messages in thread
From: David S. Miller @ 2003-06-04  3:23 UTC (permalink / raw)
  To: niv
  Cc: kuznet, jmorris, davidm, gandalf, linux-kernel, linux-ia64,
	netdev, akpm

   From: Nivedita Singhvi <niv@us.ibm.com>
   Date: Tue, 03 Jun 2003 19:01:25 -0700

   But, FYI, DaveM and Alexey, we tried
   reproducing the stalls we (Dave Hansen, Troy Wilson) had
   seen during SpecWeb99 runs and couldn't reproduce them on
   2.5.69. (Same config, etc). So it's possible our hang/stalls
   were some other issue that got silently fixed (or more
   likely, possibly the same thing but other changes minimized
   us running into the problem).

I think this means nothing, and that you can infer nothing from such
results.

My understanding is that the problem case triggers only when a timeout
based retransmit occurs.  On LAN this tends to be extremely rare.
Although under enough traffic load it can occur.

So if your old SpecWEB99 lab tended more to trigger timeout based
retransmits on LAN, and your new test network does not, then your new
test network will tend to not reproduce the bug regardless of whether
the bug is present in the kernel or not :-)


* Re: fix TCP roundtrip time update code
  2003-06-04  3:23           ` David S. Miller
@ 2003-06-04  4:35             ` David Mosberger
  2003-06-04  4:40               ` Nivedita Singhvi
  2003-06-04  4:47               ` David S. Miller
  0 siblings, 2 replies; 15+ messages in thread
From: David Mosberger @ 2003-06-04  4:35 UTC (permalink / raw)
  To: David S. Miller
  Cc: niv, kuznet, jmorris, davidm, gandalf, linux-kernel, linux-ia64,
	netdev, akpm

>>>>> On Tue, 03 Jun 2003 20:23:20 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  DaveM>    From: Nivedita Singhvi <niv@us.ibm.com> Date: Tue, 03 Jun
  DaveM> 2003 19:01:25 -0700
  DaveM>    But, FYI, DaveM and Alexey, we tried reproducing the
  DaveM> stalls we (Dave Hansen, Troy Wilson) had seen during
  DaveM> SpecWeb99 runs and couldn't reproduce them on 2.5.69. (Same
  DaveM> config, etc). So its possible our hang/stalls were some other
  DaveM> issue that got silently fixed (or more likely, possibly the
  DaveM> same thing but other changes minimized us running into the
  DaveM> problem).

  DaveM> I think this means nothing, and that you can infer nothing
  DaveM> from such results.

  DaveM> My understanding is that the problem case triggers only when
  DaveM> a timeout based retransmit occurs.  On LAN this tends to be
  DaveM> extremely rare.  Although under enough traffic load it can
  DaveM> occur.

  DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
  DaveM> based retransmits on LAN, and your new test network does not,
  DaveM> then your new test network will tend to not reproduce the bug
  DaveM> regardless of whether the bug is present in the kernel or not
  DaveM> :-)

Is this where I get to plug httperf?  It triggered the bug reliably in
less than 10 secs. ;-)

	--david


* Re: fix TCP roundtrip time update code
  2003-06-04  4:35             ` David Mosberger
@ 2003-06-04  4:40               ` Nivedita Singhvi
  2003-06-04  5:34                 ` David Mosberger
  2003-06-04  4:47               ` David S. Miller
  1 sibling, 1 reply; 15+ messages in thread
From: Nivedita Singhvi @ 2003-06-04  4:40 UTC (permalink / raw)
  To: davidm
  Cc: David S. Miller, kuznet, jmorris, gandalf, linux-kernel,
	linux-ia64, netdev, akpm

David Mosberger wrote:

>   DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
>   DaveM> based retransmits on LAN, and your new test network does not,
>   DaveM> then your new test network will tend to not reproduce the bug
>   DaveM> regardless of whether the bug is present in the kernel or not
>   DaveM> :-)
> 
> Is this where I get to plug httperf?  It triggered the bug reliably in
> less than 10 secs. ;-)

Tarnation!! Ran httperf! Didn't hit it! :(. What were your
settings?

I extracted an old debug patch to implement dropping of
packets - have a sysctl that controls the rate at which I
can drop IP packets, so can also generate any kind of packet
loss. So thought I would bang away with netperf using
sendfile()/TCP_CORK. Thought it was in that code path.
Will be running tests tomorrow and the rest of this
week on 2.5.70 +- patch. Will see if I can provoke any
further hangs, stalls, wackiness of any flavor...

thanks,
Nivedita


* Re: fix TCP roundtrip time update code
  2003-06-04  4:40               ` Nivedita Singhvi
@ 2003-06-04  5:34                 ` David Mosberger
  2003-06-04  5:52                   ` David S. Miller
  2003-06-04  6:04                   ` Nivedita Singhvi
  0 siblings, 2 replies; 15+ messages in thread
From: David Mosberger @ 2003-06-04  5:34 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: davidm, David S. Miller, kuznet, jmorris, gandalf, linux-kernel,
	linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 21:40:18 -0700, Nivedita Singhvi <niv@us.ibm.com> said:

  Nivedita> David Mosberger wrote:
  DaveM> So if your old SpecWEB99 lab tended more to trigger timeout
  DaveM> based retransmits on LAN, and your new test network does not,
  DaveM> then your new test network will tend to not reproduce the bug
  DaveM> regardless of whether the bug is present in the kernel or not
  DaveM> :-)

  >>  Is this where I get to plug httperf?  It triggered the bug
  >> reliably in less than 10 secs. ;-)

  Nivedita> Tarnation!! Ran httperf! Didnt hit it! :(. What were your
  Nivedita> settings?

I used:

 $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \
	--uri pathto30KBfile

on 3 clients (for a total of 3000 conns/sec).  You can't go higher
than 1000 conn/sec per client (IP address) because otherwise you run
out of port space (due to TIME_WAIT).
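
A rough back-of-the-envelope check of that limit, as a sketch under two
assumptions that are not stated in this thread: a TIME_WAIT period of 60
seconds, and roughly 64512 usable source ports (1024 through 65535) per
client address. The helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Each closed connection pins its source port for tw_secs seconds, so a
 * client can open at most usable_ports / tw_secs new connections per
 * second to a single server address and port.
 */
static uint32_t max_conn_rate(uint32_t usable_ports, uint32_t tw_secs)
{
	if (tw_secs == 0)
		return UINT32_MAX;	/* no TIME_WAIT hold, no bound from ports */
	return usable_ports / tw_secs;
}
```

With those assumed numbers, max_conn_rate(64512, 60) comes out at 1075
connections per second, which is in line with the 1000 conn/sec figure.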

This load worked well for a machine with a single GigE card.  All
network tunables were on the default setting (in particular, the tx
queue len was 300, which is where the losses came from).

With this load, I saw bad RTT values in the route cache within a
couple of seconds after starting the third httperf generator.  It then
took a bit longer (on the order of 1-2 minutes) until the first
TCPAbortFailed errors started to pop up.

	--david


* Re: fix TCP roundtrip time update code
  2003-06-04  5:34                 ` David Mosberger
@ 2003-06-04  5:52                   ` David S. Miller
  2003-06-04  6:12                     ` David Mosberger
  2003-06-04  6:04                   ` Nivedita Singhvi
  1 sibling, 1 reply; 15+ messages in thread
From: David S. Miller @ 2003-06-04  5:52 UTC (permalink / raw)
  To: davidm, davidm
  Cc: niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev,
	akpm

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Tue, 3 Jun 2003 22:34:30 -0700
   
   You can't go higher than 1000 conn/sec per client (IP address)
   because otherwise you run out of port space (due to TIME_WAIT).

echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle

It should eliminate this limit.  Unfortunately we can't enable
this by default because of NAT :(


* Re: fix TCP roundtrip time update code
  2003-06-04  5:52                   ` David S. Miller
@ 2003-06-04  6:12                     ` David Mosberger
  0 siblings, 0 replies; 15+ messages in thread
From: David Mosberger @ 2003-06-04  6:12 UTC (permalink / raw)
  To: David S. Miller
  Cc: davidm, davidm, niv, kuznet, jmorris, gandalf, linux-kernel,
	linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 22:52:45 -0700 (PDT), "David S. Miller" <davem@redhat.com> said:

  David>    From: David Mosberger <davidm@napali.hpl.hp.com> Date:
  David> Tue, 3 Jun 2003 22:34:30 -0700

  David>    You can't go higher than 1000 conn/sec per client (IP
  David> address) because otherwise you run out of port space (due to
  David> TIME_WAIT).

  DaveM> echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle

  DaveM> It should eliminate this limit.  Unfortunately we can't
  DaveM> enable this by default because of NAT :(

Ah, yes, provided PAWS is enabled, this would give you a time_wait
timeout of 3.5*RTO.  Nice.
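
A small sketch of that arithmetic: 3.5*RTO can be computed with shifts
alone. The helper name is hypothetical and the RTO is in arbitrary time
units; whether the kernel computes it exactly this way is not stated in
this thread.

```c
#include <stdint.h>

/* 3.5 * rto as 4*rto - rto/2, using integer shifts only. */
static uint32_t tw_timeout_from_rto(uint32_t rto)
{
	return (rto << 2) - (rto >> 1);
}
```

For example, an RTO of 200 gives a timeout of 700. For odd values the
halved term is rounded down, so the result rounds up (an RTO of 3 gives 11).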

	--david


* Re: fix TCP roundtrip time update code
  2003-06-04  5:34                 ` David Mosberger
  2003-06-04  5:52                   ` David S. Miller
@ 2003-06-04  6:04                   ` Nivedita Singhvi
  2003-06-04  6:19                     ` David Mosberger
  1 sibling, 1 reply; 15+ messages in thread
From: Nivedita Singhvi @ 2003-06-04  6:04 UTC (permalink / raw)
  To: davidm
  Cc: David S. Miller, kuznet, jmorris, gandalf, linux-kernel,
	linux-ia64, netdev, akpm

David Mosberger wrote:

>  $ httperf --rate 1000 --num-conns 1000000 --verbose --hog --server HOST \
> 	--uri pathto30KBfile

Hmm, ditto, except I was way down at --rate 300 (was seeing client
errors of fd-unavail). Have ulimited upwards but am still seeing
them..

> on 3 clients (for a total of 3000 conns/sec).  You can't go higher
> than 1000 conn/sec per client (IP address) because otherwise you run
> out of port space (due to TIME_WAIT).

You can hike /proc/sys/net/ipv4/tcp_tw_recycle for that.

> This load worked well for a machine with a single GigE card.  All
> network tunables were on the default setting (in particular, the tx
> queue len was 300, which is were the losses came from).
> 
> With this load, I saw bad RTT values in the route cache within a
> couple of seconds after starting the third httperf generator.  It then
> took a bit longer (on the order of 1-2 minutes) until the first
> TCPAbortFailed errors started to pop up

I saw a few AbortOnTimeouts, but no AbortFailed counts.

Those should be TCPAbortOnTimeout counts, rather than TCPAbortFailed
errors, I would expect? Why AbortFailed?  Coming from IP via
tcp_transmit_skb()?

thanks,
Nivedita


* Re: fix TCP roundtrip time update code
  2003-06-04  6:04                   ` Nivedita Singhvi
@ 2003-06-04  6:19                     ` David Mosberger
  2003-06-04  7:51                       ` David S. Miller
  0 siblings, 1 reply; 15+ messages in thread
From: David Mosberger @ 2003-06-04  6:19 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: davidm, David S. Miller, kuznet, jmorris, gandalf, linux-kernel,
	linux-ia64, netdev, akpm

>>>>> On Tue, 03 Jun 2003 23:04:02 -0700, Nivedita Singhvi <niv@us.ibm.com> said:

  Nivedita> Those should be TCPAbortOnTimeout counts, rather than
  Nivedita> TCPAbortFailed errors, I would expect? Why AbortFailed?
  Nivedita> Coming from IP via tcp_transmit_skb()?

Yes, the "connection hangs/disappearances" were triggered by
TCPAbortOnTimeout; the TCPAbortFailed errors were indicating that
tcp_transmit_skb() had failed, i.e., the tx queue was overrun (that's
where the losses came from).
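
To watch the two counters while reproducing this, one option is to read
them from the TcpExt lines of /proc/net/netstat, assuming a kernel that
exports them there as a line of names followed by a line of values. The
sketch below is a hypothetical helper, lookup_counter(), that pulls one
named value out of such a pair of lines; it is an illustration, not part
of any existing tool.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find `name` in a "Prefix: name1 name2 ..." line and store the value at
 * the same position of the matching "Prefix: v1 v2 ..." line in *out.
 * Returns 0 on success, -1 if the name is absent or the lines are malformed.
 */
static int lookup_counter(const char *names, const char *values,
			  const char *name, unsigned long long *out)
{
	size_t want = strlen(name);

	/* Skip the "TcpExt:" style prefix on both lines. */
	names = strchr(names, ':');
	values = strchr(values, ':');
	if (!names || !values)
		return -1;
	names++;
	values++;

	for (;;) {
		/* Advance both cursors to the start of the next field. */
		names += strspn(names, " \n");
		values += strspn(values, " \n");
		if (*names == '\0' || *values == '\0')
			return -1;

		size_t nlen = strcspn(names, " \n");
		size_t vlen = strcspn(values, " \n");

		if (nlen == want && strncmp(names, name, nlen) == 0) {
			*out = strtoull(values, NULL, 10);
			return 0;
		}
		names += nlen;
		values += vlen;
	}
}
```

Given the made-up pair "TcpExt: SyncookiesSent TCPAbortOnTimeout
TCPAbortFailed" and "TcpExt: 0 17 3", looking up TCPAbortOnTimeout yields
17 and TCPAbortFailed yields 3.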

	--david


* Re: fix TCP roundtrip time update code
  2003-06-04  6:19                     ` David Mosberger
@ 2003-06-04  7:51                       ` David S. Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David S. Miller @ 2003-06-04  7:51 UTC (permalink / raw)
  To: davidm, davidm
  Cc: niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev,
	akpm

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Tue, 3 Jun 2003 23:19:31 -0700

   Yes, the "connection hangs/disappearances" were triggered by
   TCPAbortOnTimeout;

This is correct.

And it is the reason the connection dies silently.  Because
such write timeouts invoke tcp_done() which closes the connection
off silently.  This is correct behavior (sans the RTT bug David fixed
of course :)) because a host which hasn't responded at all from
so many repeated retransmission attempts isn't likely to get any
reset we send either :)


* Re: fix TCP roundtrip time update code
  2003-06-04  4:35             ` David Mosberger
  2003-06-04  4:40               ` Nivedita Singhvi
@ 2003-06-04  4:47               ` David S. Miller
  1 sibling, 0 replies; 15+ messages in thread
From: David S. Miller @ 2003-06-04  4:47 UTC (permalink / raw)
  To: davidm, davidm
  Cc: niv, kuznet, jmorris, gandalf, linux-kernel, linux-ia64, netdev,
	akpm

   From: David Mosberger <davidm@napali.hpl.hp.com>
   Date: Tue, 3 Jun 2003 21:35:55 -0700

   Is this where I get to plug httperf?  It triggered the bug reliably in
   less than 10 secs. ;-)

distcc was a reliable test case too...


