public inbox for linux-kernel@vger.kernel.org
* R: Kernel bug handling TCP_RTO_MAX?
@ 2002-12-12 20:18 Andreani Stefano
  2002-12-12 20:32 ` David S. Miller
  2002-12-12 21:16 ` Alan Cox
  0 siblings, 2 replies; 19+ messages in thread
From: Andreani Stefano @ 2002-12-12 20:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, linux-net

Never say never ;-) 
I need to change it now as a temporary workaround for a problem in the UMTS core network of my company. But I think there could be thousands of situations where fine-tuning this TCP parameter could be useful.

Any contributions on the problem?

Stefano.

-----Original Message-----
From: David S. Miller [mailto:davem@redhat.com]
Sent: Thursday, 12 December 2002 20:59
To: Andreani Stefano
Cc: linux-kernel@vger.kernel.org; linux-net@vger.kernel.org
Subject: Re: Kernel bug handling TCP_RTO_MAX?


   From: "Andreani Stefano" <stefano.andreani.ap@h3g.it>
   Date: Thu, 12 Dec 2002 20:15:42 +0100

   Problem: I need to change the max value of the TCP retransmission
   timeout.

Why?  There should be zero reason to change this value.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-12 20:18 Andreani Stefano
@ 2002-12-12 20:32 ` David S. Miller
  2002-12-12 21:16 ` Alan Cox
  1 sibling, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-12-12 20:32 UTC (permalink / raw)
  To: stefano.andreani.ap; +Cc: linux-kernel, linux-net

   From: "Andreani Stefano" <stefano.andreani.ap@h3g.it>
   Date: Thu, 12 Dec 2002 21:18:21 +0100

   Never say never ;-) 
   I need to change it now as a temporary workaround for a problem in
   the UMTS core network of my company. But I think there could be
   thousands of situations where a fine tuning of this TCP parameter
   could be useful.

You still aren't giving specific examples and details of
the problem you are seeing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
@ 2002-12-12 20:37 Nivedita Singhvi
  0 siblings, 0 replies; 19+ messages in thread
From: Nivedita Singhvi @ 2002-12-12 20:37 UTC (permalink / raw)
  To: stefano.andreani.ap, linux-kernel

> Never say never ;-) 
> I need to change it now as a temporary workaround for a problem in the UMTS core
> network of my company. But I think there could be thousands of situations where a
> fine tuning of this TCP parameter could be useful.
> 
> Any contributes on the problem?


If what you are trying to do is terminate the connection earlier,
then reduce the TCP sysctl variable tcp_retries2. This is the
maximum number of retransmits TCP will make in established state.

The TCP_RTO_MAX parameter is simply an *upper bound* on the 
value of the retransmission timeout, which increases exponentially
from the original timeout value. 
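
A minimal sketch of the two knobs' roles, in Python rather than kernel code (constants assumed from the 2.4-era defaults at HZ = 100): tcp_retries2 bounds how many retransmits are attempted, while TCP_RTO_MAX only clamps how long each backed-off interval can grow.

```python
TCP_RTO_MIN_MS = 200      # assumed 2.4 default: HZ/5 at HZ=100
TCP_RTO_MAX_MS = 120_000  # assumed 2.4 default: 120*HZ

def retransmit_intervals(retries, rto_max_ms=TCP_RTO_MAX_MS):
    """Exponential backoff: each interval doubles, clamped at rto_max_ms."""
    rto, out = TCP_RTO_MIN_MS, []
    for _ in range(retries):
        out.append(rto)
        rto = min(rto * 2, rto_max_ms)
    return out

# Fewer retries means the connection is declared dead sooner:
print(sum(retransmit_intervals(15)) / 1000)  # 804.6 (seconds)
print(sum(retransmit_intervals(8)) / 1000)   # 51.0
```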

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-12 20:18 Andreani Stefano
  2002-12-12 20:32 ` David S. Miller
@ 2002-12-12 21:16 ` Alan Cox
  2002-12-13  2:26   ` Nivedita Singhvi
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Cox @ 2002-12-12 21:16 UTC (permalink / raw)
  To: Andreani Stefano; +Cc: David S. Miller, Linux Kernel Mailing List, linux-net

On Thu, 2002-12-12 at 20:18, Andreani Stefano wrote:
> Never say never ;-) 
> I need to change it now as a temporary workaround for a problem in the UMTS core network of my company. But I think there could be thousands of situations where a fine tuning of this TCP parameter could be useful.
>
The default is too short ?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-12 21:16 ` Alan Cox
@ 2002-12-13  2:26   ` Nivedita Singhvi
  2002-12-13  3:39     ` Matti Aarnio
  0 siblings, 1 reply; 19+ messages in thread
From: Nivedita Singhvi @ 2002-12-13  2:26 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andreani Stefano, David S. Miller, Linux Kernel Mailing List,
	linux-net

Alan Cox wrote:
> 
> On Thu, 2002-12-12 at 20:18, Andreani Stefano wrote:
> > Never say never ;-)
> > I need to change it now as a temporary workaround for a 
> > problem in the UMTS core network of my company. But I think 
> > there could be thousands of situations where a fine tuning 
> > of this TCP parameter could be useful.
> >
> The default is too short ?

Short?? :). On the contrary...

[I apologize for the length of this note, it became a river ]

here's what it would roughly look like:

assuming HZ = 100 (2.4)

tcp_retries2 = 15 (default) /* The # of retransmits */

TCP_RTO_MAX = 120*HZ = 120 seconds = 120000ms
TCP_RTO_MAX2 = 6*HZ = 6 seconds = 6000 ms /* modified value */

TCP_RTO_MIN = HZ/5 = 200ms

Assuming you are on a local LAN, your round trip
times are going to be much less than 200 ms, and
so TCP uses the TCP_RTO_MIN of 200ms ("the algorithm 
ensures that the rto can't go below that").

At each retransmit, TCP backs off exponentially:

Retransmission #	Default rto (ms)	With TCP_RTO_MAX(2) (ms)
1			200			200
2			400			400
3			800			800
4			1600			1600
5			3200			3200
6			6400			6000
7			12800			6000
8			25600			6000
9			51200			6000
10			102400			6000 
11			120000			6000
12			120000			6000
13			120000			6000
14			120000			6000
15			120000			6000

Total time = 		804.6 seconds		66.2 seconds
			13.4 minutes
		
So the minimum total time to time out a tcp connection 
(barring application close) would be ~13 minutes in the
default case and 66 seconds with a modified TCP_RTO_MAX
of 6*HZ.
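
The table above can be reproduced with a short sketch (a model of the arithmetic only, not the kernel's actual timer code; constants as assumed earlier in this note):

```python
def backoff(retries=15, rto_min_ms=200, rto_max_ms=120_000):
    """Sequence of retransmission timeouts (ms): double and clamp."""
    rto, seq = rto_min_ms, []
    for _ in range(retries):
        seq.append(rto)
        rto = min(rto * 2, rto_max_ms)
    return seq

print(sum(backoff(rto_max_ms=120_000)) / 1000)  # 804.6 s, ~13.4 minutes
print(sum(backoff(rto_max_ms=6_000)) / 1000)    # 66.2 s
```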

I can see the argument for lowering both the TCP_RTO_MAX
and TCP_RTO_MIN default values.

I just did a bunch of testing over satellite, and round trip
times were of the order of 850ms ~ 4000ms.  

The max retransmission timeout of 120 seconds is two orders of
magnitude larger than even the slowest round trip times likely
experienced on this planet.. (Are we trying to make this
work to the moon and back? Surely NASA has its own code??)

Particularly since we also retransmit 15 times, can't we conclude
"It's dead, Jim" earlier??

The 200ms minimum retransmission timeout is roughly a
thousand times, if not more, the round trip time on a
fast LAN. Since the algorithm is adaptive (a function of the
measured round trip times), what would be the negative
repercussions of lowering this? 

It may not be a good idea to make either tunable, but what about
the default init rto value, TCP_TIMEOUT_INIT, since that would allow a 
starting point of something close to a suitable value? 

The problem with all of the above is that the TCP engine is
global and undifferentiated, and tuning for at least these parameters
is the same regardless of the interface or route or environment..

Yes, we should and want to meet the standards for the internet, and
behave in a network friendly fashion. But all networks != internet.

I'm thinking, for example, of a dedicated gigabit-or-better connection 
between a tier-2 webserver and a backend database, one that 
has every need of performance and few of standards compliance..

It would be wonderful if we could tune TCP on a per-interface or a 
per-route basis: everything public, for a start, considered the 
internet, while non-routable networks (10/8, etc.) could be configured 
suitably for their environment (TCP over private LAN - RFC?). Trusting
users would be a big issue..

Any thoughts? How stupid is this? Old hat?? 

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  2:26   ` Nivedita Singhvi
@ 2002-12-13  3:39     ` Matti Aarnio
  2002-12-13  4:45       ` Nivedita Singhvi
  2002-12-13  5:23       ` David S. Miller
  0 siblings, 2 replies; 19+ messages in thread
From: Matti Aarnio @ 2002-12-13  3:39 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Alan Cox, Andreani Stefano, David S. Miller,
	Linux Kernel Mailing List, linux-net



On Thu, Dec 12, 2002 at 06:26:45PM -0800, Nivedita Singhvi wrote:
> Alan Cox wrote:
> > The default is too short ?
> 
> Short?? :). On the contrary...
> 
> here's what it would roughly look like:
> 
> assuming HZ = 100 (2.4)
> 
> tcp_retries2 = 15 (default) /* The # of retransmits */
> 
> TCP_RTO_MAX = 120*HZ = 120 seconds = 120000ms
> TCP_RTO_MAX2 = 6*HZ = 6 seconds = 6000 ms /* modified value */
> 
> TCP_RTO_MIN = HZ/5 = 200ms
> 
> Assuming you are on a local lan, your round trip
> times are going to be much less than 200 ms, and
> so using the TCP_RTO_MIN of 200ms ("The algorithm 
> ensures that the rto cant go below that").

  The RTO steps in only when there is a need to RETRANSMIT.
  For that reason, it makes no sense to make its starting
  value any shorter.

> At each retransmit, TCP backs off exponentially:
> 
> Retransmission #	Default rto (ms)	With TCP_RTO_MAX(2) (ms)
> 1			200			200
...
> 14			120000			6000
> 15			120000			6000
> 
> Total time = 		804.6 seconds		66.2 seconds
> 			13.4 minutes
> 		
> So the minimum total time to time out a tcp connection 
> (barring application close) would be ~13 minutes in the
> default case and 66 seconds with a modified TCP_RTO_MAX
> of 6*HZ.

  You can have this by doing careful non-blocking socket
  coding, and protocol traffic monitoring along with
  protocol-level keepalive ping-pong packets to have
  something flying around  (like NJE ping-pong, not
  that every IBM person knows what that is/was..)

> I can see the argument for lowering both, the TCP_RTO_MAX
> and the TCP_RTO_MIN default values.

  I don't.

> I just did a bunch of testing over satellite, and round trip
> times were of the order of 850ms ~ 4000ms.  
> 
> The max retransmission timeout of 120 seconds is two orders of
> magnitude larger than really the slowest round trip times 
> probably experienced on this planet..(Are we trying to make this
> work to the moon and back? Surely NASA has its own code??)

  We try not to kill overloaded network routers while they
  are trying to compensate some line breakage and doing
  large-scale network topology re-routing.

> Particularly since we also retransmit 15 times, cant we conclude
> "Its dead, Jim" earlier??

  No.  I have had LAN spanning-tree flaps taking 60 seconds
  (actually a bit over 30 seconds), and years ago Linux's
  TCP code timed out within that.  It was most annoying to
  use some remote system through such a network...

> 200ms is for the minimum retransmission timeout is roughly a
> thousand times, if not more, the round trip time on a 
> fast lan. Since the algorithm is adaptive (a function of the
> measured round trip times), what would be the negative
> repercussions of lowering this? 

  When things _fail_ in the lan, what would be sensible value ?
  How long will such abnormality last ?

  In overload, resending quickly won't help a bit, just raise
  the backoff (and prolong the overload.)

  Losing a packet sometimes, and thus needing to retransmit,
  is the gray area I can't define quickly.  If it is rare, it
  really does not matter.  If it happens often, there could be
  such serious trouble that a quicker retransmit will only
  aggravate the trouble more.

> It may not be a good idea to make either tunable, but what about
> the default init rto value, TCP_TIMEOUT_INIT, since that would allow a 
> starting point of something close to a suitable value? 
> 
> The problem with all of the above is that the TCP engine is
> global and undifferentiated, and tuning for at least these parameters
> is the same regardless of the interface or route or environment..

  You are looking for "STP" perhaps ?
  It has a feature of waking all streams' retransmits between 
  particular machines when at least one STP frame travels between
  the hosts.

  I can't find it now in my RFC collection.  Odd, that..
  Neither as a draft.  Has it been abandoned ?

> Yes, we should and want to meet the standards for the internet, and
> behave in a network friendly fashion. But all networks != internet.
> 
> I'm thinking for eg of a dedicated fast gigabit or better connection 
> between a tier 2 webserver and a backend database, for example, that 
> has every need of performance and few of standards compliance..
> 
> It would be wonderful if we could tune TCP on a per-interface or a 
> per-route basis (everything public, for a start, considered the 
> internet, and non-routable networks (10, etc), could be configured 
> suitably for its environment. (TCP over private LAN - rfc?). Trusting
> users would be a big issue..
> 
> Any thoughts? How stupid is this? Old hat?? 

  More and more of STP ..

> thanks,
> Nivedita

/Matti Aarnio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  3:39     ` Matti Aarnio
@ 2002-12-13  4:45       ` Nivedita Singhvi
  2002-12-13  6:26         ` Nivedita Singhvi
  2002-12-13 11:40         ` Andrew McGregor
  2002-12-13  5:23       ` David S. Miller
  1 sibling, 2 replies; 19+ messages in thread
From: Nivedita Singhvi @ 2002-12-13  4:45 UTC (permalink / raw)
  To: Matti Aarnio
  Cc: Alan Cox, Andreani Stefano, David S. Miller,
	Linux Kernel Mailing List, linux-net

Matti Aarnio wrote:

> > Assuming you are on a local lan, your round trip
> > times are going to be much less than 200 ms, and
> > so using the TCP_RTO_MIN of 200ms ("The algorithm
> > ensures that the rto cant go below that").
> 
>   The RTO steps in only when there is a need to RETRANSMIT.
>   For that reason, it makes no sense to place its start
>   any shorter.

Not sure I understood your point clearly here - that things
are going to be broken anyway, so don't kick it off too early?

For the most part, dropped packets are recovered by fast 
retransmit getting triggered. So when the retransmission 
timer goes off, I'd agree things are in all likelihood 
messed up. BUT..the default TCP_TIMEOUT_INIT = 300ms, which
is what the timeout calculation engine is fed to begin
with. After that, the actual measured round trip times
smooth out and help make the retransmit timeout accurate.

TCP_RTO_MIN is the lower bound for the rto. On fast
LANs, though, if measured round trip times are say .01ms, 
and our MIN is 200ms, that's thousands of times the value - which 
means that we are reacting to events too far back in time
on the fast LAN scale.  If there was congestion
way back then, does that reflect conditions now?? 


> > So the minimum total time to time out a tcp connection
> > (barring application close) would be ~13 minutes in the
> > default case and 66 seconds with a modified TCP_RTO_MAX
> > of 6*HZ.
> 
>   You can have this by doing carefull non-blocking socket
>   coding, and protocol traffic monitoring along with
>   protocol level keepalive ping-pong packets to have
>   something flying around  (like NJE ping-pong, not
>   that every IBM person knows what that is/was..)

Er, this IBMer is unfortunately rather underinformed on that
subject ;) I'll look it up, but I can guesstimate what you are 
referring to.. True, but for the most part, getting every 
application to be performant and knowledgeable about 
network conditions and to program accordingly is hard :). And 
if by protocol level you mean transport level, then we're back to 
altering the protocol. Wouldn't ping-pongs just add to the
traffic under all conditions (I admit this is a rather lame point :)).

>   We try not to kill overloaded network routers while they
>   are trying to compensate some line breakage and doing
>   large-scale network topology re-routing.

Good point! :). I have little experience with Internet router traffic
snarls, and am certainly not arguing for a major alteration to
TCP exponential backoff :). See below..(the environment I was
thinking of..)

> > Particularly since we also retransmit 15 times, cant we conclude
> > "Its dead, Jim" earlier??
> 
>   No.  I have had LAN spanning-tree flaps taking 60 seconds
>   (actually a bit over 30 seconds), and years ago Linux's
>   TCP code timed out in that.  It was most annoying to
>   use some remote system thru such a network...

Urgh. Bletch. OK. But minor nit here - how often does that 
happen? What's the right thing to do in that situation?
Which situation should we optimize our settings for?
I accept, though, that we need that kind of time frame..

>   When things _fail_ in the lan, what would be sensible value ?
>   How long will such abnormality last ?

Hmm, good questions, but ones I'm going to handwave at :).

One, my assumption is that the ratio of the (say) average expected
round trip times to the rto value should stay around the same -
i.e. why not be as conservative/aggressive as the normal default:

our default init rto is 300, so currently we're going to timeout
on anything that's 100ms over the min of 200. That is far
less conservative than setting an rto of 200 when your round
trip time is a thousand or 10,000 times less.. does that make sense?
 
The other assumption I'm operating under is that when
things fail talking to a directly attached host, it's because
that host has died (even if it's only the app or the NIC, whatever).
i.e. the situation is that your connection is going to break,
except you are going to futilely retransmit 15 times and
wait an interminably long time before you do.. hence the 
advantage of learning what's happening quickly..
 
>   In overload, resending quickly won't help a bit, just raise
>   the backoff (and prolong overload.)

See above..

>   Loosing a packet sometimes, and that way needing to retransmit
>   is the gray area I can't define quickly.  If it is rare, it
>   really does not matter.  If it happens often, there could be
>   so serious trouble that having quicker retransmit will only
>   aggreviate the trouble more.

Thats true..

>   You are looking for "STP" perhaps ?
>   It has a feature of waking all streams retransmits, in between
>   particular machines, when at least one STP frame travels in between
>   the hosts.
> 
>   I can't find it now from my RFC collection.  Odd at that..
>   Neither as a draft.  has it been abandoned ?

Learn something new every day :). Thanks for the ptr. I'll
look it up..

> > It would be wonderful if we could tune TCP on a per-interface or a
> > per-route basis (everything public, for a start, considered the
> > internet, and non-routable networks (10, etc), could be configured
> > suitably for its environment. (TCP over private LAN - rfc?). Trusting
> > users would be a big issue..
> >
> > Any thoughts? How stupid is this? Old hat??
> 
>   More and more of STP ..

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  3:39     ` Matti Aarnio
  2002-12-13  4:45       ` Nivedita Singhvi
@ 2002-12-13  5:23       ` David S. Miller
  1 sibling, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-12-13  5:23 UTC (permalink / raw)
  To: matti.aarnio; +Cc: niv, alan, stefano.andreani.ap, linux-kernel, linux-net

   From: Matti Aarnio <matti.aarnio@zmailer.org>
   Date: Fri, 13 Dec 2002 05:39:28 +0200

   On Thu, Dec 12, 2002 at 06:26:45PM -0800, Nivedita Singhvi wrote:
   > Assuming you are on a local lan, your round trip
   > times are going to be much less than 200 ms, and
   > so using the TCP_RTO_MIN of 200ms ("The algorithm 
   > ensures that the rto cant go below that").
   
     The RTO steps in only when there is a need to RETRANSMIT.
     For that reason, it makes no sense to place its start
     any shorter.

Actually, TCP_RTO_MIN cannot be made any smaller without
some serious thought.

The reason it is 200ms is due to the granularity of the BSD
TCP socket timers. 

In short, the repercussions are not exactly well known, so it's
a research problem to fiddle here.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  4:45       ` Nivedita Singhvi
@ 2002-12-13  6:26         ` Nivedita Singhvi
  2002-12-13 11:40         ` Andrew McGregor
  1 sibling, 0 replies; 19+ messages in thread
From: Nivedita Singhvi @ 2002-12-13  6:26 UTC (permalink / raw)
  To: Matti Aarnio, Alan Cox, Andreani Stefano, David S. Miller,
	Linux Kernel Mailing List, linux-net

Nivedita Singhvi wrote:

> our default init rto is 300, so currently we're going to timeout
> on anything thats a 100ms over the min of 200. that is far
> less conservative than setting an rto of 200 when your round
> trip time is a thousand or 10,000 times less..does that make sense?

Doh! The init rto is NOT 300ms, it's 3 seconds. That minor blooper
shreds my comparison argument a tad :).. but Dave's point renders
that moot, in any case..

"David S. Miller" wrote:

> Actually, TCP_RTO_MIN cannot be made any smaller without
> some serious thought.
> 
> The reason it is 200ms is due to the granularity of the BSD
> TCP socket timers.
> 
> In short, the repercussions are not exactly well known, so it's
> a research problem to fiddle here.

Ack.

Sometime in the not too distant future, the next generation of 
infrastructure will require this to be reworked  :).

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
@ 2002-12-13  6:55 David Stevens
  2002-12-13  6:59 ` David S. Miller
  0 siblings, 1 reply; 19+ messages in thread
From: David Stevens @ 2002-12-13  6:55 UTC (permalink / raw)
  To: David S. Miller
  Cc: matti.aarnio, niv, alan, stefano.andreani.ap, linux-kernel,
	linux-net

      I believe the very large BSD number was based on the large
granularity of the timer (500ms for slowtimeout), designed for use on a VAX
780. The PC on my desk is 3500 times faster than a VAX 780, and you can
send a lot of data on Gigabit Ethernet instead of sitting on your hands for
an enormous min timeout on modern hardware. Switched gigabit isn't exactly
the same kind of environment as shared 10 Mbps (or 2 Mbps) when that stuff
went in, but the min timeouts are the same.
      I think the exponential back-off should handle most issues for
underestimated timers, and the min RTO should be the timer granularity.
Variability in that is already accounted for by the RTT estimator.
      I certainly agree it needs careful investigating, but it's been a pet
peeve of mine for years on BSD systems that it forced an arbitrary minimum
that had no accounting for hardware differences over the last 20 years.

                                    +-DLS


"David S. Miller" <davem@redhat.com>@vger.kernel.org on 12/12/2002 09:23:35
PM

Sent by:    linux-net-owner@vger.kernel.org


To:    matti.aarnio@zmailer.org
cc:    niv@us.ltcfwd.linux.ibm.com, alan@lxorguk.ukuu.org.uk,
       stefano.andreani.ap@h3g.it, linux-kernel@vger.kernel.org,
       linux-net@vger.kernel.org
Subject:    Re: R: Kernel bug handling TCP_RTO_MAX?



   From: Matti Aarnio <matti.aarnio@zmailer.org>
   Date: Fri, 13 Dec 2002 05:39:28 +0200

   On Thu, Dec 12, 2002 at 06:26:45PM -0800, Nivedita Singhvi wrote:
   > Assuming you are on a local lan, your round trip
   > times are going to be much less than 200 ms, and
   > so using the TCP_RTO_MIN of 200ms ("The algorithm
   > ensures that the rto cant go below that").

     The RTO steps in only when there is a need to RETRANSMIT.
     For that reason, it makes no sense to place its start
     any shorter.

Actually, TCP_RTO_MIN cannot be made any smaller without
some serious thought.

The reason it is 200ms is due to the granularity of the BSD
TCP socket timers.

In short, the repercussions are not exactly well known, so it's
a research problem to fiddle here.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  6:55 David Stevens
@ 2002-12-13  6:59 ` David S. Miller
  2002-12-13 11:46   ` Bogdan Costescu
  0 siblings, 1 reply; 19+ messages in thread
From: David S. Miller @ 2002-12-13  6:59 UTC (permalink / raw)
  To: dlstevens
  Cc: matti.aarnio, niv, alan, stefano.andreani.ap, linux-kernel,
	linux-net

   From: David Stevens <dlstevens@us.ibm.com>
   Date: Thu, 12 Dec 2002 23:55:35 -0700
   
         I believe the very large BSD number was based on the large
   granularity of the timer (500ms for slowtimeout), designed for use on a VAX
   780. The PC on my desk is 3500 times faster than a VAX 780, and you can
   send a lot of data on Gigabit Ethernet instead of sitting on your hands for
   an enormous min timeout on modern hardware. Switched gigabit isn't exactly
   the same kind of environment as shared 10 Mbps (or 2 Mbps) when that stuff
   went in, but the min timeouts are the same.

This is well understood, the problem is that BSD's coarse timers are
going to cause all sorts of problems when a Linux stack with a reduced
MIN RTO talks to it.

Consider also, delayed ACKs and possible false retransmits this could
induce with a smaller MIN RTO.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  4:45       ` Nivedita Singhvi
  2002-12-13  6:26         ` Nivedita Singhvi
@ 2002-12-13 11:40         ` Andrew McGregor
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew McGregor @ 2002-12-13 11:40 UTC (permalink / raw)
  To: Nivedita Singhvi, Matti Aarnio
  Cc: Alan Cox, Andreani Stefano, David S. Miller,
	Linux Kernel Mailing List, linux-net

Er, wasn't that SCTP?  If so, that's RFC 3309 and many, many drafts.  You 
might also want to look at DCCP (draft-ietf-dccp-*) and the various 
documents from the IETF's PILC group.  There is also a proposal for a new 
TCP-style protocol with a real differential controller, the name of which I 
can't recall right now.

See also draft-allman-tcp-sack for another proposal for a fix that won't 
break old stacks.  Also draft-ietf-tsvwg-tcp-eifel-alg, 
draft-ietf-tsvwg-tcp-eifel-response and many more.

I can't claim to be a TCP expert, but TCP_RTO_MIN can certainly have a 
different value for IPv6, where I believe millisecond resolution timers are 
required, so 2ms would be correct.

Unfortunately, TCP is incredibly subtle.  So, the IETF are really 
conservative about even suggesting modifications to it, because a common 
and badly behaved stack can cause major disasters in the 'net.

Andrew

--On Thursday, December 12, 2002 20:45:24 -0800 Nivedita Singhvi 
<niv@us.ibm.com> wrote:

>>   You are looking for "STP" perhaps ?
>>   It has a feature of waking all streams retransmits, in between
>>   particular machines, when at least one STP frame travels in between
>>   the hosts.
>>
>>   I can't find it now from my RFC collection.  Odd at that..
>>   Neither as a draft.  has it been abandoned ?
>
> Learn something new every day :). Thanks for the ptr. I'll
> look it up..
>
>> > It would be wonderful if we could tune TCP on a per-interface or a
>> > per-route basis (everything public, for a start, considered the
>> > internet, and non-routable networks (10, etc), could be configured
>> > suitably for its environment. (TCP over private LAN - rfc?). Trusting
>> > users would be a big issue..
>> >
>> > Any thoughts? How stupid is this? Old hat??
>>
>>   More and more of STP ..
>
> thanks,
> Nivedita



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13  6:59 ` David S. Miller
@ 2002-12-13 11:46   ` Bogdan Costescu
  2002-12-13 11:48     ` Andrew McGregor
  0 siblings, 1 reply; 19+ messages in thread
From: Bogdan Costescu @ 2002-12-13 11:46 UTC (permalink / raw)
  To: David S. Miller
  Cc: dlstevens, matti.aarnio, niv, alan, stefano.andreani.ap,
	linux-kernel, linux-net

On Thu, 12 Dec 2002, David S. Miller wrote:

> This is well understood, the problem is that BSD's coarse timers are
> going to cause all sorts of problems when a Linux stack with a reduced
> MIN RTO talks to it.

Sorry to jump into the discussion without a good understanding of the inner 
workings of TCP; I just want to share my view as a possible user of this:
one of the messages at the beginning of the thread said that this would be 
useful on a closed network, and I think that this point was overlooked.

Think of a closed network with only Linux machines on it (world
domination, right :-))  like a Beowulf cluster, web frontends talking to
NFS fileservers, web frontends talking to database backends, etc. Again as 
proposed earlier, border hosts (those connected to both the closed 
network and outside one) could change their communication parameters based 
on device or route and this would become an internal affair that would not 
affect communication with other stacks.

I don't want to suggest making this the default behaviour; rather, make 
it a parameter that can be changed by the sysadmin, with the current 
value as the default.

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13 11:46   ` Bogdan Costescu
@ 2002-12-13 11:48     ` Andrew McGregor
  2002-12-13 12:33       ` Bogdan Costescu
  2002-12-13 18:07       ` Nivedita Singhvi
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew McGregor @ 2002-12-13 11:48 UTC (permalink / raw)
  To: Bogdan Costescu, David S. Miller
  Cc: dlstevens, matti.aarnio, niv, alan, stefano.andreani.ap,
	linux-kernel, linux-net

You're going to make lots of IETFers really annoyed by suggesting that :-)

Honestly, there are lots of other ways to solve this, and it would be nice 
if the IETF's recent additions got implemented; there are many relevant 
things going on there.  Those interested should just talk to the draft 
authors about implementing things.  It's an open organisation just like 
linux-kernel after all, just a bit more formal.

In a closed network, why not have SOCK_STREAM map to something faster than 
TCP anyway?  That is, if I connect(address matching localnet), SOCK_STREAM 
maps to (eg) SCTP.  That would be a far more dramatic performance hack!

Andrew

--On Friday, December 13, 2002 12:46:15 +0100 Bogdan Costescu 
<bogdan.costescu@iwr.uni-heidelberg.de> wrote:

> On Thu, 12 Dec 2002, David S. Miller wrote:
>
>> This is well understood, the problem is that BSD's coarse timers are
>> going to cause all sorts of problems when a Linux stack with a reduced
>> MIN RTO talks to it.
>
> Sorry to jump into the discussion without a good understanding of inner
> workings of TCP, I just want to share my view as a possible user of this:
> one of the messages at the beginning of the thread said that this would
> be  useful on a closed network and I think that this point was overlooked.
>
> Think of a closed network with only Linux machines on it (world
> domination, right :-))  like a Beowulf cluster, web frontends talking to
> NFS fileservers, web frontends talking to database backends, etc. Again
> as  proposed earlier, border hosts (those connected to both the closed
> network and outside one) could change their communication parameters
> based  on device or route and this would become an internal affair that
> would not  affect communication with other stacks.
>
> I don't want to suggest to make this the default behaviour; rather, have
> it a parameter that can be changed by the sysadmin and have the current
> value as default.
>
> --
> Bogdan Costescu
>
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De
>
>
>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13 11:48     ` Andrew McGregor
@ 2002-12-13 12:33       ` Bogdan Costescu
  2002-12-13 13:07         ` Andrew McGregor
  2002-12-13 18:07       ` Nivedita Singhvi
  1 sibling, 1 reply; 19+ messages in thread
From: Bogdan Costescu @ 2002-12-13 12:33 UTC (permalink / raw)
  To: Andrew McGregor
  Cc: David S. Miller, dlstevens, matti.aarnio, niv, alan,
	stefano.andreani.ap, linux-kernel, linux-net

On Sat, 14 Dec 2002, Andrew McGregor wrote:

> You're going to make lots of IETFers really annoyed by suggesting that :-)

I hope not. That was the reason for allowing it to be tuned and for having 
a default value equal to the existing one.

> In a closed network, why not have SOCK_STREAM map to something faster than 
> TCP anyway?

Sure, just give me a protocol that:
- is reliable
- has low latency
- comes with the standard kernel
and I'll just use it. But you always get only 2 out of 3...

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13 12:33       ` Bogdan Costescu
@ 2002-12-13 13:07         ` Andrew McGregor
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew McGregor @ 2002-12-13 13:07 UTC (permalink / raw)
  To: Bogdan Costescu
  Cc: David S. Miller, dlstevens, matti.aarnio, niv, alan,
	stefano.andreani.ap, linux-kernel, linux-net

--On Friday, December 13, 2002 13:33:16 +0100 Bogdan Costescu 
<bogdan.costescu@iwr.uni-heidelberg.de> wrote:

> On Sat, 14 Dec 2002, Andrew McGregor wrote:
>
>> You're going to make lots of IETFers really annoyed by suggesting that
>> :-)
>
> I hope not. That was the reason for allowing it to be tuned and for
> having  a default value equal to the existing one.

I know the folks in question :-)  Actually, they'd be nice about it, but 
say something like:

Well, RFC 2988 says that the present value is too small and should be 1s, 
although I take it from other discussion that experiment shows 200ms to be 
OK.
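For concreteness, the RFC 2988 timer works roughly like this sketch (my own illustration in Python, not the kernel's code -- tcp_rtt_estimator() does the same thing in fixed-point arithmetic):

```python
# Sketch of the RFC 2988 retransmission timer update.
# alpha = 1/8 and beta = 1/4 are the RFC's recommended gains;
# g is the clock granularity in seconds.

def update_rto(srtt, rttvar, r, g=0.1):
    """Fold one RTT measurement r (seconds) into the estimators."""
    rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)   # beta = 1/4
    srtt = 0.875 * srtt + 0.125 * r                 # alpha = 1/8
    rto = srtt + max(g, 4 * rttvar)
    return srtt, rttvar, max(1.0, rto)  # RFC 2988 floors RTO at 1 second

# On a fast LAN the floor dominates: even with a 2 ms RTT, RTO stays at 1 s.
srtt, rttvar, rto = update_rto(srtt=0.002, rttvar=0.001, r=0.002)
print(rto)
```

This makes the point of contention visible: on a low-latency closed network the 1 s floor, not the measured RTT, decides how long a lost segment stalls the connection.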

Instead, RFCs 3042 and 3390 present the IETF's preferred approach that has 
actually made it through the process.  But there are lots of drafts in 
progress, so that isn't the final word, although it is certainly better 
than tuning down RTO_MAX.

Now, I have no idea if the kernel presently implements the latter two by 
default (and on a quick look I can't find either in the code).  If not, it 
should.  Shouldn't the initial window be a tunable?
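For reference, the RFC 3390 upper bound on the initial window is a one-line formula; a quick sketch (an illustration of the RFC's arithmetic, not the kernel's implementation):

```python
# RFC 3390: IW = min(4*MSS, max(2*MSS, 4380 bytes))

def initial_window(mss):
    """Upper bound on the TCP initial congestion window, in bytes."""
    return min(4 * mss, max(2 * mss, 4380))

print(initial_window(1460))  # Ethernet MSS -> 4380 bytes, i.e. 3 segments
print(initial_window(536))   # default MSS  -> 2144 bytes, i.e. 4 segments
```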

>> In a closed network, why not have SOCK_STREAM map to something faster
>> than  TCP anyway?
>
> Sure, just give me a protocol that:
> - is reliable
> - has low latency
> - comes with the standard kernel
> and I'll just use it. But you always get only 2 out of 3...
>
> --
> Bogdan Costescu

SCTP is in 2.5 now.  Does that not fit the bill?  I admit, I don't know 
about the reliability, although I guess I'm going to find out as I have 
cause to use it shortly.  Wearing an IETF hat, I'd like to hear about this, 
as I'm on a bit of a practicality crusade there :-)

Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13 11:48     ` Andrew McGregor
  2002-12-13 12:33       ` Bogdan Costescu
@ 2002-12-13 18:07       ` Nivedita Singhvi
  2002-12-13 22:25         ` Andrew McGregor
  2002-12-13 22:58         ` Matti Aarnio
  1 sibling, 2 replies; 19+ messages in thread
From: Nivedita Singhvi @ 2002-12-13 18:07 UTC (permalink / raw)
  To: Andrew McGregor
  Cc: Bogdan Costescu, David S. Miller, dlstevens, matti.aarnio, alan,
	stefano.andreani.ap, linux-kernel, linux-net

Andrew McGregor wrote:

> In a closed network, why not have SOCK_STREAM map to something faster than
> TCP anyway?  That is, if I connect(address matching localnet), SOCK_STREAM
> maps to (eg) SCTP.  That would be a far more dramatic performance hack!
> 
> Andrew

Not that simple. SCTP (if that is what Matti was referring to) is 
a SOCK_STREAM socket, with a protocol of IPPROTO_SCTP. I'm just
getting done implementing a testsuite against the SCTP API.

i.e. you have to know you want an SCTP socket at the time you
open the socket. You certainly have no idea whether you're on
a closed network or not; for that matter, the app may want to talk
on multiple interfaces, etc. (Most hosts will have one interface
on a public net.)

Currently, Linux SCTP doesn't yet support TCP-style (i.e. SOCK_STREAM)
sockets; we only do UDP-style sockets (SOCK_SEQPACKET).  We will be
putting in SOCK_STREAM support next, but understand that performance
is not something that has been addressed yet, and a performant SCTP
is still some ways away (though I'm sure Jon and Sridhar will be
working their tails off to do so ;)).
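The distinction is visible right at socket() time. A small probe, sketched in Python rather than C purely to illustrate the API point -- whether either call succeeds on a given box depends entirely on the kernel's SCTP support:

```python
import socket

# The protocol must be chosen when the socket is created; an existing
# SOCK_STREAM/TCP socket cannot be silently switched to SCTP afterwards.
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)  # 132 is the IANA number

def probe(style, name):
    """Report whether this kernel accepts an SCTP socket of the given style."""
    try:
        socket.socket(socket.AF_INET, style, IPPROTO_SCTP).close()
        return name + ": supported"
    except OSError as e:
        return name + ": not available (errno " + str(e.errno) + ")"

print(probe(socket.SOCK_SEQPACKET, "UDP-style"))  # what Linux SCTP grew first
print(probe(socket.SOCK_STREAM, "TCP-style"))
```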

But don't expect SCTP to be the surreptitious underlying layer
carrying TCP traffic, if that's an expectation that anyone has :)

Solving this problem without application involvement is a
more limited scenario.

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13 18:07       ` Nivedita Singhvi
@ 2002-12-13 22:25         ` Andrew McGregor
  2002-12-13 22:58         ` Matti Aarnio
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew McGregor @ 2002-12-13 22:25 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Bogdan Costescu, David S. Miller, dlstevens, matti.aarnio, alan,
	stefano.andreani.ap, linux-kernel, linux-net



--On Friday, December 13, 2002 10:07:01 -0800 Nivedita Singhvi 
<niv@us.ibm.com> wrote:

> Andrew McGregor wrote:
>
>> In a closed network, why not have SOCK_STREAM map to something faster
>> than TCP anyway?  That is, if I connect(address matching localnet),
>> SOCK_STREAM maps to (eg) SCTP.  That would be a far more dramatic
>> performance hack!
>>
>> Andrew
>
> Not that simple. SCTP (if that is what Matti was referring to) is
> a SOCK_STREAM socket, with a protocol of IPPROTO_SCTP. I'm just
> getting done implementing a testsuite against the SCTP API.
>
> i.e. you have to know you want an SCTP socket at the time you
> open the socket. You certainly have no idea whether you're on
> a closed network or not; for that matter, the app may want to talk
> on multiple interfaces, etc. (Most hosts will have one interface
> on a public net.)

Things are never that simple.  But I was basically talking about a local 
policy to change the (semantics of the) API in certain cases.  It's 
probably a bad idea and would cause all kinds of breakage, but it is 
interesting to think about.

>
> Currently, Linux SCTP doesn't yet support TCP-style (i.e. SOCK_STREAM)
> sockets; we only do UDP-style sockets (SOCK_SEQPACKET).  We will be
> putting in SOCK_STREAM support next, but understand that performance
> is not something that has been addressed yet, and a performant SCTP
> is still some ways away (though I'm sure Jon and Sridhar will be
> working their tails off to do so ;)).

I wasn't aware of the current status.  Ok, that's just where it's at.

>
> But don't expect SCTP to be the surreptitious underlying layer
> carrying TCP traffic, if that's an expectation that anyone has :)

That's my particular kind of crazy idea.

>
> Solving this problem without application involvement is a
> more limited scenario..

Indeed.

>
> thanks,
> Nivedita
>
>

Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: R: Kernel bug handling TCP_RTO_MAX?
  2002-12-13 18:07       ` Nivedita Singhvi
  2002-12-13 22:25         ` Andrew McGregor
@ 2002-12-13 22:58         ` Matti Aarnio
  1 sibling, 0 replies; 19+ messages in thread
From: Matti Aarnio @ 2002-12-13 22:58 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Andrew McGregor, Bogdan Costescu, David S. Miller, dlstevens,
	matti.aarnio, alan, stefano.andreani.ap, linux-kernel, linux-net

On Fri, Dec 13, 2002 at 10:07:01AM -0800, Nivedita Singhvi wrote:
> Andrew McGregor wrote:
> > In a closed network, why not have SOCK_STREAM map to something faster than
> > TCP anyway?  That is, if I connect(address matching localnet), SOCK_STREAM
> > maps to (eg) SCTP.  That would be a far more dramatic performance hack!
> > 
> > Andrew
> 
> Not that simple. SCTP (if that is what Matti was referring to) is 
> a SOCK_STREAM socket, with a protocol of IPPROTO_SCTP. I'm just
> getting done implementing a testsuite against the SCTP API.

  Most likely that is what I meant.
Things in the IETF do occasionally change names, and I don't always
remember every character of the (E)TLA acronyms I use rarely...

...
> But don't expect SCTP to be the surreptitious underlying layer
> carrying TCP traffic, if that's an expectation that anyone has :)

At least I didn't expect that; I don't know about others.

It all depends on the application coders whether users will be able
to use arbitrary network protocols -- say, any SOCK_STREAM
protocol supported now, or in the future, by the system kernel.
Ever heard of "TLI"?

> Solving this problem without application involvement is a 
> more limited scenario..

Yes, but still important enough on occasion.

Doing a mapping like this might make limited sense
via routing-table lookups.

> thanks,
> Nivedita

/Matti Aarnio

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2002-12-13 22:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-12 20:37 R: Kernel bug handling TCP_RTO_MAX? Nivedita Singhvi
  -- strict thread matches above, loose matches on Subject: below --
2002-12-13  6:55 David Stevens
2002-12-13  6:59 ` David S. Miller
2002-12-13 11:46   ` Bogdan Costescu
2002-12-13 11:48     ` Andrew McGregor
2002-12-13 12:33       ` Bogdan Costescu
2002-12-13 13:07         ` Andrew McGregor
2002-12-13 18:07       ` Nivedita Singhvi
2002-12-13 22:25         ` Andrew McGregor
2002-12-13 22:58         ` Matti Aarnio
2002-12-12 20:18 Andreani Stefano
2002-12-12 20:32 ` David S. Miller
2002-12-12 21:16 ` Alan Cox
2002-12-13  2:26   ` Nivedita Singhvi
2002-12-13  3:39     ` Matti Aarnio
2002-12-13  4:45       ` Nivedita Singhvi
2002-12-13  6:26         ` Nivedita Singhvi
2002-12-13 11:40         ` Andrew McGregor
2002-12-13  5:23       ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox