Netdev List

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 23:15 UTC (permalink / raw)
  To: rick.jones2; +Cc: ian.mcdonald, netdev, ilpo.jarvinen
In-Reply-To: <46D5FBF3.5050700@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 29 Aug 2007 16:06:27 -0700

> I believe the biggest component comes from link-layer retransmissions. 
> There can also be some short outages thanks to signal blocking, 
> tunnels, people with big hats and whatnot that the link-layer 
> retransmissions are trying to address.  The three seconds seems to be a 
> value that gives the certainty that 99 times out of 10 the segment was 
> indeed lost.
> 
> The trace I've been sent shows clean RTTs ranging from ~200 milliseconds 
> to ~7000 milliseconds.

Thanks for the info.

It's pretty easy to generate examples where we might have some sockets
talking over interfaces on such a network and others which are not.
Therefore, if we do this, a per-route metric is probably the best bet.

Ilpo, I'm also very interested to see what you think of all of this
:-)
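
A minimal sketch of the per-route idea suggested above, assuming a hypothetical `route_metrics` structure with an optional `rto_min_ms` field (neither name comes from the thread or from the kernel). A route that sets the metric overrides the global default, and every other socket keeps the 200 ms floor mentioned in the discussion:

```c
#include <stddef.h>
#include <stdint.h>

/* Global floor in milliseconds; 200 is the TCP_RTO_MIN value cited in the thread. */
#define DEFAULT_RTO_MIN_MS 200u

/* Hypothetical per-route metrics; rto_min_ms == 0 means "not configured". */
struct route_metrics {
    uint32_t rto_min_ms;
};

/* Pick the minimum RTO for a socket: the route's value if set, else the default. */
static uint32_t effective_rto_min_ms(const struct route_metrics *rt)
{
    if (rt != NULL && rt->rto_min_ms != 0)
        return rt->rto_min_ms;
    return DEFAULT_RTO_MIN_MS;
}
```

With this shape, a route through the cellular network could carry a 3000 ms floor while LAN routes stay untouched.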

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 23:06 UTC (permalink / raw)
  To: David Miller; +Cc: ian.mcdonald, netdev
In-Reply-To: <20070829.153503.18295527.davem@davemloft.net>

> All of this seems to suggest that the RTO calculation is wrong.

That is a possibility.  Or at least it could be enhanced.

> It seems that packets in this network can be delayed several orders of
> magnitude longer than the usual round trip as measured by TCP.
> 
> What exactly causes such a huge delay?  What is the TCP measured RTO
> in these circumstances where spurious RTOs happen and a 3 second
> minimum RTO makes things better?

I believe the biggest component comes from link-layer retransmissions. 
There can also be some short outages thanks to signal blocking, 
tunnels, people with big hats and whatnot that the link-layer 
retransmissions are trying to address.  The three seconds seems to be a 
value that gives the certainty that 99 times out of 10 the segment was 
indeed lost.

The trace I've been sent shows clean RTTs ranging from ~200 milliseconds 
to ~7000 milliseconds.

rick

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Edgar E. Iglesias @ 2007-08-29 22:53 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev
In-Reply-To: <20070829.153503.18295527.davem@davemloft.net>

On Wed, Aug 29, 2007 at 03:35:03PM -0700, David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Wed, 29 Aug 2007 15:29:03 -0700
> 
> > David Miller wrote:
> > > None of the research folks want to commit to saying a lower value is
> > > OK, even though it's quite clear that on a local 10 gigabit link a
> > > minimum value of even 200 is absolutely and positively absurd.
> > > 
> > > So what do these cellphone network people want to do, decrease the
> > > minimum RTO or increase it?  Exactly how does it help them?
> > 
> > They want to increase it.  The folks who triggered this want to make it 
> > 3 seconds to avoid spurious RTOs.  Their experience with the "other 
> > platform" they wish to replace suggests that 3 seconds is a good value 
> > for their network.
> > 
> > > If the issue is wireless loss, algorithms like FRTO might help them,
> > > because FRTO tries to make a distinction between capacity losses
> > > (which should adjust cwnd) and radio losses (which are not capacity
> > > based and therefore should not affect cwnd).
> > 
> > I was looking at that.  FRTO seems only to affect the cwnd calculations, 
> > and not the RTO calculation, so it seems to "deal with" spurious RTOs 
> > rather than preclude them.  There is a strong desire here to not have 
> > spurious RTOs in the first place.  Each spurious retransmission will 
> > increase a user's charges.
> 
> All of this seems to suggest that the RTO calculation is wrong.
> 
> It seems that packets in this network can be delayed several orders of
> magnitude longer than the usual round trip as measured by TCP.
> 
> What exactly causes such a huge delay?  What is the TCP measured RTO
> in these circumstances where spurious RTOs happen and a 3 second
> minimum RTO makes things better?

I don't know what they are doing, but it reminds me of what happens when
you run TCP over a reliable medium. You don't see loss, instead the
RTT starts to jitter a lot. 

IIRC FRTO does help avoid unnecessary retransmits (although the RTO still
hits).

Best regards
-- 
        Programmer
        Edgar E. Iglesias <edgar.iglesias@axis.com> 46.46.272.1946

^ permalink raw reply

* Re: NCR, was [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:59 UTC (permalink / raw)
  To: jheffner; +Cc: shemminger, ian.mcdonald, rick.jones2, netdev
In-Reply-To: <46D5FA04.1060600@psc.edu>

From: John Heffner <jheffner@psc.edu>
Date: Wed, 29 Aug 2007 18:58:12 -0400

> I don't believe this was the case.  NCR is substantially different, and 
> came out of work at Texas A&M.  The original (only) implementation was 
> in Linux IIRC.  Its goal was to do better.  Their papers say it does. 
> It might be worth looking at.
> 
> In my own experience with reordering, Alexey's code had some 
> hard-to-track-down bugs (look at all the work Ilpo's been doing), and 
> the relative simplicity of NCR may be one of the reasons it does well in 
> tests.

Interesting, thanks for the info John.

^ permalink raw reply

* Re: NCR, was [PATCH] make _minimum_ TCP retransmission timeout configurable
From: John Heffner @ 2007-08-29 22:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, ian.mcdonald, rick.jones2, netdev
In-Reply-To: <20070829155106.43cf69eb@freepuppy.rosehill.hemminger.net>

Stephen Hemminger wrote:
> On Wed, 29 Aug 2007 15:28:12 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>> And reading NCR some more, we already have something similar in the
>> form of Alexey's reordering detection, in fact it handles exactly the
>> case NCR supposedly deals with.  We do not trigger loss recovery
>> strictly on the 3rd duplicate ACK, and we've known about and dealt
>> with the reordering issue explicitly for years.
>>
> 
> Yeah, it looked like another case of BSD RFC writers reinventing
> Linux algorithms, but it is worth getting the behaviour standardized
> and more widely reviewed.

I don't believe this was the case.  NCR is substantially different, and 
came out of work at Texas A&M.  The original (only) implementation was 
in Linux IIRC.  Its goal was to do better.  Their papers say it does. 
It might be worth looking at.

In my own experience with reordering, Alexey's code had some 
hard-to-track-down bugs (look at all the work Ilpo's been doing), and 
the relative simplicity of NCR may be one of the reasons it does well in 
tests.

   -John

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: John Heffner @ 2007-08-29 22:52 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev
In-Reply-To: <46D5F7C8.8090806@psc.edu>

John Heffner wrote:
>> What exactly causes such a huge delay?  What is the TCP measured RTO
>> in these circumstances where spurious RTOs happen and a 3 second
>> minimum RTO makes things better?
> 
> I haven't done a lot of work on wireless myself, but my understanding is 
> that one of the biggest problems is the behavior of link-layer 
> retransmission schemes.  They can suddenly increase the delay of packets 
> by a significant amount when you get a burst of radio interference. It's 
> hard for TCP to gracefully handle this kind of jump without some minimum 
> RTO, especially since wlan RTTs can often be quite small.

(Replying to myself) Though F-RTO does often help in this case.

   -John

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Stephen Hemminger @ 2007-08-29 22:51 UTC (permalink / raw)
  To: David Miller; +Cc: ian.mcdonald, rick.jones2, netdev
In-Reply-To: <20070829.152812.74548456.davem@davemloft.net>

On Wed, 29 Aug 2007 15:28:12 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Wed, 29 Aug 2007 15:13:01 -0700
> 
> > There was some discussion about implementing TCP NCR (RFC4653)
> > and Narasimha Reddy said he might have something that could be used.
> 
> Although this looks interesting, I'm unsure it will help these
> cell folks.  Actually I can't tell for sure until Rick provides
> us with some more details of the exact issue at hand.
> 
> NCR seems to deal with when to trigger loss recovery, whereas
> the cell phone network folks apparently want to jack up TCP_RTO_MIN
> so that hard timeout based retransmits are deferred a lot more
> than normal.
> 
> And reading NCR some more, we already have something similar in the
> form of Alexey's reordering detection, in fact it handles exactly the
> case NCR supposedly deals with.  We do not trigger loss recovery
> strictly on the 3rd duplicate ACK, and we've known about and dealt
> with the reordering issue explicitly for years.
> 

Yeah, it looked like another case of BSD RFC writers reinventing
Linux algorithms, but it is worth getting the behaviour standardized
and more widely reviewed.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: John Heffner @ 2007-08-29 22:48 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, ian.mcdonald, netdev
In-Reply-To: <20070829.153503.18295527.davem@davemloft.net>

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Wed, 29 Aug 2007 15:29:03 -0700
> 
>> David Miller wrote:
>>> None of the research folks want to commit to saying a lower value is
>>> OK, even though it's quite clear that on a local 10 gigabit link a
>>> minimum value of even 200 is absolutely and positively absurd.
>>>
>>> So what do these cellphone network people want to do, decrease the
>>> minimum RTO or increase it?  Exactly how does it help them?
>> They want to increase it.  The folks who triggered this want to make it 
>> 3 seconds to avoid spurious RTOs.  Their experience with the "other 
>> platform" they wish to replace suggests that 3 seconds is a good value 
>> for their network.
>>
>>> If the issue is wireless loss, algorithms like FRTO might help them,
>>> because FRTO tries to make a distinction between capacity losses
>>> (which should adjust cwnd) and radio losses (which are not capacity
>>> based and therefore should not affect cwnd).
>> I was looking at that.  FRTO seems only to affect the cwnd calculations, 
>> and not the RTO calculation, so it seems to "deal with" spurious RTOs 
>> rather than preclude them.  There is a strong desire here to not have 
>> spurious RTOs in the first place.  Each spurious retransmission will 
>> increase a user's charges.
> 
> All of this seems to suggest that the RTO calculation is wrong.

I think there's definitely room for improving the RTO calculation. 
However, this may not be the end-all fix...


> It seems that packets in this network can be delayed several orders of
> magnitude longer than the usual round trip as measured by TCP.
> 
> What exactly causes such a huge delay?  What is the TCP measured RTO
> in these circumstances where spurious RTOs happen and a 3 second
> minimum RTO makes things better?

I haven't done a lot of work on wireless myself, but my understanding is 
that one of the biggest problems is the behavior of link-layer 
retransmission schemes.  They can suddenly increase the delay of packets 
by a significant amount when you get a burst of radio interference. 
It's hard for TCP to gracefully handle this kind of jump without some 
minimum RTO, especially since wlan RTTs can often be quite small.

   -John

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:37 UTC (permalink / raw)
  To: ian.mcdonald; +Cc: rick.jones2, netdev
In-Reply-To: <5640c7e00708291533s30868cc8t69b989f0e53b9569@mail.gmail.com>

From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
Date: Thu, 30 Aug 2007 10:33:32 +1200

> Correct - they often have flaws in them, just like all documents. If
> that is the case we should try and get the RFCs fixed.

In many cases it is not the wording, but the actual concept or idea
the RFC itself is describing which is fatally flawed.

TCP timestamps are a great example, as designed they simply do not
work when ACKs are reordered by the network because it makes the PAWS
test fail for the out of order ACKs.

Therefore everyone adds an extra fuzz to the PAWS test so that a small
window of "older" packets are allowed to pass the check.
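
To make the "extra fuzz" concrete, here is a simplified sketch of a PAWS check with and without a tolerance window. It assumes 32-bit timestamps compared with wrap-around (serial number) arithmetic; the function names and the `fuzz` parameter are illustrative and are not taken from any particular stack, whose real checks also look at other conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe comparison: true if timestamp a is older than timestamp b. */
static bool ts_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Strict PAWS: reject any segment whose TSval is older than ts_recent. */
static bool paws_reject_strict(uint32_t tsval, uint32_t ts_recent)
{
    return ts_before(tsval, ts_recent);
}

/*
 * Relaxed PAWS (illustrative): tolerate segments that are at most
 * `fuzz` ticks older than ts_recent, so slightly reordered ACKs pass.
 */
static bool paws_reject_fuzzy(uint32_t tsval, uint32_t ts_recent, uint32_t fuzz)
{
    return (int32_t)(ts_recent - tsval) > (int32_t)fuzz;
}
```

An ACK that arrives 10 ticks "late" is dropped by the strict test but accepted by the relaxed one when `fuzz` is 20, which is the behaviour described above.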

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:35 UTC (permalink / raw)
  To: rick.jones2; +Cc: ian.mcdonald, netdev
In-Reply-To: <46D5F32F.2070502@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 29 Aug 2007 15:29:03 -0700

> David Miller wrote:
> > None of the research folks want to commit to saying a lower value is
> > OK, even though it's quite clear that on a local 10 gigabit link a
> > minimum value of even 200 is absolutely and positively absurd.
> > 
> > So what do these cellphone network people want to do, decrease the
> > minimum RTO or increase it?  Exactly how does it help them?
> 
> They want to increase it.  The folks who triggered this want to make it 
> 3 seconds to avoid spurious RTOs.  Their experience with the "other 
> platform" they wish to replace suggests that 3 seconds is a good value 
> for their network.
> 
> > If the issue is wireless loss, algorithms like FRTO might help them,
> > because FRTO tries to make a distinction between capacity losses
> > (which should adjust cwnd) and radio losses (which are not capacity
> > based and therefore should not affect cwnd).
> 
> I was looking at that.  FRTO seems only to affect the cwnd calculations, 
> and not the RTO calculation, so it seems to "deal with" spurious RTOs 
> rather than preclude them.  There is a strong desire here to not have 
> spurious RTOs in the first place.  Each spurious retransmission will 
> increase a user's charges.

All of this seems to suggest that the RTO calculation is wrong.

It seems that packets in this network can be delayed several orders of
magnitude longer than the usual round trip as measured by TCP.

What exactly causes such a huge delay?  What is the TCP measured RTO
in these circumstances where spurious RTOs happen and a 3 second
minimum RTO makes things better?

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Ian McDonald @ 2007-08-29 22:33 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev
In-Reply-To: <20070829.152059.48818008.davem@davemloft.net>

On 8/30/07, David Miller <davem@davemloft.net> wrote:
> In fact this is a great example why we don't treat RFCs as dictations
> from the gods.  They are often wrong, impractical, or full of fatal
> flaws.
>
Correct - they often have flaws in them, just like all documents. If
that is the case we should try and get the RFCs fixed. I've raised
this in a discussion in the ICCRG group and will see if I get any sort
of response.

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, ian.mcdonald, netdev
In-Reply-To: <20070829151301.495f3d6e@freepuppy.rosehill.hemminger.net>

From what I've seen thus far, the issue isn't so much actual loss, but 
very variable RTTs leading to spurious RTOs.

rick jones

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:29 UTC (permalink / raw)
  To: David Miller; +Cc: ian.mcdonald, netdev
In-Reply-To: <20070829.144656.104048365.davem@davemloft.net>

David Miller wrote:
> From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
> 
> 
>>So I'm suspecting that the default should be changed to 1000 to match
>>the RFC which would solve this issue. I note that the RFC is a SHOULD
>>rather than a MUST. I had a quick look around and not sure why Linux
>>overrides the RFC on this one.
> 
> 
> Everyone uses this value, even BSD since ancient times.

Or at least something close to it - some use 500 milliseconds for 
"tcp_rto_min."

> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
> 
> So what do these cellphone network people want to do, decrease the
> minimum RTO or increase it?  Exactly how does it help them?

They want to increase it.  The folks who triggered this want to make it 
3 seconds to avoid spurious RTOs.  Their experience with the "other 
platform" they wish to replace suggests that 3 seconds is a good value 
for their network.

> If the issue is wireless loss, algorithms like FRTO might help them,
> because FRTO tries to make a distinction between capacity losses
> (which should adjust cwnd) and radio losses (which are not capacity
> based and therefore should not affect cwnd).

I was looking at that.  FRTO seems only to affect the cwnd calculations, 
and not the RTO calculation, so it seems to "deal with" spurious RTOs 
rather than preclude them.  There is a strong desire here to not have 
spurious RTOs in the first place.  Each spurious retransmission will 
increase a user's charges.

rick

^ permalink raw reply

* Re: [PATCH 1/1] ipv6: corrects sended rtnetlink message
From: Milan Kocian @ 2007-08-29 21:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20070821.002058.102575162.davem@davemloft.net>

On Tue, Aug 21, 2007 at 12:20:58AM -0700, David Miller wrote:
> From: Milan Kocian <milon@wq.cz>
> Date: Wed, 15 Aug 2007 16:33:22 +0200
> 
> > ipv6 sends a RTM_DELLINK netlink message on both events: NETDEV_DOWN,
> > NETDEV_UNREGISTER. Corrected by sending RTM_NEWLINK on NETDEV_DOWN event
> > and RTM_DELLINK on NETDEV_UNREGISTER event.
> 
> Why would we indicate that a new device has appeared on NETDEV_DOWN?
> 
> I don't see any sense in saying "RTM_NEWLINK" for a removal, it's
> for additions.
> 
Sorry for my late reply. I was out.

Because RTM_NEWLINK is used to notify about a device status change
(as I see in net/core/rtnetlink.c) and RTM_DELLINK to inform about
NETDEV_UNREGISTER. Why should it be different in the ipv6 subsystem? And
userspace programs (quagga) expect it. Currently userspace gets two rtnetlink
'LINK' messages on an 'ip l s down' event. The first is RTM_NEWLINK from
net/core/rtnetlink.c and the second is RTM_DELLINK from ipv6.

quagga story:
On a NEWLINK message (flag IFF_UP is down) quagga flushes all routes from the RIB
but leaves IP addresses. On a DELLINK message quagga flushes routes and addresses.
On an 'ip l s up' event ipv6 sends addresses and routes again but ipv4 does not
(it sends only routes). Thus quagga is left without knowledge of the ipv4 addresses
after 'ip l s down/up' commands.

git change:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=979ad663125af4be120697263038bb06ddbb83b4

So from this point of view I tried to synchronize types of messages
on the same events.

IMHO a second possibility is to remove the rtnetlink notification about
NETDEV_DOWN/_UNREGISTER from the ipv6 subsystem because it is a duplicate message.

regards,

milon
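
The difference between the current and the proposed behaviour can be sketched as a pair of mappings from device event to message type. The enums below are stand-ins defined only for this illustration (the real constants are NETDEV_DOWN, NETDEV_UNREGISTER, RTM_NEWLINK and RTM_DELLINK in the kernel headers), and the functions are hypothetical, not the ipv6 code itself:

```c
/* Illustrative stand-ins for the kernel's netdev events. */
enum netdev_event { EV_NETDEV_DOWN, EV_NETDEV_UNREGISTER };

/* Illustrative stand-ins for the rtnetlink link message types. */
enum rtm_type { MSG_RTM_NEWLINK, MSG_RTM_DELLINK };

/* Behaviour described as current: ipv6 reports DELLINK for both events. */
static enum rtm_type ipv6_link_msg_current(enum netdev_event ev)
{
    (void)ev; /* the event does not influence the result */
    return MSG_RTM_DELLINK;
}

/* Proposed: a status change yields NEWLINK, only unregister yields DELLINK. */
static enum rtm_type ipv6_link_msg_proposed(enum netdev_event ev)
{
    return ev == EV_NETDEV_UNREGISTER ? MSG_RTM_DELLINK : MSG_RTM_NEWLINK;
}
```

Under the proposed mapping, a listener that treats DELLINK as "forget the addresses" would no longer do so on a plain link-down.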

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:28 UTC (permalink / raw)
  To: shemminger; +Cc: ian.mcdonald, rick.jones2, netdev
In-Reply-To: <20070829151301.495f3d6e@freepuppy.rosehill.hemminger.net>

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Wed, 29 Aug 2007 15:13:01 -0700

> There was some discussion about implementing TCP NCR (RFC4653)
> and Narasimha Reddy said he might have something that could be used.

Although this looks interesting, I'm unsure it will help these
cell folks.  Actually I can't tell for sure until Rick provides
us with some more details of the exact issue at hand.

NCR seems to deal with when to trigger loss recovery, whereas
the cell phone network folks apparently want to jack up TCP_RTO_MIN
so that hard timeout based retransmits are deferred a lot more
than normal.

And reading NCR some more, we already have something similar in the
form of Alexey's reordering detection, in fact it handles exactly the
case NCR supposedly deals with.  We do not trigger loss recovery
strictly on the 3rd duplicate ACK, and we've known about and dealt
with the reordering issue explicitly for years.

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:23 UTC (permalink / raw)
  To: ian.mcdonald; +Cc: rick.jones2, netdev
In-Reply-To: <5640c7e00708291510p778f387w51d50e981ba49a25@mail.gmail.com>

From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
Date: Thu, 30 Aug 2007 10:10:37 +1200

> Understand what you are saying. That is why I questioned as 200 msecs
> makes no sense on a LAN with < 1 msec RTT. So if the current is
> ridiculous and 1000 is even more so, why do we use it? Just because that
> is how TCP is written I'm guessing.

We considered getting rid of the lower bound several times, but didn't
want to investigate it fully back then.

> I know that in DCCP CCID3 the RTO is 4 x RTT (from memory - it might
> be a slight variation) but we ended up putting a minimum on it as you
> also face a problem if it fires too frequently (i.e. link is in
> usecs).
> 
> I might ask around on research lists and see why this issue has never
> been revisited.

There is also the argument that on a local lan congestion control
stops making any sense.  The problem is that you can't detect what is
a local lan, and any config knob to indicate this is an unacceptable
hack.

Any "congestion" you see on a local high speed lan will be gone before
you can react to it, so it's pretty pointless to do anything.

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 22:20 UTC (permalink / raw)
  To: rick.jones2; +Cc: ian.mcdonald, netdev
In-Reply-To: <46D5EEB6.3020602@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 29 Aug 2007 15:09:58 -0700

> If nothing else, 200 ms is a "principle of least surprise" thing since 
> that is the current value (in MS) for TCP_RTO_MIN.

And Solaris and MacOS-X and...

In fact this is a great example why we don't treat RFCs as dictations
from the gods.  They are often wrong, impractical, or full of fatal
flaws.

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Stephen Hemminger @ 2007-08-29 22:13 UTC (permalink / raw)
  To: David Miller; +Cc: ian.mcdonald, rick.jones2, netdev
In-Reply-To: <20070829.144656.104048365.davem@davemloft.net>

On Wed, 29 Aug 2007 14:46:56 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
> 
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
> 
> Everyone uses this value, even BSD since ancient times.
> 
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
> 
> > So what do these cellphone network people want to do, decrease the
> > minimum RTO or increase it?  Exactly how does it help them?
> 
> If the issue is wireless loss, algorithms like FRTO might help them,
> because FRTO tries to make a distinction between capacity losses
> (which should adjust cwnd) and radio losses (which are not capacity
> based and therefore should not affect cwnd).


The following could help with loss.
There was some discussion about implementing TCP NCR (RFC4653)
and Narasimha Reddy said he might have something that could be used.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <46D5E191.6070401@cosmosbay.com>


> I am sure you can use CTL_UNNUMBERED instead of adding yet another 
> sysctl value, as advised in include/linux/sysctl.h
> 
> **  For new interfaces unless you really need a binary number
> **  please use CTL_UNNUMBERED.

fair enough.  i was just repeating past behaviour :)

rick jones

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Ian McDonald @ 2007-08-29 22:10 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev
In-Reply-To: <20070829.144656.104048365.davem@davemloft.net>

On 8/30/07, David Miller <davem@davemloft.net> wrote:
> From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
> Date: Thu, 30 Aug 2007 09:32:38 +1200
>
> > So I'm suspecting that the default should be changed to 1000 to match
> > the RFC which would solve this issue. I note that the RFC is a SHOULD
> > rather than a MUST. I had a quick look around and not sure why Linux
> > overrides the RFC on this one.
>
> Everyone uses this value, even BSD since ancient times.
>
> None of the research folks want to commit to saying a lower value is
> OK, even though it's quite clear that on a local 10 gigabit link a
> minimum value of even 200 is absolutely and positively absurd.
>
Understand what you are saying. That is why I questioned as 200 msecs
makes no sense on a LAN with < 1 msec RTT. So if the current is
ridiculous and 1000 is even more so, why do we use it? Just because that
is how TCP is written I'm guessing.

I know that in DCCP CCID3 the RTO is 4 x RTT (from memory - it might
be a slight variation) but we ended up putting a minimum on it as you
also face a problem if it fires too frequently (i.e. link is in
usecs).

I might ask around on research lists and see why this issue has never
been revisited.

Now to the original issue - high RTT links. If that is an issue, and I
believe it would be, then it's probably better to do this on a per
route basis or similar, although then we're becoming a defacto X x rtt
type setup. Rereading the RFC this actually doesn't seem prohibited
and here is the code from DCCP CCID3 that we use:

		/*
		 * Update timeout interval for the nofeedback timer.
		 * We use a configuration option to increase the lower bound.
		 * This can help avoid triggering the nofeedback timer too
		 * often ('spinning') on LANs with small RTTs.
		 */
		hctx->ccid3hctx_t_rto = max_t(u32, 4 * hctx->ccid3hctx_rtt,
						   CONFIG_IP_DCCP_CCID3_RTO *
						   (USEC_PER_SEC/1000));
		/*
		 * Schedule no feedback timer to expire in
		 * max(t_RTO, 2 * s/X)  =  max(t_RTO, 2 * t_ipi)
		 */
		t_nfb = max(hctx->ccid3hctx_t_rto, 2 * hctx->ccid3hctx_t_ipi);

		ccid3_pr_debug("%s(%p), Scheduled no feedback timer to "
			       "expire in %lu jiffies (%luus)\n",
			       dccp_role(sk),
			       sk, usecs_to_jiffies(t_nfb), t_nfb);

		sk_reset_timer(sk, &hctx->ccid3hctx_no_feedback_timer,
				   jiffies + usecs_to_jiffies(t_nfb));

Maybe the TCP code could do this also (with a sysctl to turn behaviour
off and on) and then it would save system administrators having to
"tune" the TCP stack if they want this sort of behaviour.

Ian
-- 
Web1: http://wand.net.nz/~iam4/
Web2: http://www.jandi.co.nz
Blog: http://iansblog.jandi.co.nz
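
For readers who want to try the arithmetic outside the kernel, the two max() steps in the CCID3 excerpt above can be modelled in user space roughly as follows. This is a sketch under assumptions: `floor_ms` stands in for CONFIG_IP_DCCP_CCID3_RTO, all other names are made up for the example, and overflow of very large RTT values is ignored.

```c
#include <stdint.h>

#define USEC_PER_MSEC 1000u

/* t_RTO = max(4 * RTT, configured floor), both in microseconds. */
static uint32_t ccid3_t_rto_us(uint32_t rtt_us, uint32_t floor_ms)
{
    uint32_t scaled = 4u * rtt_us;
    uint32_t floor_us = floor_ms * USEC_PER_MSEC;

    return scaled > floor_us ? scaled : floor_us;
}

/* Nofeedback timer interval = max(t_RTO, 2 * t_ipi), in microseconds. */
static uint32_t nofeedback_timeout_us(uint32_t t_rto_us, uint32_t t_ipi_us)
{
    uint32_t twice_ipi = 2u * t_ipi_us;

    return t_rto_us > twice_ipi ? t_rto_us : twice_ipi;
}
```

On a LAN with a 500 us RTT and a 100 ms floor the floor wins (100000 us); on a 200 ms path the 4 x RTT term wins (800000 us), which is the "X x rtt" behaviour mentioned above.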

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: Rick Jones @ 2007-08-29 22:09 UTC (permalink / raw)
  To: Ian McDonald; +Cc: netdev
In-Reply-To: <5640c7e00708291432q6acde704od52247647a6b453@mail.gmail.com>

Ian McDonald wrote:
> Hmmm... RFC2988 says:
>    (2.4) Whenever RTO is computed, if it is less than 1 second then the
>          RTO SHOULD be rounded up to 1 second.
> 
>          Traditionally, TCP implementations use coarse grain clocks to
>          measure the RTT and trigger the RTO, which imposes a large
>          minimum value on the RTO.  Research suggests that a large
>          minimum RTO is needed to keep TCP conservative and avoid
>          spurious retransmissions [AP99].  Therefore, this
>          specification requires a large minimum RTO as a conservative
>          approach, while at the same time acknowledging that at some
>          future point, research may show that a smaller minimum RTO is
>          acceptable or superior.
> 
> I went and had a look and this RFC has not been obsoleted. RFC3390
> also backs this assertion up.
> 
> So I'm suspecting that the default should be changed to 1000 to match
> the RFC which would solve this issue. I note that the RFC is a SHOULD
> rather than a MUST. I had a quick look around and not sure why Linux
> overrides the RFC on this one.

If nothing else, 200 ms is a "principle of least surprise" thing since 
that is the current value (in MS) for TCP_RTO_MIN.

rick jones
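
As a rough illustration of how the quoted RFC 2988 rule interacts with a configurable floor, here is a small user-space model of the estimator. It is a sketch, not the kernel's implementation: it uses floating point, omits the clock-granularity term G, and the `rto_state` and `rto_update` names are invented for this example. Times are in milliseconds.

```c
/* Hypothetical model of the RFC 2988 RTO estimator. */
struct rto_state {
    double srtt;    /* smoothed round-trip time */
    double rttvar;  /* round-trip time variation */
    int seeded;     /* 0 until the first RTT sample has been taken */
};

/* Feed one RTT sample; return the resulting RTO clamped to rto_min_ms. */
static double rto_update(struct rto_state *s, double rtt_ms, double rto_min_ms)
{
    double rto;

    if (!s->seeded) {
        /* First measurement: SRTT = R, RTTVAR = R / 2. */
        s->srtt = rtt_ms;
        s->rttvar = rtt_ms / 2.0;
        s->seeded = 1;
    } else {
        /* Later measurements: alpha = 1/8, beta = 1/4; RTTVAR uses the old SRTT. */
        double err = s->srtt - rtt_ms;

        if (err < 0)
            err = -err;
        s->rttvar = 0.75 * s->rttvar + 0.25 * err;
        s->srtt = 0.875 * s->srtt + 0.125 * rtt_ms;
    }

    rto = s->srtt + 4.0 * s->rttvar;
    return rto < rto_min_ms ? rto_min_ms : rto;
}
```

After a run of steady 200 ms samples the unclamped RTO settles just above 200 ms, so a floor of 200 ms barely changes it while a floor of 3000 ms dominates; that is the gap the configurable minimum is meant to cover.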

^ permalink raw reply

* Re: [NFS] [3/4] 2.6.23-rc4: known regressions
From: J. Bruce Fields @ 2007-08-29 21:54 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, linux-wireless, Michael Buesch, Stefano Brivio,
	Andrew Clayton, Shish, Roman Zippel, Sam Ravnborg, Karl Meyer,
	Trond Myklebust, LKML, Christian Casteyde, Francois Romieu, nfs,
	Netdev, linux-fsdevel, Hugh Dickins, Andrew Morton, kbuild-devel,
	Martin Langer, Andreas Jaggi, Danny van Dyk
In-Reply-To: <46D5906B.1030701@googlemail.com>

On Wed, Aug 29, 2007 at 05:27:39PM +0200, Michal Piotrowski wrote:
> FS
> 
> Subject         : [NFSD OOPS] 2.6.23-rc1-git10
> References      : http://lkml.org/lkml/2007/8/2/462
> Last known good : ?
> Submitter       : Andrew Clayton <andrew@digital-domain.net>
> Caused-By       : ?
> Handled-By      : ?
> Status          : unknown

This is a bug, but (alas) appears to be a preexisting bug, so may not
belong on this list.--b.

^ permalink raw reply

* Re: [PATCH] Use task_pid_nr() in ip_vs_sync.c
From: sukadev @ 2007-08-29 21:50 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Andrew Morton, Oleg Nesterov, Linux Containers,
	Linux Kernel Mailing List, Linux Netdev List
In-Reply-To: <46D57508.9040401@openvz.org>

Pavel Emelianov [xemul@openvz.org] wrote:
| The sync_master_pid and sync_backup_pid are set in set_sync_pid()
| and are used later for set/not-set checks and in printk. So it
| is safe to use the global pid value in this case.
| 
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
| 
| ---
| 
| diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
| index 959c08d..d0798a5 100644
| --- a/net/ipv4/ipvs/ip_vs_sync.c
| +++ b/net/ipv4/ipvs/ip_vs_sync.c
| @@ -794,7 +794,7 @@ static int sync_thread(void *startup)
| 
| 	add_wait_queue(&sync_wait, &wait);
| 
| -	set_sync_pid(state, current->pid);
| +	set_sync_pid(state, task_pid_nr(current));
| 	complete(tinfo->startup);
| 
| 	/*

^ permalink raw reply

* Re: [PATCH] make _minimum_ TCP retransmission timeout configurable
From: David Miller @ 2007-08-29 21:46 UTC (permalink / raw)
  To: ian.mcdonald; +Cc: rick.jones2, netdev
In-Reply-To: <5640c7e00708291432q6acde704od52247647a6b453@mail.gmail.com>

From: "Ian McDonald" <ian.mcdonald@jandi.co.nz>
Date: Thu, 30 Aug 2007 09:32:38 +1200

> So I'm suspecting that the default should be changed to 1000 to match
> the RFC which would solve this issue. I note that the RFC is a SHOULD
> rather than a MUST. I had a quick look around and not sure why Linux
> overrides the RFC on this one.

Everyone uses this value, even BSD since ancient times.

None of the research folks want to commit to saying a lower value is
OK, even though it's quite clear that on a local 10 gigabit link a
minimum value of even 200 is absolutely and positively absurd.

So what do these cellphone network people want to do, decrease the
minimum RTO or increase it?  Exactly how does it help them?

If the issue is wireless loss, algorithms like FRTO might help them,
because FRTO tries to make a distinction between capacity losses
(which should adjust cwnd) and radio losses (which are not capacity
based and therefore should not affect cwnd).

^ permalink raw reply

* Re: [PATCH] Remove write-only variable from pktgen_thread
From: sukadev @ 2007-08-29 21:33 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Andrew Morton, Oleg Nesterov, Linux Containers,
	Linux Kernel Mailing List, Linux Netdev List
In-Reply-To: <46D572FA.9090302@openvz.org>

Pavel Emelianov [xemul@openvz.org] wrote:
| The pktgen_thread.pid is set to current->pid and is never used
| after this. So remove this at all.
| 
| Found during isolating the explicit pid/tgid usage.
| 
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Good observation that its not being used :-)

Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
| 
| ---
| 
| diff --git a/net/core/pktgen.c b/net/core/pktgen.c
| index 3a3154e..93695c2 100644
| --- a/net/core/pktgen.c
| +++ b/net/core/pktgen.c
| @@ -380,7 +380,6 @@ struct pktgen_thread {
| 	/* Field for thread to receive "posted" events terminate, stop ifs 
| 	etc. */
| 
| 	u32 control;
| -	int pid;
| 	int cpu;
| 
| 	wait_queue_head_t queue;
| @@ -3462,8 +3461,6 @@ static int pktgen_thread_worker(void *ar
| 
| 	init_waitqueue_head(&t->queue);
| 
| -	t->pid = current->pid;
| -
| 	pr_debug("pktgen: starting pktgen/%d:  pid=%d\n", cpu, 
| 	task_pid_nr(current));
| 
| 	max_before_softirq = t->max_before_softirq;

^ permalink raw reply
