netdev.vger.kernel.org archive mirror
From: Rick Jones <rick.jones2@hp.com>
To: Ian McDonald <ian.mcdonald@jandi.co.nz>
Cc: OBATA Noboru <noboru.obata.ar@hitachi.com>,
	David Miller <davem@davemloft.net>,
	Stephen Hemminger <shemminger@linux-foundation.org>,
	netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.22-rc5] TCP: Make TCP_RTO_MAX a variable
Date: Mon, 25 Jun 2007 15:29:26 -0700
Message-ID: <468041C6.4060309@hp.com>
In-Reply-To: <5640c7e00706251518y75a34578xec45ce100b6df832@mail.gmail.com>

Ian McDonald wrote:
> On 6/26/07, OBATA Noboru <noboru.obata.ar@hitachi.com> wrote:
> 
>> From: OBATA Noboru <noboru.obata.ar@hitachi.com>
>>
>> Make TCP_RTO_MAX a variable, and allow a user to change it via a
>> new sysctl entry /proc/sys/net/ipv4/tcp_rto_max.  A user can
>> then guarantee TCP retransmission to be more controllable, say,
>> at least once per 10 seconds, by setting it to 10.  This is
>> quite helpful on failover-capable network devices, such as an
>> active-backup bonding device.  On such devices, it is desirable
>> that TCP retransmits a packet shortly after the failover, which
>> is what I would like to do with this patch.  Please see
>> Background and Problem below for rationale in detail.
>>
> RFC2988 says this:
>   (2.4) Whenever RTO is computed, if it is less than 1 second then the
>         RTO SHOULD be rounded up to 1 second.
> 
>         Traditionally, TCP implementations use coarse grain clocks to
>         measure the RTT and trigger the RTO, which imposes a large
>         minimum value on the RTO.  Research suggests that a large
>         minimum RTO is needed to keep TCP conservative and avoid
>         spurious retransmissions [AP99].  Therefore, this
>         specification requires a large minimum RTO as a conservative
>         approach, while at the same time acknowledging that at some
>         future point, research may show that a smaller minimum RTO is
>         acceptable or superior.
> 
>   (2.5) A maximum value MAY be placed on RTO provided it is at least 60
>         seconds.
> 
> Your code doesn't seem to meet requirements of section 2.5 as your
> minimum is 1 second.

(At the risk of having another Emily Litella moment entering a 
discussion late...)

I thought that those sorts of things were generally referring to the 
_default_ setting?

> I think if you're trying to solve the bonding issue then you should
> solve that issue, not hack the TCP implementation as that opens it up
> to abuse in other ways.

FWIW, other stacks have a "tcp_rexmit_interval_max" without too much 
trouble:

$ ndd -h tcp_rexmit_interval_max

tcp_rexmit_interval_max:

     Upper limit for computed round trip time-out. [1,7200000]
     Default: 60000 (1 minute)

[Interesting to me that the default happens to be the aforementioned 60 
seconds :) ]

In the abstract, if we wanted quick recovery in TCP from a link
failover, I suppose it would be possible, at least for a machine-local
link failover, to have the link-failover code call back up into TCP and
say "Yo, TCP, any connections you had going over this link/path/route
should probably go ahead and try retransmitting now rather than later."
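
To make the callback idea concrete, here is a minimal sketch in Python. It is not kernel code: the `Connection` record, the `on_link_failover` hook, and the link names are all hypothetical, invented only to illustrate a failover handler pulling pending retransmission timers forward.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    # Hypothetical per-connection state, for illustration only.
    name: str
    link: str                  # link the connection's route currently uses
    next_retransmit_at: float  # absolute time (s) the RTO timer fires


def on_link_failover(connections, failed_link, now):
    """Hypothetical upcall from link-failover code into TCP.

    Any connection routed over `failed_link` whose retransmission timer
    is still pending is rescheduled to fire immediately. Returns the
    names of the connections that were woken up.
    """
    woken = []
    for conn in connections:
        if conn.link == failed_link and conn.next_retransmit_at > now:
            conn.next_retransmit_at = now
            woken.append(conn.name)
    return woken


conns = [
    Connection("a", link="eth0", next_retransmit_at=64.0),
    Connection("b", link="eth1", next_retransmit_at=64.0),
]
woken = on_link_failover(conns, failed_link="eth0", now=5.0)
```

In this toy model only connection "a" is rescheduled; "b" keeps its timer because it never used the failed link.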

Of course, that does seem rather more complicated than having the 
administrator set an upper bound on the RTO, and wouldn't deal with 
non-machine-local link failover.
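
For comparison, the effect of an upper bound on the RTO can be sketched with a toy model of exponential backoff. This assumes the RTO simply doubles after every unanswered retransmission and is clamped to the bound; the 1 second starting RTO and the function name `retransmit_times` are assumptions for the illustration, not values taken from the patch.

```python
def retransmit_times(initial_rto, rto_max, count):
    """Times (seconds after the first send) of `count` retransmissions.

    Toy model: the RTO doubles after each unanswered retransmission and
    is clamped to `rto_max`.
    """
    times = []
    rto = min(initial_rto, rto_max)
    now = 0.0
    for _ in range(count):
        now += rto
        times.append(now)
        rto = min(rto * 2, rto_max)
    return times


def largest_gap(times):
    """Longest wait between consecutive (re)transmissions."""
    previous = [0.0] + times[:-1]
    return max(t - p for t, p in zip(times, previous))


capped_at_10 = retransmit_times(1, 10, 8)  # bound of 10 s, as in the patch description
capped_at_60 = retransmit_times(1, 60, 8)  # bound of 60 s, as in RFC 2988 (2.5)
```

Under these assumptions the connection never waits more than 10 seconds between retransmissions with the lower bound, while with the 60 second bound the later retransmissions are a full minute apart.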

rick jones

Thread overview: 14+ messages
2007-06-25 13:09 [PATCH 2.6.22-rc5] TCP: Make TCP_RTO_MAX a variable OBATA Noboru
2007-06-25 13:15 ` Patrick McHardy
2007-06-25 14:45   ` Siim Põder
2007-06-25 16:08   ` Stephen Hemminger
2007-06-27 21:57   ` [MaybeSpam] " noboru.obata.ar
2007-06-25 16:07 ` Stephen Hemminger
2007-07-12  6:45   ` OBATA Noboru
2007-06-25 22:18 ` Ian McDonald
2007-06-25 22:28   ` Stephen Hemminger
2007-06-25 22:29   ` Rick Jones [this message]
2007-07-12  6:53     ` OBATA Noboru
2007-07-12  9:54       ` Ian McDonald
2007-07-12  6:56   ` OBATA Noboru
2007-06-28  1:00 ` YOSHIFUJI Hideaki / 吉藤英明
