All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Damian Lukowski <damian@tvk.rwth-aachen.de>,
	Netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
Date: Mon, 31 Aug 2009 16:04:11 +0200	[thread overview]
Message-ID: <4A9BD85B.9070708@gmail.com> (raw)
In-Reply-To: <4A9BD7AB.7060207@gmail.com>

Eric Dumazet a écrit :
> Damian Lukowski a écrit :
>> RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
>> which may represent a number of allowed retransmissions or a timeout value.
>> Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
>> in number of allowed retransmissions.
>>
>> For any desired threshold R2 (by means of time) one can specify tcp_retries2
>> (by means of number of retransmissions) such that TCP will not time out
>> earlier than R2. This is the case, because the RTO schedule follows a fixed
>> pattern, namely exponential backoff.
>>
>> However, the RTO behaviour is not predictable any more if RTO backoffs can be
>> reverted, as it is the case in the draft
>> "Make TCP more Robust to Long Connectivity Disruptions"
>> (http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).
>>
>> In the worst case TCP would time out a connection after 3.2 seconds, if the
>> initial RTO equaled MIN_RTO and each backoff has been reverted.
>>
>> This patch introduces a function retransmits_timed_out(N),
>> which calculates the timeout of a TCP connection, assuming an initial
>> RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.
>>
>> Whenever timeout decisions are made by comparing the retransmission counter
>> to some value N, this function can be used, instead.
>>
>> The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
>> can occur than the value indicates. However, it yields a timeout which is
>> similar to the one of an unpatched, exponentially backing off TCP in the same
>> scenario. As no application could rely on an RTO greater than MIN_RTO, there
>> should be no risk of a regression.
>>
>> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
>> ---
>>  include/net/tcp.h    |   18 ++++++++++++++++++
>>  net/ipv4/tcp_timer.c |   11 +++++++----
>>  2 files changed, 25 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index c35b329..17d1a88 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -1247,6 +1247,24 @@ static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_bu
>>  #define tcp_for_write_queue_from_safe(skb, tmp, sk)			\
>>  	skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp)
>>  
>> +static inline bool retransmits_timed_out(const struct sock *sk,
>> +					 unsigned int boundary)
>> +{
>> +	int limit, K;
>> +	if (!inet_csk(sk)->icsk_retransmits)
>> +		return false;
>> +
>> +	K = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
>> +
>> +	if (boundary <= K)
>> +		limit = ((2 << boundary) - 1) * TCP_RTO_MIN;
>> +	else
>> +		limit = ((2 << K) - 1) * TCP_RTO_MIN +
>> +			(boundary - K) * TCP_RTO_MAX;
> 
> Doing this computation might allow us to respect RFC 1122 here :
>  
> "The value of R2 SHOULD correspond to at least 100 seconds."
> 
> adding a third parameter to retransmits_timed_out(), min_limit,
> being 100*HZ if sysctl_tcp_retries2 was used...
> 
> limit = min(min_limit, limit);
> 

Oh well, typical min/max error, sorry :)

> 
>> +
>> +	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= limit;
>> +}
>> +
>>  static inline struct sk_buff *tcp_send_head(struct sock *sk)
>>  {
>>  	return sk->sk_send_head;
>> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
>> index a3ba494..2972d7b 100644
>> --- a/net/ipv4/tcp_timer.c
>> +++ b/net/ipv4/tcp_timer.c
>> @@ -137,13 +137,14 @@ static int tcp_write_timeout(struct sock *sk)
>>  {
>>  	struct inet_connection_sock *icsk = inet_csk(sk);
>>  	int retry_until;
>> +	bool do_reset;
>>  
>>  	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
>>  		if (icsk->icsk_retransmits)
>>  			dst_negative_advice(&sk->sk_dst_cache);
>>  		retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
>>  	} else {
>> -		if (icsk->icsk_retransmits >= sysctl_tcp_retries1) {
>> +		if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
>>  			/* Black hole detection */
>>  			tcp_mtu_probing(icsk, sk);
>>  
>> @@ -155,13 +156,15 @@ static int tcp_write_timeout(struct sock *sk)
>>  			const int alive = (icsk->icsk_rto < TCP_RTO_MAX);
>>  
>>  			retry_until = tcp_orphan_retries(sk, alive);
>> +			do_reset = alive ||
>> +				   !retransmits_timed_out(sk, retry_until);
>>  
>> -			if (tcp_out_of_resources(sk, alive || icsk->icsk_retransmits < retry_until))
>> +			if (tcp_out_of_resources(sk, do_reset))
>>  				return 1;
>>  		}
>>  	}
>>  
>> -	if (icsk->icsk_retransmits >= retry_until) {
>> +	if (retransmits_timed_out(sk, retry_until)) {
>>  		/* Has it gone just too far? */
>>  		tcp_write_err(sk);
>>  		return 1;
>> @@ -385,7 +388,7 @@ void tcp_retransmit_timer(struct sock *sk)
>>  out_reset_timer:
>>  	icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
>>  	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
>> -	if (icsk->icsk_retransmits > sysctl_tcp_retries1)
>> +	if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
>>  		__sk_dst_reset(sk);
>>  
>>  out:;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


  reply	other threads:[~2009-08-31 14:04 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-26 10:16 [PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value Damian Lukowski
2009-08-31 13:26 ` Ilpo Järvinen
2009-09-01 11:28   ` Damian Lukowski
2009-09-01 13:41     ` Ilpo Järvinen
2009-08-31 14:01 ` Eric Dumazet
2009-08-31 14:04   ` Eric Dumazet [this message]
2009-09-01 11:49   ` Damian Lukowski
2009-09-01 12:25     ` Eric Dumazet
2009-09-01 13:23       ` Ilpo Järvinen
     [not found] ` <CAFbMe2Mu46N8kRrYkGzYRLuntEd2J9aO_d0Jw_y0gsQA4q-qHw@mail.gmail.com>
2012-06-01 22:58   ` Jerry Chu
2012-06-04 17:50     ` Damian Lukowski
2012-06-04 19:18       ` Ilpo Järvinen
2012-06-04 23:57         ` Jerry Chu
2012-06-04 23:50       ` Jerry Chu
2012-06-05 17:42         ` Jerry Chu
2012-06-05 18:39           ` Damian Lukowski
2012-06-05 21:22             ` Jerry Chu
2012-06-06 20:35               ` Damian Lukowski
2012-06-06 22:41                 ` Yuchung Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A9BD85B.9070708@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=damian@tvk.rwth-aachen.de \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.