Re: [PATCH] revert TCP retransmission backoff on ICMP destination unreachable

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Damian Lukowski <damian@tvk.rwth-aachen.de>
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Cc: David Miller <davem@davemloft.net>, Netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH] revert TCP retransmission backoff on ICMP destination unreachable
Date: Tue, 18 Aug 2009 19:40:01 +0200	[thread overview]
Message-ID: <4A8AE771.1070308@tvk.rwth-aachen.de> (raw)
In-Reply-To: <alpine.DEB.2.00.0908181514560.10654@wel-95.cs.helsinki.fi>

> On Fri, 14 Aug 2009, Damian Lukowski wrote:
> 
>>> Longer than 80 columns, and use an inline function instead
>>> of a macro in order to get proper type checking.
>>> [...]
>>> Do not break up the function local variables with spurious new lines
>>> like this, please.
>>> [...]
>>> The indentation and tabbing is messed up in all of the code you are
>>> adding, please fix it up to be consistent with the surrounding code
>>> and the rest of the TCP stack.
>>>
>>> Do not use C++ style // comments.
>>
>> Better?
> 
> Please, include the changelog message on resubmits too next time.

I'm sorry, which message do you mean? I used plain diff without GIT or
anything.

>> +        if ((code != ICMP_NET_UNREACH && code != ICMP_HOST_UNREACH) ||
>> +            seq != tp->snd_una  || !icsk->icsk_retransmits ||
>> +            !icsk->icsk_backoff)
> 
> I'd recommend you break this into two if()\nbreak's, one for filtering
> other icmps and other for filtering mismatching against the socket's state.

Ok.

>> +        icsk->icsk_backoff--;
>> +        icsk->icsk_rto >>= 1;
> 
> Are you absolute sure that we don't go to zero here when phase of the
> moon is just right? I'd put a max(..., 1) guard there.

Well, the retransmission timer doubles icsk_rto whenever it increases
icsk_backoff, so we should reach the original icsk_rto, when
icsk_backoff becomes zero.

>> +        skb_r = skb_peek(&sk->sk_write_queue);
> 
> tcp_write_queue_head(sk)

Ok.
> 
>> +        BUG_ON(!skb_r);
>> +
>> +        if (sock_owned_by_user(sk)) {
>> +            /* Deferring retransmission clocked by ICMP
>> +             * due to locked socket. */
>> +            inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
>> +            min(icsk->icsk_rto, TCP_RESOURCE_PROBE_INTERVAL),
> 
> ??? (besides having indent wrong). What makes you think HZ/2 is right
> value if icsk_rto is large?

To be honest, I have taken over the code from my predecessor and didn't
question this part. When thinking about it, I would remove this block
completely. Will bad things happen, if I mess with the timer, when the
socket is locked?

>> +            TCP_RTO_MAX);
> 
> Perhaps you lack here something to exit?
> 
>> +        }
>> +
>> +        if (tcp_time_stamp - TCP_SKB_CB(skb_r)->when >
>> +            inet_csk(sk)->icsk_rto) {
> 
> icsk->icsk_rto !?!
> 
>> +            /* RTO revert clocked out retransmission. */
>> +            tcp_retransmit_skb(sk, skb_r);
> 
> This is plain wrong, tcp_sock counters will get messed up if
> TCPCB_SACKED_RETRANS is already set while calling tcp_retransmit_skb. As
> a sidenote, it would be probably useful to move the check + clear of
> that bit before doing the retransmission to some common place one day.

What, if I call tcp_retransmit_timer() manually instead? Will there
still be problems with SACK?

>> +            inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
>> +                          icsk->icsk_rto, TCP_RTO_MAX);
> 
> RFC 2988, step 5.5 missing? Have you verified this patch for real?

Thats the whole point of the draft. If consecutive retransmissions
trigger fitting ICMPs, the RTO value is not doubled, but remains
constant. So you can have a retransmission schedule of
t, t+r, t+2r, t+3r, t+4r,  t+5r  ...
instead of
t, t+r, t+3r, t+7r, t+15r, t+31r ...

I can show you trace plots of patched an unpatched TCP, if you like.
Should I attach files in the mails, or better post hyperlinks to them?

>> +        } else {
>> +            /* RTO revert shortened timer. */
>> +            inet_csk_reset_xmit_timer(
>> +                sk, ICSK_TIME_RETRANS,
>> +                icsk->icsk_rto-
>> +                (tcp_time_stamp-TCP_SKB_CB(skb_r)->when),
> 
> Spacing.
> 
>> +                TCP_RTO_MAX);
>> +        }
>> +
> 
> How about:
> 
>         u32 remaining;
> 
>         remaining = icsk->icsk_rto - min(icsk->icsk_rto,
>                          tcp_time_stamp - TCP_SKB_CB(skb_r)->when));
>         if (!remaining) {
>             tcp_retransmit_skb(...);
>             remaining = icsk->icsk_rto;
>         }
>         inet_csk_reset_xmit_timer(..., remaining);

Seems to be ok, but isn't tcp_retransmit_timer(...) better?

>> -        if (icsk->icsk_retransmits >= sysctl_tcp_retries1) {
>> +        if (retrans_overstepped(sk, sysctl_tcp_retries1)) {
> 
> ??? Could you justify these changes and explain their relation to the
> draft and icmp messages? If not, please drop them and try with proper
> description and description for them in a separate patch. I fail to see
> what you're trying to achieve here. You just ended up redefining the
> sysctl meaning too I think...

Well, you're quite right. RFCs specify timeouts in dimensions of time
(seconds, minutes, etc.), but Linux specifies timeouts in form of
numbers of retransmissions. One can do this, as long as the
retransmission schedule (exponential backoff in RFC case) is known. But
with this patch, the schedule can be completely different. In the
"worst" case, the time intervals between consecutive retransmission
probes are constant, leading to a TCP timeout after few seconds, instead
of several minutes.
What I did here, is to calculate the TCP timeout as if the schedule were
still an exponential backoff, given the allowed number of
retransmissions. It is possible, that TCP will now retransmit many more
times than tcp_retries indicates, but the duration until the TCP
connection times out, is more or less equivalent to unpatched TCP with
same tcp_retries values.
So, whenever some control block uses count values, but means time
values, I have to replace the control block appropriately; and that's
what retrans_overstepped() is supposed to do.

Thanks for your help, so far
 Damian

next prev parent reply	other threads:[~2009-08-18 17:40 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-11 11:27 [PATCH] revert TCP retransmission backoff on ICMP destination unreachable Damian Lukowski
2009-08-13 23:08 ` David Miller
2009-08-14 12:08   ` Damian Lukowski
2009-08-18 13:45     ` Ilpo Järvinen
2009-08-18 17:40       ` Damian Lukowski [this message]
2009-08-18 22:07         ` Ilpo Järvinen
2009-08-18 23:56           ` Damian Lukowski
2009-08-19 10:55             ` Ilpo Järvinen
2009-08-19  6:29         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A8AE771.1070308@tvk.rwth-aachen.de \
    --to=damian@tvk.rwth-aachen.de \
    --cc=davem@davemloft.net \
    --cc=ilpo.jarvinen@helsinki.fi \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).