From: Weiping Pan <panweiping3@gmail.com>
To: Nandita Dukkipati <nanditad@google.com>, Per Hurtig <per.hurtig@kau.se>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
Netdev <netdev@vger.kernel.org>,
"Anna Brunström" <anna.brunstrom@kau.se>,
mohammad.rajiullah@kau.se, "Neal Cardwell" <ncardwell@google.com>,
sergei.shtylyov@cogentembedded.com
Subject: Re: [PATCH] tcp: fixing TLP's FIN recovery
Date: Thu, 12 Jun 2014 22:21:22 +0800 [thread overview]
Message-ID: <5399B762.5090002@gmail.com> (raw)
In-Reply-To: <CAB_+Fg5=Z0wfaAZgUMXKttHKx9WoVLQv5ZqDeUQJmGA0LJTwPg@mail.gmail.com>
On 06/09/2014 03:02 PM, Nandita Dukkipati wrote:
> On Sun, Jun 8, 2014 at 12:41 AM, Per Hurtig <per.hurtig@kau.se> wrote:
>>
>> On sön 8 jun 2014 04:58:25, Eric Dumazet wrote:
>>> On Sat, 2014-06-07 at 16:34 +0200, Per Hurtig wrote:
>>>> Fix to a problem observed when losing a FIN segment that does not
>>>> contain data. In such situations, TLP is unable to recover from
>>>> *any* tail loss and instead adds at least PTO ms to the
>>>> retransmission process, i.e., RTO = RTO + PTO.
>>>>
>>>> Signed-off-by: Per Hurtig <per.hurtig@kau.se>
>>>> ---
>>>> net/ipv4/tcp_output.c | 6 ++++--
>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index d463c35..6573765 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -2130,8 +2130,10 @@ void tcp_send_loss_probe(struct sock *sk)
>>>> if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
>>>> goto rearm_timer;
>>>>
>>>> - /* Probe with zero data doesn't trigger fast recovery. */
>>>> - if (skb->len > 0)
>>>> + /* Probe with zero data doesn't trigger fast recovery, if FIN
>>>> + * flag is not set.
>>>> + */
>>>> + if ((skb->len > 0) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
>>>> err = __tcp_retransmit_skb(sk, skb);
>>>>
>>>> /* Record snd_nxt for loss detection. */
>>>
>>>
>>> You know, I believe the test was exactly to avoid sending data less FIN
>>> packets.
>>>
>>> If you write :
>>>
>>> if (A || !A)
>>>
>>> Better remove the condition, completely ;)
>>>
>> Obviously, but I don't think that FINs are the only segments
>> who are targeted by this condition (or targeted at all given
>> the implications of this statement). Furthermore, the comment above
>> the if statement would probably have mentioned FINs explicity
>> and not zero sized segments in general if this were the case.
>>
>>
>>
>>> Nandita, why FIN packet wont trigger fast retransnmits ?
>>>
>> They do, that's the whole thing with this patch.
>>
>>
>>> It sounds like if the timer is the issue you want to fix, you might
>>> simply rearm a timer with RTO-PTO instead of RTO ?
>>>
>>>
>> No I want to enable TLP for tail loss where an empty FIN is involved,
>> this does not work now.
> I understand the tail loss case you want to solve - essentially when a
> tail loss occurs that involves data segments as well as that of an
> empty FIN. However, have you verified that re-sending an empty FIN
> triggers fast recovery? I would be surprised if it did, because I
> think the sender needs to receive a SACK of at least 1-byte of data
> before sender can trigger FACK based fast recovery.
When we queue an out of order pure FIN packet, we do not check whether
it has data or not.
tcp_rcv_established
-->tcp_data_queue
---->tcp_data_queue_ofo
Then the pure FIN packet can generate SACK, which will trigger fast
recovery or early retransmit on the sender.
>
> If you have verified that a pure FIN does indeed trigger recovery, can
> you tell me what part of the code makes that happen?
Here is the patch I use, I think the original if statement is useless,
so I delete it.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 12d6016..4b301e9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2077,9 +2077,7 @@ void tcp_send_loss_probe(struct sock *sk)
if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
goto rearm_timer;
- /* Probe with zero data doesn't trigger fast recovery. */
- if (skb->len > 0)
- err = __tcp_retransmit_skb(sk, skb);
+ err = __tcp_retransmit_skb(sk, skb);
/* Record snd_nxt for loss detection. */
if (likely(!err))
I find that pure FIN can trigger fast recovery or early retransmit,
depending on the value of fackets_out.
I write two packetdrill scripts,
fin_fack.pkt can show that pure FIN can trigger fast recovery,
fin_er.pkt can show that pure FIN can trigger early retransmit.
# cat fin_fack.pkt
// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
// RTT = 100ms
0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Send 8 MSS.
// tcp_min_tso_segs is 2
0.200 write(4, ..., 8000) = 8000
+.000 > . 1:2001(2000) ack 1
+.000 > . 2001:4001(2000) ack 1
+.000 > . 4001:6001(2000) ack 1
+.000 > P. 6001:8001(2000) ack 1
+.000 close(4) = 0
+.000 > F. 8001:8001(0) ack 1
// Receiver ACKs 4 packets, the fifth to eighth packets are lost
0.300 < . 1:1(0) ack 4001 win 257
// PTO = 2RTT, TLP is triggered
0.500 > F. 8001:8001(0) ack 1 win 229
0.600 < . 1:1(0) ack 4001 win 257 <sack 8001:8002,nop,nop>
// got SACK, FACK triggers fast recovery and fast retransmit
0.600 > . 4001:5001(1000) ack 1 win 229
0.600 > . 5001:6001(1000) ack 1 win 229
0.700 < . 1:1(0) ack 6001 win 257 <sack 8001:8002,nop,nop>
// fast retransmit
0.700 > . 6001:7001(1000) ack 1 win 229
0.700 > P. 7001:8001(1000) ack 1 win 229
0.700 < . 1:1(0) ack 8002 win 257
// peer close
0.800 < F. 1:1(0) ack 8002 win 229
0.800 > . 8002:8002(0) ack 2
# cat fin_er.pkt
// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
// RTT = 100ms
0.100 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Send 8 MSS.
// tcp_min_tso_segs is 2
0.200 write(4, ..., 8000) = 8000
+.000 > . 1:2001(2000) ack 1
+.000 > . 2001:4001(2000) ack 1
+.000 > . 4001:6001(2000) ack 1
+.000 > P. 6001:8001(2000) ack 1
+.000 close(4) = 0
+.000 > F. 8001:8001(0) ack 1
// Receiver ACKs 7 packets, the eighth packet is lost
0.300 < . 1:1(0) ack 7001 win 257
// PTO = 2RTT, TLP is triggered
0.500 > F. 8001:8001(0) ack 1 win 229
0.600 < . 1:1(0) ack 7001 win 257 <sack 8001:8002,nop,nop>
// got SACK, trigger early retransmit in RTT/4
0.625 > P. 7001:8001(1000) ack 1 win 229
0.725 < . 1:1(0) ack 8002 win 257
// peer close
0.800 < F. 1:1(0) ack 8002 win 229
0.800 > . 8002:8002(0) ack 2
Test results:
before the patch:
# ./packetdrill -v fin_fack.pkt
inbound injected packet: 0.100009 S 0:0(0) win 32792 <mss
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet: 0.100182 S. 3040039935:3040039935(0) ack 1 win
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet: 0.200005 . 1:1(0) ack 3040039936 win 257
outbound sniffed packet: 0.200108 . 3040039936:3040041936(2000) ack 1
win 229
outbound sniffed packet: 0.200117 . 3040041936:3040043936(2000) ack 1
win 229
outbound sniffed packet: 0.200123 . 3040043936:3040045936(2000) ack 1
win 229
outbound sniffed packet: 0.200130 P. 3040045936:3040047936(2000) ack 1
win 229
outbound sniffed packet: 0.200328 F. 3040047936:3040047936(0) ack 1 win
229
inbound injected packet: 0.300004 . 1:1(0) ack 3040043936 win 257
outbound sniffed packet: 0.800768 . 3040043936:3040044936(1000) ack 1
win 229
fin_fack.pkt:27: error handling packet: live packet field
ipv4_total_length: expected: 40 (0x28) vs actual: 1040 (0x410)
script packet: 0.500000 F. 8001:8001(0) ack 1 win 229
actual packet: 0.800768 . 4001:5001(1000) ack 1 win 229
# ./packetdrill -v fin_er.pkt
inbound injected packet: 0.100009 S 0:0(0) win 32792 <mss
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet: 0.100172 S. 1475097861:1475097861(0) ack 1 win
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet: 0.200005 . 1:1(0) ack 1475097862 win 257
outbound sniffed packet: 0.200113 . 1475097862:1475099862(2000) ack 1
win 229
outbound sniffed packet: 0.200121 . 1475099862:1475101862(2000) ack 1
win 229
outbound sniffed packet: 0.200128 . 1475101862:1475103862(2000) ack 1
win 229
outbound sniffed packet: 0.200134 P. 1475103862:1475105862(2000) ack 1
win 229
outbound sniffed packet: 0.200305 F. 1475105862:1475105862(0) ack 1 win
229
inbound injected packet: 0.300004 . 1:1(0) ack 1475104862 win 257
outbound sniffed packet: 0.800764 P. 1475104862:1475105862(1000) ack 1
win 229
fin_er.pkt:27: error handling packet: live packet field
ipv4_total_length: expected: 40 (0x28) vs actual: 1040 (0x410)
script packet: 0.500000 F. 8001:8001(0) ack 1 win 229
actual packet: 0.800764 P. 7001:8001(1000) ack 1 win 229
after the patch:
# ./packetdrill -v fin_fack.pkt
inbound injected packet: 0.100026 S 0:0(0) win 32792 <mss
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet: 0.100198 S. 3395593992:3395593992(0) ack 1 win
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet: 0.200013 . 1:1(0) ack 3395593993 win 257
outbound sniffed packet: 0.200115 . 3395593993:3395595993(2000) ack 1
win 229
outbound sniffed packet: 0.200131 . 3395595993:3395597993(2000) ack 1
win 229
outbound sniffed packet: 0.200138 . 3395597993:3395599993(2000) ack 1
win 229
outbound sniffed packet: 0.200145 P. 3395599993:3395601993(2000) ack 1
win 229
outbound sniffed packet: 0.200345 F. 3395601993:3395601993(0) ack 1 win
229
inbound injected packet: 0.300016 . 1:1(0) ack 3395597993 win 257
outbound sniffed packet: 0.499792 F. 3395601993:3395601993(0) ack 1 win
229
inbound injected packet: 0.600024 . 1:1(0) ack 3395597993 win 257 <sack
3395601993:3395601994,nop,nop>
outbound sniffed packet: 0.600074 . 3395597993:3395598993(1000) ack 1
win 229
outbound sniffed packet: 0.600080 . 3395598993:3395599993(1000) ack 1
win 229
inbound injected packet: 0.700016 . 1:1(0) ack 3395599993 win 257 <sack
3395601993:3395601994,nop,nop>
outbound sniffed packet: 0.700062 . 3395599993:3395600993(1000) ack 1
win 229
outbound sniffed packet: 0.700066 P. 3395600993:3395601993(1000) ack 1
win 229
inbound injected packet: 0.700164 . 1:1(0) ack 3395601994 win 257
inbound injected packet: 0.800016 F. 1:1(0) ack 3395601994 win 229
outbound sniffed packet: 0.800062 . 3395601994:3395601994(0) ack 2 win 229
# ./packetdrill -v fin_er.pkt
inbound injected packet: 0.100009 S 0:0(0) win 32792 <mss
1000,sackOK,nop,nop,nop,wscale 7>
outbound sniffed packet: 0.100180 S. 3074568182:3074568182(0) ack 1 win
29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
inbound injected packet: 0.200005 . 1:1(0) ack 3074568183 win 257
outbound sniffed packet: 0.200106 . 3074568183:3074570183(2000) ack 1
win 229
outbound sniffed packet: 0.200115 . 3074570183:3074572183(2000) ack 1
win 229
outbound sniffed packet: 0.200122 . 3074572183:3074574183(2000) ack 1
win 229
outbound sniffed packet: 0.200128 P. 3074574183:3074576183(2000) ack 1
win 229
outbound sniffed packet: 0.200326 F. 3074576183:3074576183(0) ack 1 win
229
inbound injected packet: 0.300003 . 1:1(0) ack 3074575183 win 257
outbound sniffed packet: 0.499765 F. 3074576183:3074576183(0) ack 1 win
229
inbound injected packet: 0.600006 . 1:1(0) ack 3074575183 win 257 <sack
3074576183:3074576184,nop,nop>
outbound sniffed packet: 0.624764 P. 3074575183:3074576183(1000) ack 1
win 229
inbound injected packet: 0.725004 . 1:1(0) ack 3074576184 win 257
inbound injected packet: 0.800003 F. 1:1(0) ack 3074576184 win 229
outbound sniffed packet: 0.800047 . 3074576184:3074576184(0) ack 2 win 229
thanks
Weiping Pan
>
> Nandita
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2014-06-12 14:21 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-06 18:46 tcp: fixing TLP's FIN recovery Per Hurtig
2014-06-06 19:07 ` Eric Dumazet
2014-06-07 11:10 ` [PATCH] " Per Hurtig
2014-06-07 13:56 ` Sergei Shtylyov
2014-06-07 14:34 ` Per Hurtig
2014-06-08 2:58 ` Eric Dumazet
2014-06-08 7:41 ` Per Hurtig
2014-06-08 16:35 ` Eric Dumazet
2014-06-09 7:04 ` Nandita Dukkipati
2014-06-09 7:02 ` Nandita Dukkipati
2014-06-09 13:13 ` Per Hurtig
2014-06-09 14:33 ` Eric Dumazet
2014-06-09 14:39 ` Eric Dumazet
2014-06-09 14:42 ` Per Hurtig
2014-06-09 15:04 ` Eric Dumazet
2014-06-09 15:56 ` Per Hurtig
2014-06-09 16:15 ` Eric Dumazet
2014-06-09 16:24 ` Eric Dumazet
2014-06-09 18:33 ` Eric Dumazet
2014-06-12 14:21 ` Weiping Pan [this message]
2014-06-12 14:32 ` Eric Dumazet
2014-06-12 15:08 ` [PATCH v2 1/1] " Per Hurtig
2014-06-12 15:28 ` Eric Dumazet
2014-06-12 17:36 ` Nandita Dukkipati
2014-06-12 17:46 ` Neal Cardwell
2014-06-12 18:06 ` David Miller
2014-10-07 15:03 ` Josh Hunt
2014-10-07 20:17 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5399B762.5090002@gmail.com \
--to=panweiping3@gmail.com \
--cc=anna.brunstrom@kau.se \
--cc=eric.dumazet@gmail.com \
--cc=mohammad.rajiullah@kau.se \
--cc=nanditad@google.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=per.hurtig@kau.se \
--cc=sergei.shtylyov@cogentembedded.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).