TCP being hoodwinked into spurious retransmissions by lack of timestamps?

* TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-04  0:29 UTC (permalink / raw)
  To: netdev

I've been looking at some packet traces of an application looking to 
upload a Large Quantity (tm) of data to a server across the Big Bad 
Internet (tm).  They've been Linux senders, and the destination while 
supporting SACK and window scaling does not support TCP timestamps. 
(TCP timestamp support was requested of the supplier of said server many 
many months ago now.)

This destination system has been issuing RSTs at seemingly random points 
in the middle of a large fraction of the attempted transfers.  In 
looking at the traces, they all seem to be variations on the theme of 
what is shown by:

ftp://netperf.org/retrans_question/for_netdev.png

which is a passing of ftp://netperf.org/retrans_question/for_netdev.pcap 
through tcptrace -nG and zoomed-in to the end.  I've seen this with a 
3.2.0 kernel as the sender, have reports of it happening with whatever 
is in Fedora Core 20, and the traces above are from a 3.11.0 kernel as 
the sender.

The large quantity of (likely) unnecessary retransmissions shouldn't be 
triggering a RST by the receiver, but the failures consistently show 
that and I was wondering if the (spurious) retransmissions were perhaps 
"encouraged" (so to speak) by the lack of TCP Timestamps.

happy benchmarking,

rick jones

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: John Heffner @ 2014-03-04  3:22 UTC (permalink / raw)
  To: Rick Jones; +Cc: Netdev

Running with such a large window scale and no timestamps (PAWS
protection) is generally not a great idea, but I don't think it is part
of the issue here.

If you look where things really go wrong, the receiver is sending
anomalous SACK blocks that will trigger the SACK renege handling path.
 Reneging triggers go-back-n behavior, so we see the spurious
retransmits from there on.

The most notable bad segment is this one:
18:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.],
ack 3171368, win 32716, options [nop,nop,sack 1 {3171368:3177208}],
length 0
It contains a SACK block contiguous with the acked seqno.  There is
some other strangeness just before that, where the SACK block shrinks
then grows again.

One other thing that jumped out at me is there is no actual loss, just
reordering.

  -John
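
The anomaly in the segment quoted above can be checked mechanically. What follows is a minimal Python sketch, not anything taken from the kernel: the helper names (seq_before, sack_overlapping_ack) are made up for illustration. It flags a SACK block that starts at or below the cumulative ACK while extending above it, which is what that segment does. Blocks lying entirely at or below the ACK are left alone, since they may be DSACK reports.

```python
SEQ_MOD = 2**32  # TCP sequence numbers wrap modulo 2**32


def seq_before(a, b):
    """True if sequence number a is strictly before b, modulo 2**32."""
    return ((a - b) % SEQ_MOD) >= SEQ_MOD // 2


def sack_overlapping_ack(ack, sack_blocks):
    """Return the SACK blocks whose left edge is at or below the cumulative
    ACK while their right edge is above it (hypothetical helper)."""
    suspicious = []
    for left, right in sack_blocks:
        left_at_or_below_ack = (left == ack) or seq_before(left, ack)
        if left_at_or_below_ack and seq_before(ack, right):
            suspicious.append((left, right))
    return suspicious


# The "most notable bad segment": ack 3171368, sack {3171368:3177208}
print(sack_overlapping_ack(3171368, [(3171368, 3177208)]))
# An earlier ACK from the same trace, which looks normal
print(sack_overlapping_ack(3169716, [(3171368, 3172828)]))
```

A receiver that really held 3171368:3177208 would be expected to advance its cumulative ACK to 3177208 rather than SACK that range, which is why the first call reports the block.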

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-04 18:50 UTC (permalink / raw)
  To: John Heffner; +Cc: Netdev

On 03/03/2014 07:22 PM, John Heffner wrote:
> Running with such a large window scale and no timestamps (PAWS
> protection) is generally not a great idea, but I don't think it is part
> of the issue here.

OK.  I'll look for other, additional sticks with which to beat the 
provider of the system that doesn't do timestamping :)

> If you look where things really go wrong, the receiver is sending
> anomalous SACK blocks that will trigger the SACK renege handling path.
>   Reneging triggers go-back-n behavior, so we see the spurious
> retransmits from there on.

Should those subsequent ACKs be clocking-out additional retransmissions 
like they appear to do? (Assuming I'm not projecting into the trace) Or 
is that an unavoidable consequence of there being no timestamps with 
which to tell which send was being ACKed?

> The most notable bad segment is this one:
> 18:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.],
> ack 3171368, win 32716, options [nop,nop,sack 1 {3171368:3177208}],
> length 0
> It contains a SACK block contiguous with the acked seqno.

I've gone back through one of the other traces and found the same thing 
therein.

> There is some other strangeness just before that, where the SACK
> block shrinks then grows again.

That would be this yes?

15:20:46.798816 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3660468:3661928, ack 4262, win 297, length 1460
15:20:46.799027 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3168256, win 32081, options [nop,nop,sack 1 {3171368:3172828}], length 0
15:20:46.799042 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3661928:3664848, ack 4262, win 297, length 2920
15:20:46.799465 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3169716, win 32241, options [nop,nop,sack 1 {3171368:3172828}], length 0
15:20:46.799479 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3664848:3666308, ack 4262, win 297, length 1460
15:20:46.799497 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3169716, win 32241, options [nop,nop,sack 1 {3171368:3174288}], length 0
15:20:46.799504 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3169716, win 32241, options [nop,nop,sack 1 {3171368:3175748}], length 0
15:20:46.799509 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3666308:3667768, ack 4262, win 297, length 1460
15:20:46.799773 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3171176, win 32491, options [nop,nop,sack 1 {3171368:3172828}], length 0
15:20:46.799787 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3667768:3669228, ack 4262, win 297, length 1460
15:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3171368, win 32716, options [nop,nop,sack 1 {3171368:3177208}], length 0
15:20:46.800081 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3171368:3172828, ack 4262, win 297, length 1460

Might that be packet-reordering in the other direction?  Sadly, I don't 
have good "both sides" traces as the receiving system doesn't seem to 
capture traffic terribly well.  I suppose TCP timestamps might have 
helped answer that question.

> One other thing that jumped out at me is there is no actual loss, just
> reordering.

That was interesting wasn't it - calling it the "Big Bad Internet (tm)" 
I was expecting to see *some* actual packet loss.   Though in this case 
the traffic, while going across the continental US, may not have been 
crossing Internet providers.

thanks muchly,

rick jones

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: John Heffner @ 2014-03-04 19:14 UTC (permalink / raw)
  To: Rick Jones; +Cc: Netdev

On Tue, Mar 4, 2014 at 1:50 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 03/03/2014 07:22 PM, John Heffner wrote:
>> If you look where things really go wrong, the receiver is sending
>> anomalous SACK blocks that will trigger the SACK renege handling path.
>>   Reneging triggers go-back-n behavior, so we see the spurious
>> retransmits from there on.
>
>
> Should those subsequent ACKs be clocking-out additional retransmissions like
> they appear to do? (Assuming I'm not projecting into the trace) Or is that
> an unavoidable consequence of there being no timestamps with which to tell
> which send was being ACKed?

It's possible that if they had timestamps, the ACKs for the original
transmits might not open up the window, thus preventing the
retransmits from going out.  But I'm not nearly familiar enough with
the details of the current retransmit machinery to know for sure.
Reneging like this (go-back-n before a timeout) is not a
commonly-exercised case, so I wouldn't be too surprised if the sender
isn't doing the smartest possible thing.


>> The most notable bad segment is this one:
>> 18:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.],
>> ack 3171368, win 32716, options [nop,nop,sack 1 {3171368:3177208}],
>> length 0
>> It contains a SACK block contiguous with the acked seqno.
>
>
> I've gone back through one of the other traces and found the same thing
> therein.
>
>
>> There is some other strangeness just before that, where the SACK
>> block shrinks then grows again.
>
>
> That would be this yes?
>
> 15:20:46.798816 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3660468:3661928, ack 4262, win 297, length 1460
> 15:20:46.799027 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3168256, win 32081, options [nop,nop,sack 1 {3171368:3172828}], length 0
> 15:20:46.799042 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3661928:3664848, ack 4262, win 297, length 2920
> 15:20:46.799465 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3169716, win 32241, options [nop,nop,sack 1 {3171368:3172828}], length 0
> 15:20:46.799479 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3664848:3666308, ack 4262, win 297, length 1460
> 15:20:46.799497 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3169716, win 32241, options [nop,nop,sack 1 {3171368:3174288}], length 0
> 15:20:46.799504 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3169716, win 32241, options [nop,nop,sack 1 {3171368:3175748}], length 0
> 15:20:46.799509 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3666308:3667768, ack 4262, win 297, length 1460
> 15:20:46.799773 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3171176, win 32491, options [nop,nop,sack 1 {3171368:3172828}], length 0
> 15:20:46.799787 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3667768:3669228, ack 4262, win 297, length 1460
> 15:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: Flags [.], ack 3171368, win 32716, options [nop,nop,sack 1 {3171368:3177208}], length 0
> 15:20:46.800081 IP 91.216.86.7.56064 > 75.236.145.7.443: Flags [.], seq 3171368:3172828, ack 4262, win 297, length 1460
>
> Might that be packet-reordering in the other direction?  Sadly, I don't have
> good "both sides" traces as the receiving system doesn't seem to capture
> traffic terribly well.  I suppose TCP timestamps might have helped answer
> that question.

Regardless of any possible reordering, in this case we know something
odd is going on in the receiver because ACK advances at the same time
the SACK block shrinks.

  -John
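
The "ACK advances while the SACK block shrinks" observation can be reproduced from the receiver's ACKs in the quoted trace. A rough Python sketch follows. The tuples are copied from the tcpdump output above and the function name is invented here. It ignores sequence wraparound and assumes a single SACK block per ACK, as in this trace.

```python
# (cumulative ack, sack left edge, sack right edge) for each ACK sent by
# 75.236.145.7 in the quoted trace, in capture order.
RECEIVER_ACKS = [
    (3168256, 3171368, 3172828),
    (3169716, 3171368, 3172828),
    (3169716, 3171368, 3174288),
    (3169716, 3171368, 3175748),
    (3171176, 3171368, 3172828),
    (3171368, 3171368, 3177208),
]


def shrinking_sack_events(acks):
    """Return (previous, current) ACK pairs where the cumulative ACK moved
    forward, the SACK right edge moved backward, and the new cumulative ACK
    does not yet cover what the previous block had reported."""
    events = []
    for prev, cur in zip(acks, acks[1:]):
        prev_ack, _, prev_right = prev
        cur_ack, _, cur_right = cur
        if cur_ack > prev_ack and cur_right < prev_right and cur_ack < prev_right:
            events.append((prev, cur))
    return events


for prev, cur in shrinking_sack_events(RECEIVER_ACKS):
    print("ack", prev[0], "->", cur[0], "sack right edge", prev[2], "->", cur[2])
```

On this data the only pair reported is the step from ack 3169716 to ack 3171176, where the block's right edge falls back from 3175748 to 3172828. Reordering of the ACKs on the return path would not make the ACK number move forward on the same segment that carries the smaller block.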

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-04 19:33 UTC (permalink / raw)
  To: John Heffner; +Cc: Netdev

> Regardless of any possible reordering, in this case we know something
> odd is going on in the receiver because ACK advances at the same time
> the SACK block shrinks.

Ah yes, I'd not picked-up on that.

thanks,

rick jones

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Neal Cardwell @ 2014-03-04 20:35 UTC (permalink / raw)
  To: Rick Jones; +Cc: John Heffner, Netdev, Yuchung Cheng

What's the receiver OS in this trace? It's reneging on SACKs. :-) Take
a look at this ACK:

18:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: .
4262:4262(0) ack 3171368 win 32716 <nop,nop,sack 1 {3171368:3177208}>

Note that it's ACKing 3171368 and SACKing the adjacent sequence range:
{3171368:3177208}. That's not cool.

I think that's causing the Linux sender to enter the
tcp_check_sack_reneging() code path, which calls tcp_enter_loss().

It seems that the Linux sender did not enable FRTO for that
tcp_enter_loss() invocation. Maybe there is some way we can revise the
logic to enable FRTO in cases like this, so we can detect that the
retransmission was not needed, and abort the stream of spurious
retransmissions...

neal
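
To illustrate why that ACK looks like reneging from the sender's side, here is a toy model in Python. It is not the code in tcp_input.c: Segment and process_ack are hypothetical names. The rule it applies is an assumption about what the reneging check looks for, namely that the segment at the head of the retransmit queue is already marked SACKed once the cumulative ACK has been processed. Sequence wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int
    end: int
    sacked: bool = False


def process_ack(queue, ack, sack_blocks):
    """Apply one ACK to a toy retransmit queue.

    Returns (remaining_queue, reneging_suspected)."""
    # Mark every segment fully covered by a SACK block.
    for seg in queue:
        if any(left <= seg.start and seg.end <= right for left, right in sack_blocks):
            seg.sacked = True
    # Drop segments covered by the cumulative ACK.
    remaining = [seg for seg in queue if seg.end > ack]
    # If the first unacked segment was SACKed, the receiver appears to have
    # held that data yet declined to acknowledge it cumulatively.
    reneging = bool(remaining) and remaining[0].sacked
    return remaining, reneging


# Segment boundaries chosen to match the sequence numbers in the trace.
edges = [3169716, 3171176, 3171368, 3172828, 3174288, 3175748, 3177208]
queue = [Segment(a, b) for a, b in zip(edges, edges[1:])]

queue, suspected = process_ack(queue, 3171176, [(3171368, 3172828)])
print(suspected)
queue, suspected = process_ack(queue, 3171368, [(3171368, 3177208)])
print(suspected)
```

In this model the first ACK leaves an un-SACKed segment at the head of the queue, so nothing is suspected. The second ACK stops exactly at a segment that has been SACKed, which is the condition the sketch treats as reneging.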


* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-04 21:56 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: John Heffner, Netdev, Yuchung Cheng

On 03/04/2014 12:35 PM, Neal Cardwell wrote:
> What's the receiver OS in this trace?

It is my understanding that the receiver has a FreeBSD-based stack.

rick

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Yuchung Cheng @ 2014-03-04 22:23 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Rick Jones, John Heffner, Netdev

On Tue, Mar 4, 2014 at 12:35 PM, Neal Cardwell <ncardwell@google.com> wrote:
> What's the receiver OS in this trace? It's reneging on SACKs. :-) Take
> a look at this ACK:
>
> 18:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: .
> 4262:4262(0) ack 3171368 win 32716 <nop,nop,sack 1 {3171368:3177208}>
>
> Note that it's ACKing 3171368 and SACKing the adjacent sequence range:
> {3171368:3177208}. That's not cool.
>
> I think that's causing the Linux sender to enter the
> tcp_check_sack_reneging() code path, which calls tcp_enter_loss().
>
> It seems that the Linux sender did not enable FRTO for that
> tcp_enter_loss() invocation. Maybe there is some way we can revise the
> logic to enable FRTO in cases like this, so we can detect that the
> retransmission was not needed, and abort the stream of spurious
> retransmissions...
Sure we can try:

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6e48093..735ece6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1972,7 +1972,7 @@ void tcp_enter_loss(struct sock *sk, int how)
         * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
         */
        tp->frto = sysctl_tcp_frto &&
-                  (new_recovery || icsk->icsk_retransmits) &&
+                  (new_recovery || icsk->icsk_retransmits || how) &&
                   !inet_csk(sk)->icsk_mtup.probe_size;
 }


However that only works if we got new data to send. For a better
solution, with the lack of TS option or DSACK support, we can
1) use Neal's neat idea to send a different size packet on the first
retransmission after timeout, and use that to distinguish if the ACK
is for the original or retry.

2) Do not blindly mark every unsacked packet as lost in tcp_enter_loss();
my idea would be to do that only if the packet was sent at least min_rtt ago.

I can try to implement these ideas if people are interested.
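
Idea 2 can be sketched outside the kernel in a few lines of Python. This is one possible reading of the proposal, not an implementation of it: SentSegment and segments_to_mark_lost are hypothetical names, times are in milliseconds, and "sent min_rtt ago" is taken to mean "sent at least min_rtt before now".

```python
from dataclasses import dataclass


@dataclass
class SentSegment:
    seq: int
    sent_at_ms: int
    sacked: bool = False


def segments_to_mark_lost(segments, now_ms, min_rtt_ms):
    """Return the unsacked segments that have been outstanding for at least
    min_rtt_ms; anything sent more recently could not have been acknowledged
    yet, so it is not treated as lost."""
    return [
        seg
        for seg in segments
        if not seg.sacked and now_ms - seg.sent_at_ms >= min_rtt_ms
    ]


segments = [
    SentSegment(seq=1000, sent_at_ms=0),
    SentSegment(seq=2460, sent_at_ms=0, sacked=True),
    SentSegment(seq=3920, sent_at_ms=95),
]
lost = segments_to_mark_lost(segments, now_ms=100, min_rtt_ms=40)
print([seg.seq for seg in lost])
```

With these made-up numbers only the first segment is marked: the second is SACKed, and the third was sent 5 ms ago, well under the assumed 40 ms minimum RTT.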

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-04 23:14 UTC (permalink / raw)
  To: Yuchung Cheng, Neal Cardwell; +Cc: John Heffner, Netdev

On 03/04/2014 02:23 PM, Yuchung Cheng wrote:
> On Tue, Mar 4, 2014 at 12:35 PM, Neal Cardwell <ncardwell@google.com> wrote:
>> What's the receiver OS in this trace? It's reneging on SACKs. :-) Take
>> a look at this ACK:
>>
>> 18:20:46.800063 IP 75.236.145.7.443 > 91.216.86.7.56064: .
>> 4262:4262(0) ack 3171368 win 32716 <nop,nop,sack 1 {3171368:3177208}>
>>
>> Note that it's ACKing 3171368 and SACKing the adjacent sequence range:
>> {3171368:3177208}. That's not cool.
>>
>> I think that's causing the Linux sender to enter the
>> tcp_check_sack_reneging() code path, which calls tcp_enter_loss().
>>
>> It seems that the Linux sender did not enable FRTO for that
>> tcp_enter_loss() invocation. Maybe there is some way we can revise the
>> logic to enable FRTO in cases like this, so we can detect that the
>> retransmission was not needed, and abort the stream of spurious
>> retransmissions...
> Sure we can try:
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 6e48093..735ece6 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1972,7 +1972,7 @@ void tcp_enter_loss(struct sock *sk, int how)
>           * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
>           */
>          tp->frto = sysctl_tcp_frto &&
> -                  (new_recovery || icsk->icsk_retransmits) &&
> +                  (new_recovery || icsk->icsk_retransmits || how) &&
>                     !inet_csk(sk)->icsk_mtup.probe_size;
>   }
>
>
> However that only works if we got new data to send. For a better
> solution, with the lack of TS option or DSACK support, we can
> 1) use Neal's neat idea to send a different size packet on the first
> retransmission after timeout, and use that to distinguish if the ACK
> is for the original or retry.

What would one do if the ACK arriving after the short retransmission was 
farther to the right than the end of the original packet?  Won't that be 
ambiguous?

> 2) Do not blindly mark every unsacked packet as lost in tcp_enter_loss();
> my idea would be to do that only if the packet was sent at least min_rtt ago.
>
> I can try to implement these ideas if people are interested.

If these near-heroics are unnecessary if timestamps are present, I'm not 
sure I'd push too hard.  Unless you think that timestamps not being 
present is sufficiently common.

rick

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-21 21:53 UTC (permalink / raw)
  To: John Heffner; +Cc: Netdev

On 03/03/2014 07:22 PM, John Heffner wrote:
> Running with such a large window scale and no timestamps (PAWS
> protection) is generally not a great idea, but I don't think it is part
> of the issue here.
>
> If you look where things really go wrong, the receiver is sending
> anomalous SACK blocks that will trigger the SACK renege handling path.
>   Reneging triggers go-back-n behavior, so we see the spurious
> retransmits from there on.

What triggers go-back-n when SACK is not in use?  I ask because at least 
once I have seen the same sort of thing without SACK enabled on the 
connection.  The total quantity of retransmissions is roughly the same, 
but spread-out - looks like cwnd shrinks considerably and re-grows in 
the no-SACK case.  Not sure I still have that trace though :(

rick jones

* Re: TCP being hoodwinked into spurious retransmissions by lack of timestamps?
From: Rick Jones @ 2014-03-25 17:39 UTC (permalink / raw)
  To: netdev

I have learned why the receiving TCP has reset the connection.  It would 
seem that stack has a heuristic whereby if it receives more than 255 
retransmissions in a window it will abort the connection.

rick jones
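
That limit is consistent with the numbers in the trace excerpt posted earlier in the thread. A back-of-the-envelope check in Python, under stated assumptions: go-back-n resends everything between the cumulative ACK and the highest sequence sent, it does so in 1460-byte segments, and all of those segments count against the receiver's limit. What "in a window" means for that stack is not known here, and the function name is made up.

```python
MSS = 1460             # segment size seen on the wire in the trace
SND_UNA = 3171368      # cumulative ACK in the anomalous segment
SND_NXT = 3669228      # end of the last new segment sent before that ACK
RETRANS_LIMIT = 255    # receiver's reported abort threshold


def segments_outstanding(snd_una, snd_nxt, mss):
    """Number of MSS-sized segments needed to resend [snd_una, snd_nxt)."""
    outstanding_bytes = snd_nxt - snd_una
    return -(-outstanding_bytes // mss)  # ceiling division


n = segments_outstanding(SND_UNA, SND_NXT, MSS)
print(n, n > RETRANS_LIMIT)  # prints "341 True"
```

So roughly 341 segments were outstanding when the sender fell back to go-back-n, which under these assumptions is enough on its own to pass 255 retransmissions.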
