Re: TCP connection issues against Amazon S3

* Re: TCP connection issues against Amazon S3
       [not found] <5DCDADEF-FF9C-4844-8A2C-62E2D3B3B8CE@bengler.no>
@ 2015-01-06 16:04 ` Eric Dumazet
  2015-01-06 16:11   ` Erik Grinaker
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2015-01-06 16:04 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: linux-kernel, Yuchung Cheng, netdev

On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
> (CCing Yuchung, as his name comes up in the relevant commits)
> 
> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
> intermittent TCP connection hangs for HTTP image requests against
> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
> transfer before timing out. We see this problem across a range of
> servers, in several data centres and networks, all located in Norway.
> 
> A packet dump [1] shows repeated ACK retransmits for some of the
> requests. Using Ubuntu mainline kernels, we found the problem to have
> been introduced between 3.11.10 and 3.12.0, possibly in
> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126. The problem is also present
> in 3.18.1. Disabling tcp_window_scaling seems to solve it, but has
> obvious drawbacks for transfer speeds. Other sysctls do not seem to
> affect it.
> 
> I am not sure if this is fundamentally a kernel bug or a network
> issue, but we did not see this problem with older kernels.
> 
> [1] http://abstrakt.bengler.no/tcp-issues-s3.pcap.bz2
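
For a rough sense of the transfer-speed drawback mentioned above: with
window scaling off (the net.ipv4.tcp_window_scaling sysctl set to 0), the
advertised receive window is capped by the 16-bit TCP window field at
65535 bytes, and a single connection cannot move more than about one
window per round trip. The Python sketch below works through that
arithmetic. The 50 ms RTT and the scale shift of 7 are assumed values for
illustration only, not measurements from this thread.

```python
# Upper bound on single-connection throughput: one receive window per RTT.
# The RTT and shift values used below are illustrative assumptions.

UNSCALED_MAX_WINDOW = 65535  # largest value of the 16-bit TCP window field


def scaled_max_window(shift):
    """Largest advertisable window for a given window-scale shift (0..14)."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return UNSCALED_MAX_WINDOW << shift


def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Throughput ceiling when at most one window is in flight per RTT."""
    return window_bytes / rtt_s


if __name__ == "__main__":
    rtt = 0.050  # assumed round-trip time in seconds
    for label, window in (
        ("scaling disabled", UNSCALED_MAX_WINDOW),
        ("scaling enabled, shift 7", scaled_max_window(7)),
    ):
        rate = max_throughput_bytes_per_s(window, rtt)
        print(f"{label}: window={window} bytes, ceiling={rate / 1e6:.2f} MB/s")
```

The real ceiling also depends on socket buffer sizes and the sender's
congestion window, so treat these numbers as an upper bound only.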


CC netdev

This looks like the bug we fixed here :

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=39bb5e62867de82b269b07df900165029b928359

Could you post output of 'nstat' command ?


* Re: TCP connection issues against Amazon S3
  2015-01-06 16:04 ` TCP connection issues against Amazon S3 Eric Dumazet
@ 2015-01-06 16:11   ` Erik Grinaker
  2015-01-06 17:20     ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 16:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Yuchung Cheng, netdev


> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>> (CCing Yuchung, as his name comes up in the relevant commits)
>> 
>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>> intermittent TCP connection hangs for HTTP image requests against
>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>> transfer before timing out. We see this problem across a range of
>> servers, in several data centres and networks, all located in Norway.
>> 
>> A packet dump [1] shows repeated ACK retransmits for some of the
>> requests. Using Ubuntu mainline kernels, we found the problem to have
>> been introduced between 3.11.10 and 3.12.0, possibly in
>> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126. The problem is also present
>> in 3.18.1. Disabling tcp_window_scaling seems to solve it, but has
>> obvious drawbacks for transfer speeds. Other sysctls do not seem to
>> affect it.
>> 
>> I am not sure if this is fundamentally a kernel bug or a network
>> issue, but we did not see this problem with older kernels.
>> 
>> [1] http://abstrakt.bengler.no/tcp-issues-s3.pcap.bz2
> 
> 
> CC netdev
> 
> This looks like the bug we fixed here :
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=39bb5e62867de82b269b07df900165029b928359

Has that patch gone into a release? Because the problem persists with 3.18.1.

> Could you post output of 'nstat' command ?

Sure:

#kernel
IpInReceives                    1676194            0.0
IpInDelivers                    1676115            0.0
IpOutRequests                   1163105            0.0
IcmpInErrors                    826                0.0
IcmpInEchoReps                  826                0.0
IcmpOutErrors                   824                0.0
IcmpOutTimeExcds                10                 0.0
IcmpOutTimestamps               814                0.0
IcmpMsgInType8                  826                0.0
IcmpMsgOutType0                 814                0.0
IcmpMsgOutType3                 10                 0.0
TcpActiveOpens                  2004               0.0
TcpPassiveOpens                 1102               0.0
TcpAttemptFails                 5                  0.0
TcpEstabResets                  34                 0.0
TcpInSegs                       1667699            0.0
TcpOutSegs                      1159990            0.0
TcpRetransSegs                  26                 0.0
TcpOutRsts                      504                0.0
UdpInDatagrams                  2002               0.0
UdpOutDatagrams                 2087               0.0
Ip6InReceives                   1                  0.0
Ip6InNoRoutes                   1                  0.0
Ip6OutRequests                  10                 0.0
Ip6OutDiscards                  3                  0.0
Ip6OutNoRoutes                  1                  0.0
Ip6OutMcastPkts                 7                  0.0
Ip6InOctets                     76                 0.0
Ip6OutOctets                    728                0.0
Ip6OutMcastOctets               520                0.0
Ip6InNoECTPkts                  1                  0.0
Icmp6OutMsgs                    4                  0.0
Icmp6OutNeighborSolicits        1                  0.0
Icmp6OutMLDv2Reports            3                  0.0
Icmp6OutType135                 1                  0.0
Icmp6OutType143                 3                  0.0
TcpExtEmbryonicRsts             5                  0.0
TcpExtPruneCalled               1                  0.0
TcpExtTW                        1897               0.0
TcpExtDelayedACKs               3058               0.0
TcpExtDelayedACKLocked          13                 0.0
TcpExtDelayedACKLost            2330               0.0
TcpExtTCPPrequeued              3084               0.0
TcpExtTCPDirectCopyFromPrequeue 10944              0.0
TcpExtTCPHPHits                 1246417            0.0
TcpExtTCPPureAcks               7512               0.0
TcpExtTCPHPAcks                 3219               0.0
TcpExtTCPSackRecovery           2                  0.0
TcpExtTCPLossUndo               4                  0.0
TcpExtTCPFastRetrans            2                  0.0
TcpExtTCPTimeouts               18                 0.0
TcpExtTCPLossProbes             145                0.0
TcpExtTCPLossProbeRecovery      125                0.0
TcpExtTCPSackRecoveryFail       1                  0.0
TcpExtTCPRcvCollapsed           22                 0.0
TcpExtTCPDSACKOldSent           43                 0.0
TcpExtTCPDSACKRecv              3                  0.0
TcpExtTCPAbortOnData            113                0.0
TcpExtTCPDSACKIgnoredNoUndo     3                  0.0
TcpExtTCPSpuriousRTOs           2                  0.0
TcpExtTCPSackShiftFallback      3                  0.0
TcpExtTCPRcvCoalesce            927994             0.0
TcpExtTCPOFOQueue               300911             0.0
TcpExtTCPOFOMerge               76                 0.0
TcpExtTCPSpuriousRtxHostQueues  24                 0.0
IpExtInBcastPkts                5588               0.0
IpExtInOctets                   2454079082         0.0
IpExtOutOctets                  56232776           0.0
IpExtInBcastOctets              3218688            0.0
IpExtInNoECTPkts                1676194            0.0
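
A dump like this is easier to compare between kernels once it is reduced
to a few ratios. The following Python sketch parses the name/value/rate
layout shown above and computes two of them. The function names are
hypothetical, the sample is a four-line excerpt of the counters above, and
which ratios are worth watching is an assumption rather than something
established in this thread.

```python
# Minimal parser for `nstat` output in the "name value rate" layout.

def parse_nstat(text):
    """Return {counter_name: int_value} for every counter line in text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the "#kernel" header.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        try:
            counters[parts[0]] = int(parts[1])
        except ValueError:
            continue  # not a counter line
    return counters


def ratio(counters, numerator, denominator):
    """Safe division of two counters; 0.0 if the denominator is missing or zero."""
    denom = counters.get(denominator, 0)
    return counters.get(numerator, 0) / denom if denom else 0.0


SAMPLE = """#kernel
TcpInSegs                       1667699            0.0
TcpOutSegs                      1159990            0.0
TcpRetransSegs                  26                 0.0
TcpExtTCPOFOQueue               300911             0.0
"""

if __name__ == "__main__":
    c = parse_nstat(SAMPLE)
    print("retransmitted / sent segments:",
          f"{ratio(c, 'TcpRetransSegs', 'TcpOutSegs'):.6f}")
    print("out-of-order queued / received segments:",
          f"{ratio(c, 'TcpExtTCPOFOQueue', 'TcpInSegs'):.3f}")
```

On the numbers above, local retransmissions are negligible while roughly
18% of received segments went through the out-of-order queue, which would
be consistent with loss or reordering on the receive path.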


* Re: TCP connection issues against Amazon S3
  2015-01-06 16:11   ` Erik Grinaker
@ 2015-01-06 17:20     ` Eric Dumazet
  2015-01-06 18:17       ` Erik Grinaker
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2015-01-06 17:20 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: linux-kernel, Yuchung Cheng, netdev

On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
> > On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
> >> (CCing Yuchung, as his name comes up in the relevant commits)
> >> 
> >> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
> >> intermittent TCP connection hangs for HTTP image requests against
> >> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
> >> transfer before timing out. We see this problem across a range of
> >> servers, in several data centres and networks, all located in Norway.
> >> 
> >> A packet dump [1] shows repeated ACK retransmits for some of the
> >> requests. Using Ubuntu mainline kernels, we found the problem to have
> >> been introduced between 3.11.10 and 3.12.0, possibly in
> >> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126. The problem is also present
> >> in 3.18.1. Disabling tcp_window_scaling seems to solve it, but has
> >> obvious drawbacks for transfer speeds. Other sysctls do not seem to
> >> affect it.
> >> 
> >> I am not sure if this is fundamentally a kernel bug or a network
> >> issue, but we did not see this problem with older kernels.
> >> 
> >> [1] http://abstrakt.bengler.no/tcp-issues-s3.pcap.bz2
> > 
> > 
> > CC netdev
> > 
> > This looks like the bug we fixed here :
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=39bb5e62867de82b269b07df900165029b928359
> 
> Has that patch gone into a release? Because the problem persists with 3.18.1.

Patch is in 3.18.1 yes.

So that's a separate issue.

Can you confirm the pcap was taken at the receiver (195.159.221.106), not
the sender (54.231.136.74), and which host is running the 'buggy kernel'?

If the sender is broken, changing the kernel on the receiver won't help.

BTW not using sack (on 54.231.132.98) is terrible for performance in
lossy environments.


* Re: TCP connection issues against Amazon S3
  2015-01-06 17:20     ` Eric Dumazet
@ 2015-01-06 18:17       ` Erik Grinaker
  2015-01-06 18:32         ` Eric Dumazet
  2015-01-06 18:33         ` Yuchung Cheng
  0 siblings, 2 replies; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 18:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Yuchung Cheng, netdev


> On 06 Jan 2015, at 17:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
>>> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>>>> (CCing Yuchung, as his name comes up in the relevant commits)
>>>> 
>>>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>>>> intermittent TCP connection hangs for HTTP image requests against
>>>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>>>> transfer before timing out. We see this problem across a range of
>>>> servers, in several data centres and networks, all located in Norway.
>>>> 
>>>> A packet dump [1] shows repeated ACK retransmits for some of the
>>>> requests. Using Ubuntu mainline kernels, we found the problem to have
>>>> been introduced between 3.11.10 and 3.12.0, possibly in
>>>> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126. The problem is also present
>>>> in 3.18.1. Disabling tcp_window_scaling seems to solve it, but has
>>>> obvious drawbacks for transfer speeds. Other sysctls do not seem to
>>>> affect it.
>>>> 
>>>> I am not sure if this is fundamentally a kernel bug or a network
>>>> issue, but we did not see this problem with older kernels.
>>>> 
>>>> [1] http://abstrakt.bengler.no/tcp-issues-s3.pcap.bz2
>>> 
>>> 
>>> CC netdev
>>> 
>>> This looks like the bug we fixed here :
>>> 
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=39bb5e62867de82b269b07df900165029b928359
>> 
>> Has that patch gone into a release? Because the problem persists with 3.18.1.
> 
> Patch is in 3.18.1 yes.
> 
> So thats a separate issue. 
> 
> Can you confirm pcap was taken at receiver (195.159.221.106), not sender
> (54.231.136.74) , and on which host is running the 'buggy kernel' ?

Yes, pcap was taken on receiver (195.159.221.106).

> If the sender is broken, changing the kernel on receiver wont help.
> 
> BTW not using sack (on 54.231.132.98) is terrible for performance in
> lossy environments.

It may well be that the sender is broken; however, the sender is Amazon S3, so I do not have any control over it. And in any case, the problem goes away with 3.11.10 on receiver, but persists with 3.12.0 (or later) on receiver, so there must be some change in 3.12.0 which has caused this to trigger.

If you are confident that the problem is with Amazon, I can get in touch with their engineering department.


* Re: TCP connection issues against Amazon S3
  2015-01-06 18:17       ` Erik Grinaker
@ 2015-01-06 18:32         ` Eric Dumazet
  2015-01-06 18:33         ` Yuchung Cheng
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2015-01-06 18:32 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: linux-kernel, Yuchung Cheng, netdev

On Tue, 2015-01-06 at 18:17 +0000, Erik Grinaker wrote:

> Yes, pcap was taken on receiver (195.159.221.106).
> 
> > If the sender is broken, changing the kernel on receiver wont help.
> > 
> > BTW not using sack (on 54.231.132.98) is terrible for performance in
> > lossy environments.
> 
> It may well be that the sender is broken; however, the sender is
> Amazon S3, so I do not have any control over it. And in any case, the
> problem goes away with 3.11.10 on receiver, but persists with 3.12.0
> (or later) on receiver, so there must be some change in 3.12.0 which
> has caused this to trigger.

In fact I saw nothing obviously wrong in pcap (but I have not done a
full analysis)

It might simply be an application bug, triggering a timeout too soon.

A kernel change can be good and still, by changing timings a bit, trigger
application bugs.


* Re: TCP connection issues against Amazon S3
  2015-01-06 18:17       ` Erik Grinaker
  2015-01-06 18:32         ` Eric Dumazet
@ 2015-01-06 18:33         ` Yuchung Cheng
  2015-01-06 19:01           ` Erik Grinaker
  2015-01-06 19:16           ` Rick Jones
  1 sibling, 2 replies; 26+ messages in thread
From: Yuchung Cheng @ 2015-01-06 18:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On Tue, Jan 6, 2015 at 10:17 AM, Erik Grinaker <erik@bengler.no> wrote:
>
>> On 06 Jan 2015, at 17:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
>>>> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>>>>> (CCing Yuchung, as his name comes up in the relevant commits)
>>>>>
>>>>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>>>>> intermittent TCP connection hangs for HTTP image requests against
>>>>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>>>>> transfer before timing out. We see this problem across a range of
>>>>> servers, in several data centres and networks, all located in Norway.
>>>>>
>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?

I am trying to understand the problem. Could you confirm that it's the
HTTP responses sent from Amazon S3 that got stalled, rather than the HTTP
requests sent from the receiver (your host)?

btw I suspect some middleboxes are stripping SACKOK options from your
SYNs (or Amazon SYN-ACKs) assuming Amazon supports SACK.
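
One way to check for that kind of stripping is to compare the TCP options
in the SYN as captured on the client with the SYN-ACK that comes back. The
Python sketch below walks a raw TCP options field and reports which option
kinds are present. The option kind numbers are the standard ones (2 = MSS,
3 = window scale, 4 = SACK-permitted, 8 = timestamps). The two byte
strings are made-up examples, not taken from the pcap in this thread, and
replacing a stripped option with NOP padding is only an assumption about
how a middlebox might behave.

```python
# Walk a raw TCP options field and list the option kinds it contains.

TCP_OPT_NAMES = {
    2: "MSS",
    3: "WindowScale",
    4: "SACK-Permitted",
    5: "SACK",
    8: "Timestamps",
}


def parse_tcp_options(raw):
    """Return the option kinds found in raw (bytes), in order of appearance."""
    kinds = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # NOP, a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw):
            break  # truncated option
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break  # malformed option
        kinds.append(kind)
        i += length
    return kinds


# Hypothetical SYN options: MSS 1460, SACK-permitted, timestamps, NOP, wscale 7.
FULL_SYN = bytes.fromhex("020405b40402080a0000000100000000" "01030307")
# Hypothetical stripped variant: SACK-permitted and timestamps replaced by NOPs.
STRIPPED_SYN = bytes.fromhex("020405b4" "0101" "01030307" "0101")

if __name__ == "__main__":
    for label, raw in (("full", FULL_SYN), ("stripped", STRIPPED_SYN)):
        names = [TCP_OPT_NAMES.get(k, str(k)) for k in parse_tcp_options(raw)]
        print(f"{label}: {names}")
```

If SACK-Permitted is present in the SYN leaving the client but absent from
the SYN-ACK, a capture closer to the server side would be needed to tell
whether the option was dropped on the forward path or never echoed.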





* Re: TCP connection issues against Amazon S3
  2015-01-06 18:33         ` Yuchung Cheng
@ 2015-01-06 19:01           ` Erik Grinaker
  2015-01-06 19:18             ` Yuchung Cheng
  2015-01-06 19:16           ` Rick Jones
  1 sibling, 1 reply; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 19:01 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev


> On 06 Jan 2015, at 18:33, Yuchung Cheng <ycheng@google.com> wrote:
> 
> On Tue, Jan 6, 2015 at 10:17 AM, Erik Grinaker <erik@bengler.no> wrote:
>> 
>>> On 06 Jan 2015, at 17:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
>>>>> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>>>>>> (CCing Yuchung, as his name comes up in the relevant commits)
>>>>>> 
>>>>>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>>>>>> intermittent TCP connection hangs for HTTP image requests against
>>>>>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>>>>>> transfer before timing out. We see this problem across a range of
>>>>>> servers, in several data centres and networks, all located in Norway.
>>>>>> 
>>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
> TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?

Ah, sorry, they are indeed DUPACKs; I thought they were the same thing.

> I am trying to understand the problem. Could you confirm that it's the
> HTTP responses sent from Amazon S3 got stalled, or HTTP requests sent
> from the receiver (your host)?

Yes. We run HTTP GET requests against S3 for images (typically a few megs in size). Once in a while, the response transfer stalls about halfway through, until the client (Curl) times out. The packet dump shows loads of DUPACKs early on, then TCP retransmissions until the connection is closed.

> btw I suspect some middleboxes are stripping SACKOK options from your
> SYNs (or Amazon SYN-ACKs) assuming Amazon supports SACK.

That may be. I just tested this on a server in the Netherlands, and I can not reproduce the problem there, while I can reproduce it from multiple locations and ISPs in Norway. Would it be helpful to have a packet dump from the functioning Netherlands server as well?




* Re: TCP connection issues against Amazon S3
  2015-01-06 18:33         ` Yuchung Cheng
  2015-01-06 19:01           ` Erik Grinaker
@ 2015-01-06 19:16           ` Rick Jones
  2015-01-06 19:48             ` Rick Jones
  2015-01-06 19:50             ` Erik Grinaker
  1 sibling, 2 replies; 26+ messages in thread
From: Rick Jones @ 2015-01-06 19:16 UTC (permalink / raw)
  To: Yuchung Cheng, Erik Grinaker
  Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

>>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
> TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?
>
> I am trying to understand the problem. Could you confirm that it's the
> HTTP responses sent from Amazon S3 got stalled, or HTTP requests sent
> from the receiver (your host)?
>
> btw I suspect some middleboxes are stripping SACKOK options from your
> SYNs (or Amazon SYN-ACKs) assuming Amazon supports SACK.

The TCP Timestamp option too it seems.

Speaking of middleboxes...  It is probably a fish that is red, but a 
while back I stepped in a middle box (a load balancer) which decided 
that if it saw "too many" retransmissions in a given TCP window that 
something was seriously wrong and it would toast the connection.  I 
thought though that was an active reset on the part of the middlebox. 
(And the client was the active sender not the back-end server)

I'm assuming one incident starts at XX:41:24.748265 in the trace?  That 
does look like it is slowly slogging its way through a bunch of lost 
traffic, which was I think part of the problem I was seeing with the 
middlebox I stepped in, but I don't think I see the reset where I would 
have expected it.  Still, it looks like the sender has an increasing TCP 
RTO as it is going through the slog (as it likely must since there are 
no TCP timestamps?), to the point it gets larger than I'm guessing curl 
was willing to wait, so the FIN at XX:41:53.269534 after a ten second or 
so gap.
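
To make the growing-RTO point concrete, here is a small Python sketch of
exponential retransmission backoff: each time the retransmit timer fires
without progress, the timeout doubles up to a cap. The 0.5 s starting RTO
and the 10 s client patience are assumed values picked to resemble the
trace, not numbers read from it, and the 120 s cap is the Linux default
upper bound.

```python
# Exponential backoff of the TCP retransmission timeout (RTO).

def rto_schedule(initial_rto, retries, max_rto=120.0):
    """Return [(attempt, rto_waited, elapsed_total)] for successive timeouts."""
    schedule = []
    rto = initial_rto
    elapsed = 0.0
    for attempt in range(1, retries + 1):
        elapsed += rto  # wait one full RTO before this retransmission
        schedule.append((attempt, rto, elapsed))
        rto = min(rto * 2, max_rto)  # double the timer, bounded by the cap
    return schedule


def first_gap_exceeding(schedule, patience):
    """Return the first attempt whose wait exceeds patience, or None."""
    for attempt, rto, _elapsed in schedule:
        if rto > patience:
            return attempt
    return None


if __name__ == "__main__":
    sched = rto_schedule(initial_rto=0.5, retries=8)
    for attempt, rto, elapsed in sched:
        print(f"retransmit #{attempt}: waited {rto:.1f}s, total {elapsed:.1f}s")
    print("first silent gap longer than 10s at attempt:",
          first_gap_exceeding(sched, patience=10.0))
```

Under these assumptions a handful of consecutive timeouts is enough for
the silent gap to exceed what a client such as curl is configured to
tolerate, even though the sender has not given up.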

rick jones


* Re: TCP connection issues against Amazon S3
  2015-01-06 19:01           ` Erik Grinaker
@ 2015-01-06 19:18             ` Yuchung Cheng
  2015-01-06 19:42               ` Erik Grinaker
  0 siblings, 1 reply; 26+ messages in thread
From: Yuchung Cheng @ 2015-01-06 19:18 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On Tue, Jan 6, 2015 at 11:01 AM, Erik Grinaker <erik@bengler.no> wrote:
>
>> On 06 Jan 2015, at 18:33, Yuchung Cheng <ycheng@google.com> wrote:
>>
>> On Tue, Jan 6, 2015 at 10:17 AM, Erik Grinaker <erik@bengler.no> wrote:
>>>
>>>> On 06 Jan 2015, at 17:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
>>>>>> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>>>>>>> (CCing Yuchung, as his name comes up in the relevant commits)
>>>>>>>
>>>>>>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>>>>>>> intermittent TCP connection hangs for HTTP image requests against
>>>>>>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>>>>>>> transfer before timing out. We see this problem across a range of
>>>>>>> servers, in several data centres and networks, all located in Norway.
>>>>>>>
>>>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
>> TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?
>
> Ah, sorry, they are indeed DUPACKs; I thought they were the same thing.
>
>> I am trying to understand the problem. Could you confirm that it's the
>> HTTP responses sent from Amazon S3 got stalled, or HTTP requests sent
>> from the receiver (your host)?
>
> Yes. We run HTTP GET requests against S3 for images (typically a few megs in size). Once in a while, the response transfer stalls about halfway through, until the client (Curl) times out. The packet dump shows loads of DUPACKs early on, then TCP retransmissions until the connection is closed.

Without SACK, the sender uses NewReno fast recovery and recovers one
packet per RTT. In contrast, SACK-based fast recovery can potentially
recover all lost packets in one RTT.
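
A back-of-the-envelope Python sketch of that difference, using a
deliberately idealized model: NewReno repairs one hole per round trip,
SACK repairs every hole in a single round trip, and no retransmission is
itself lost. The loss counts and the 50 ms RTT are assumed values for
illustration.

```python
# Idealized loss-recovery time: NewReno (no SACK) versus SACK.

def newreno_recovery_time(lost_packets, rtt):
    """One lost packet repaired per RTT, so time grows linearly with losses."""
    return lost_packets * rtt


def sack_recovery_time(lost_packets, rtt):
    """Best case with SACK: all holes are known and repaired within one RTT."""
    return rtt if lost_packets > 0 else 0.0


if __name__ == "__main__":
    rtt = 0.050  # assumed round-trip time in seconds
    for lost in (1, 5, 20):
        print(f"{lost:2d} lost: NewReno ~{newreno_recovery_time(lost, rtt):.2f}s, "
              f"SACK ~{sack_recovery_time(lost, rtt):.2f}s")
```

Real recovery is also bounded by the congestion window and can fall back
to RTO-driven retransmission, so the NewReno figure is closer to a lower
bound than a prediction.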

I still can't explain the problem seen on newer kernels. But it has got
to be some receiver-related change, not
0f7cc9a3c2bd89b15720dbf358e9b9e62af27126, because that is a sender-side
change.



* Re: TCP connection issues against Amazon S3
  2015-01-06 19:18             ` Yuchung Cheng
@ 2015-01-06 19:42               ` Erik Grinaker
  2015-01-06 20:13                 ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 19:42 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev


> On 06 Jan 2015, at 19:18, Yuchung Cheng <ycheng@google.com> wrote:
> 
> On Tue, Jan 6, 2015 at 11:01 AM, Erik Grinaker <erik@bengler.no> wrote:
>> 
>>> On 06 Jan 2015, at 18:33, Yuchung Cheng <ycheng@google.com> wrote:
>>> 
>>> On Tue, Jan 6, 2015 at 10:17 AM, Erik Grinaker <erik@bengler.no> wrote:
>>>> 
>>>>> On 06 Jan 2015, at 17:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
>>>>>>> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>>> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>>>>>>>> (CCing Yuchung, as his name comes up in the relevant commits)
>>>>>>>> 
>>>>>>>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>>>>>>>> intermittent TCP connection hangs for HTTP image requests against
>>>>>>>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>>>>>>>> transfer before timing out. We see this problem across a range of
>>>>>>>> servers, in several data centres and networks, all located in Norway.
>>>>>>>> 
>>>>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
>>> TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?
>> 
>> Ah, sorry, they are indeed DUPACKs; I thought they were the same thing.
>> 
>>> I am trying to understand the problem. Could you confirm that it's the
>>> HTTP responses sent from Amazon S3 got stalled, or HTTP requests sent
>>> from the receiver (your host)?
>> 
>> Yes. We run HTTP GET requests against S3 for images (typically a few megs in size). Once in a while, the response transfer stalls about halfway through, until the client (Curl) times out. The packet dump shows loads of DUPACKs early on, then TCP retransmissions until the connection is closed.
> 
> Without SACK, the sender uses NewReno fast recovery and recovers one
> packet per RTT. In contrast, SACK-based fast recovery can potentially
> recover all lost packets in one RTT.

The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.

For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:

http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2


> I still can't explain the problem seen on newer kernel. But that got
> to be some receiver related changes, not
> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126 b/c it's a sender side
> change.

Yeah, I’m not really sure what exactly in 3.12.0 is causing it, that just seemed like a possible candidate to my untrained eye.


>>> btw I suspect some middleboxes are stripping SACKOK options from your
>>> SYNs (or Amazon SYN-ACKs) assuming Amazon supports SACK.
>> 
>> That may be. I just tested this on a server in the Netherlands, and I can not reproduce the problem there, while I can reproduce it from multiple locations and ISPs in Norway. Would it be helpful to have a packet dump from the functioning Netherlands server as well?
> 
> 
>> 
>> 
>>>>>>>> requests. Using Ubuntu mainline kernels, we found the problem to have
>>>>>>>> been introduced between 3.11.10 and 3.12.0, possibly in
>>>>>>>> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126. The problem is also present
>>>>>>>> in 3.18.1. Disabling tcp_window_scaling seems to solve it, but has
>>>>>>>> obvious drawbacks for transfer speeds. Other sysctls do not seem to
>>>>>>>> affect it.
>>>>>>>> 
>>>>>>>> I am not sure if this is fundamentally a kernel bug or a network
>>>>>>>> issue, but we did not see this problem with older kernels.
>>>>>>>> 
>>>>>>>> [1] http://abstrakt.bengler.no/tcp-issues-s3.pcap.bz2
>>>>>>> 
>>>>>>> 
>>>>>>> CC netdev
>>>>>>> 
>>>>>>> This looks like the bug we fixed here :
>>>>>>> 
>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=39bb5e62867de82b269b07df900165029b928359
>>>>>> 
>>>>>> Has that patch gone into a release? Because the problem persists with 3.18.1.
>>>>> 
>>>>> Patch is in 3.18.1 yes.
>>>>> 
>>>>> So that's a separate issue.
>>>>> 
>>>>> Can you confirm the pcap was taken at the receiver (195.159.221.106), not the
>>>>> sender (54.231.136.74), and which host is running the 'buggy kernel'?
>>>> 
>>>> Yes, pcap was taken on receiver (195.159.221.106).
>>>> 
>>>>> If the sender is broken, changing the kernel on the receiver won't help.
>>>>> 
>>>>> BTW not using sack (on 54.231.132.98) is terrible for performance in
>>>>> lossy environments.
>>>> 
>>>> It may well be that the sender is broken; however, the sender is Amazon S3, so I do not have any control over it. And in any case, the problem goes away with 3.11.10 on receiver, but persists with 3.12.0 (or later) on receiver, so there must be some change in 3.12.0 which has caused this to trigger.
>>>> 
>>>> If you are confident that the problem is with Amazon, I can get in touch with their engineering department.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 19:16           ` Rick Jones
@ 2015-01-06 19:48             ` Rick Jones
  2015-01-06 19:50             ` Erik Grinaker
  1 sibling, 0 replies; 26+ messages in thread
From: Rick Jones @ 2015-01-06 19:48 UTC (permalink / raw)
  To: Yuchung Cheng, Erik Grinaker
  Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On 01/06/2015 11:16 AM, Rick Jones wrote:
> I'm assuming one incident starts at XX:41:24.748265 in the trace?  That
> does look like it is slowly slogging its way through a bunch of lost
> traffic, which was I think part of the problem I was seeing with the
> middlebox I stepped in, but I don't think I see the reset where I would
> have expected it.  Still, it looks like the sender has an increasing TCP
> RTO as it is going through the slog (as it likely must since there are
> no TCP timestamps?), to the point it gets larger than I'm guessing curl
> was willing to wait, so the FIN at XX:41:53.269534 after a ten second or
> so gap.

Should the receiver's autotuning be advertising an ever larger window 
the way it is while going through the slog of lost traffic?

rick

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 19:16           ` Rick Jones
  2015-01-06 19:48             ` Rick Jones
@ 2015-01-06 19:50             ` Erik Grinaker
  1 sibling, 0 replies; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 19:50 UTC (permalink / raw)
  To: Rick Jones
  Cc: Yuchung Cheng, Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On 06 Jan 2015, at 19:16, Rick Jones <rick.jones2@hp.com> wrote:
> 
>>>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
>> TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?
>> 
>> I am trying to understand the problem. Could you confirm that it's the
>> HTTP responses sent from Amazon S3 got stalled, or HTTP requests sent
>> from the receiver (your host)?
>> 
>> btw I suspect some middleboxes are stripping SACKOK options from your
>> SYNs (or Amazon SYN-ACKs) assuming Amazon supports SACK.
> 
> The TCP Timestamp option too it seems.
> 
> Speaking of middleboxes...  It is probably a fish that is red, but a while back I stepped in a middle box (a load balancer) which decided that if it saw "too many" retransmissions in a given TCP window that something was seriously wrong and it would toast the connection.  I thought though that was an active reset on the part of the middlebox. (And the client was the active sender not the back-end server)

It’s looking increasingly probable that it’s something like that, since the sender (S3) appears to disable SACKs on the failing clients, while it enables SACKs on other functioning clients.

> I'm assuming one incident starts at XX:41:24.748265 in the trace?  That does look like it is slowly slogging its way through a bunch of lost traffic, which was I think part of the problem I was seeing with the middlebox I stepped in, but I don't think I see the reset where I would have expected it.  Still, it looks like the sender has an increasing TCP RTO as it is going through the slog (as it likely must since there are no TCP timestamps?), to the point it gets larger than I'm guessing curl was willing to wait, so the FIN at XX:41:53.269534 after a ten second or so gap.

Yes, there is one incident starting at XX:41:23. All the RSTs are sent at the end though, at the 30s Curl timeout. I’ve put up a stripped down pcap of a single request here:

http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 19:42               ` Erik Grinaker
@ 2015-01-06 20:13                 ` Eric Dumazet
  2015-01-06 20:26                   ` Erik Grinaker
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2015-01-06 20:13 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:

> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
> 
> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
> 
> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
> 

Although sender seems to be reluctant to retransmit, this 'failure' is
caused by receiver closing the connection too soon.

Are you sure you are not asking curl to set up a very small completion
timer?

12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0

-> Here receiver sends a FIN, because application closed the socket (or died)
12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0

Anyway, getting decent speed without SACK is going to be hard.
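
For readers who want to measure the sender's pacing in the excerpt above, here is a minimal Python sketch. It was not part of the original exchange. It assumes that a data segment from the sender whose starting sequence number lies below the highest byte already seen is a retransmission (reordering would look the same), and the names TRACE, SENDER, SEGMENT and find_retransmits are made up for illustration.

```python
import re

# Excerpt copied from the receiver-side trace quoted above.
TRACE = """\
12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
"""

SENDER = "54.231.132.98.80"

# Matches data segments only: the pure ACKs, the FIN and the RST above
# do not carry a "seq start:end" range, so they are skipped.
SEGMENT = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP (\S+) > \S+ Flags \[[^\]]*\], seq (\d+):(\d+),"
)


def find_retransmits(trace, sender):
    """Return (time_in_seconds, seq_start) for each segment from `sender`
    that starts below the highest sequence number already seen."""
    highest_end = 0
    found = []
    for line in trace.splitlines():
        m = SEGMENT.match(line)
        if not m or m.group(4) != sender:
            continue
        hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
        start, end = int(m.group(5)), int(m.group(6))
        if start < highest_end:
            found.append((hours * 3600 + minutes * 60 + seconds, start))
        highest_end = max(highest_end, end)
    return found


retransmits = find_retransmits(TRACE, SENDER)
gaps = [round(later[0] - earlier[0], 2)
        for earlier, later in zip(retransmits, retransmits[1:])]
print([seq for _, seq in retransmits])  # starting sequence number of each retransmit
print(gaps)                             # seconds between consecutive retransmits
```

On this excerpt the sketch flags three retransmitted segments, each arriving roughly nine seconds after the previous one, which is why a handful of holes is enough to exceed a 30 second client timeout.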

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 20:13                 ` Eric Dumazet
@ 2015-01-06 20:26                   ` Erik Grinaker
  2015-01-06 21:04                     ` Erik Grinaker
  0 siblings, 1 reply; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 20:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev


> On 06 Jan 2015, at 20:13, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:
> 
>> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
>> 
>> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
>> 
>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
>> 
> 
> Although sender seems to be reluctant to retransmit, this 'failure' is
> caused by receiver closing the connection too soon.
> 
> Are you sure you do not ask curl to setup a very small completion
> timer ?

For testing, I am using Curl with a 30 second timeout. This may well be a bit short, but the point is that with the older kernel I could run thousands of requests without a single failure (generally the requests would finish within seconds), while with the newer kernel about 5% of requests will time out (the rest complete within seconds).

> 12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
> 12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
> 12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
> 12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
> 12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
> 12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
> 12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
> 12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
> 12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
> 12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
> 12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
> 12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
> 12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
> 12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
> 
> -> Here receiver sends a FIN, because application closed the socket (or died)
> 12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
> 12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
> 12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
> 
> Anyway, getting decent speed without SACK is going to be hard.

Yes, I am not sure why the sender (S3) disables SACK on my Norwegian servers (across ISPs), while it enables SACK on my server in the Netherlands. They run the same kernel and configuration. I will have to look into it more closely tomorrow.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 20:26                   ` Erik Grinaker
@ 2015-01-06 21:04                     ` Erik Grinaker
  2015-01-06 22:00                       ` Yuchung Cheng
  2015-01-07  1:23                       ` Lukas Tribus
  0 siblings, 2 replies; 26+ messages in thread
From: Erik Grinaker @ 2015-01-06 21:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev


> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
> 
>> 
>> On 06 Jan 2015, at 20:13, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> 
>> On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:
>> 
>>> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
>>> 
>>> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
>>> 
>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
>>> 
>> 
>> Although sender seems to be reluctant to retransmit, this 'failure' is
>> caused by receiver closing the connection too soon.
>> 
>> Are you sure you do not ask curl to setup a very small completion
>> timer ?
> 
> For testing, I am using Curl with a 30 second timeout. This may well be a bit short, but the point is that with the older kernel I could run thousands of requests without a single failure (generally the requests would finish within seconds), while with the newer kernel about 5% of requests will time out (the rest complete within seconds).
> 
>> 12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
>> 12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
>> 12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
>> 12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
>> 12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
>> 12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
>> 12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
>> 12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
>> 12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
>> 12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
>> 12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
>> 12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
>> 12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
>> 12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
>> 
>> -> Here receiver sends a FIN, because application closed the socket (or died)
>> 12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
>> 12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
>> 12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
>> 
>> Anyway, getting decent speed without SACK is going to be hard.
> 
> Yes, I am not sure why the sender (S3) disables SACK on my Norwegian servers (across ISPs), while it enables SACK on my server in the Netherlands. They run the same kernel and configuration. I will have to look into it more closely tomorrow.

It turns out the Norway and Netherlands servers were resolving different loadbalancers. The ones I reached in Norway did not support SACKs, while the ones in the Netherlands did. Going directly to a SACK-enabled IP fixes the problem.

This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s probably some minor change, which gets amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll bring it up with Amazon.
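
To compare which loadbalancer addresses each vantage point resolves, something along these lines can be run on every server. This is only a sketch: resolve_ipv4 is a made-up helper name, and "localhost" stands in for the real bucket hostname so that the example runs without network access.

```python
import socket


def resolve_ipv4(hostname, port=80):
    """Return the sorted, de-duplicated IPv4 addresses that the local
    resolver hands out for `hostname`."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos})


# "localhost" is a placeholder; substitute the S3 hostname actually in use
# and run this on each server whose results should be compared.
print(resolve_ipv4("localhost"))
```

Once an address of interest is known, curl's --resolve option can pin a request to it without changing the Host header.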

Many thanks for your help, everyone.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 21:04                     ` Erik Grinaker
@ 2015-01-06 22:00                       ` Yuchung Cheng
  2015-01-07 13:31                         ` Erik Grinaker
  2015-01-07  1:23                       ` Lukas Tribus
  1 sibling, 1 reply; 26+ messages in thread
From: Yuchung Cheng @ 2015-01-06 22:00 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>
>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>
>>>
>>> On 06 Jan 2015, at 20:13, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>> On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:
>>>
>>>> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
>>>>
>>>> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
>>>>
>>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>>> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
>>>>
>>>
>>> Although sender seems to be reluctant to retransmit, this 'failure' is
>>> caused by receiver closing the connection too soon.
>>>
>>> Are you sure you do not ask curl to setup a very small completion
>>> timer ?
>>
>> For testing, I am using Curl with a 30 second timeout. This may well be a bit short, but the point is that with the older kernel I could run thousands of requests without a single failure (generally the requests would finish within seconds), while with the newer kernel about 5% of requests will time out (the rest complete within seconds).
>>
>>> 12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
>>> 12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
>>> 12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
>>> 12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
>>> 12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
>>> 12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
>>> 12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
>>> 12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
>>> 12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
>>> 12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
>>> 12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
>>> 12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
>>> 12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
>>> 12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
>>>
>>> -> Here receiver sends a FIN, because application closed the socket (or died)
>>> 12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
>>> 12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
>>> 12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
>>>
>>> Anyway, getting decent speed without SACK is going to be hard.
>>
>> Yes, I am not sure why the sender (S3) disables SACK on my Norwegian servers (across ISPs), while it enables SACK on my server in the Netherlands. They run the same kernel and configuration. I will have to look into it more closely tomorrow.
>
> It turns out the Norway and Netherlands servers were resolving different loadbalancers. The ones I reached in Norway did not support SACKs, while the ones in the Netherlands did. Going directly to a SACK-enabled IP fixes the problem.
>
> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s probably some minor change, which gets amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll bring it up with Amazon.

Can you post traces with the older kernels?

>
> Many thanks for your help, everyone.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: TCP connection issues against Amazon S3
  2015-01-06 21:04                     ` Erik Grinaker
  2015-01-06 22:00                       ` Yuchung Cheng
@ 2015-01-07  1:23                       ` Lukas Tribus
  2015-01-07 13:06                         ` Erik Grinaker
  1 sibling, 1 reply; 26+ messages in thread
From: Lukas Tribus @ 2015-01-07  1:23 UTC (permalink / raw)
  To: Erik Grinaker, Eric Dumazet
  Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

> This still doesn’t explain why it works with older kernels, but not newer ones.

Can you try the different 3.12-rc kernels? Knowing whether this was
introduced in 3.12-rc1 or in a specific later -rc release may help
the guys here pinpoint what exactly caused the behavior change on the
receiver side.

v3.12-rc1 to -rc7 are available as prebuilt packages in the Ubuntu mainline
kernel archive [1] as well.


-Lukas


[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-07  1:23                       ` Lukas Tribus
@ 2015-01-07 13:06                         ` Erik Grinaker
  0 siblings, 0 replies; 26+ messages in thread
From: Erik Grinaker @ 2015-01-07 13:06 UTC (permalink / raw)
  To: Lukas Tribus
  Cc: Eric Dumazet, Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On 07 Jan 2015, at 01:23, Lukas Tribus <luky-37@hotmail.com> wrote:
>> This still doesn’t explain why it works with older kernels, but not newer ones.
> 
> Can you try the different 3.12-rc kernels? The information that this was
> introduced in 3.12-rc1 as opposed to a specific -rc>1 releases may help
> the guys here to pinpoint what exactly caused the behavior change on the
> receiver side.

I can reproduce the problem with 3.12.0-rc1 as well.

I also tried 3.11.10 again, to make sure it did not have the problem even when forcing requests to a non-SACK-enabled loadbalancer, and it works as it should (no hung connections).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-06 22:00                       ` Yuchung Cheng
@ 2015-01-07 13:31                         ` Erik Grinaker
  2015-01-07 15:58                           ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Erik Grinaker @ 2015-01-07 13:31 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>> 
>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s probably some minor change, which gets amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll bring it up with Amazon.
> 
> Can you post traces with the older kernels?

Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:

http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2

The transfer shows lots of DUPACKs and retransmits, but this does not seem to have as bad an effect as it did with the failing transfer we saw on newer kernels:

http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2

One big difference, which Rick touched on earlier, is that the newer kernels keep sending TCP window updates as it’s going through the retransmits. The older kernel does not do this.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-07 13:31                         ` Erik Grinaker
@ 2015-01-07 15:58                           ` Eric Dumazet
  2015-01-07 20:37                             ` Erik Grinaker
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2015-01-07 15:58 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
> > On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
> >> 
> >>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
> >> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s probably some minor change, which gets amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll bring it up with Amazon.
> > 
> > Can you post traces with the older kernels?
> 
> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
> 
> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
> 
> The transfer shows lots of DUPACKs and retransmits, but this does not
> seem to have as bad an effect as it did with the failing transfer we
> saw on newer kernels:
> 
> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
> 
> One big difference, which Rick touched on earlier, is that the newer
> kernels keep sending TCP window updates as it’s going through the
> retransmits. The older kernel does not do this.

The new kernel is the receiver: it does no retransmits.

Increasing the window in ACK packets should not prevent the sender from
retransmitting missing packets.

The sender is not a Linux host and is very buggy IMO: if the receiver
advertises too big a window, the sender decides not to retransmit in some
cases.

You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
to work around the sender bug.

( Or use SO_RCVBUF in receiver application)
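
A minimal sketch of the SO_RCVBUF route in Python, for illustration only. The 64 KiB figure and the helper name make_capped_socket are assumptions, not values recommended anywhere in this thread. On Linux, setting SO_RCVBUF explicitly turns off receive-buffer autotuning for that socket, and it has to be done before connect() so that it is taken into account when the window scale is negotiated.

```python
import socket


def make_capped_socket(rcvbuf_bytes):
    """Create a TCP socket whose receive buffer is fixed at roughly
    `rcvbuf_bytes`, which bounds the window the kernel can advertise."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before connect(). Linux doubles the requested value to
    # leave room for bookkeeping and clamps it to net.core.rmem_max.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


sock = make_capped_socket(64 * 1024)  # 64 KiB is an arbitrary example value
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("effective receive buffer:", effective)
sock.close()
```

The value reported by getsockopt() is what the kernel actually applied, so it is worth printing once to confirm the cap took effect before relying on it.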

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-07 15:58                           ` Eric Dumazet
@ 2015-01-07 20:37                             ` Erik Grinaker
  2015-01-07 21:33                               ` Eric Dumazet
  2015-01-07 21:33                               ` Yuchung Cheng
  0 siblings, 2 replies; 26+ messages in thread
From: Erik Grinaker @ 2015-01-07 20:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On 07 Jan 2015, at 15:58, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
>> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
>>> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>>>> 
>>>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>>> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s probably some minor change, which gets amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll bring it up with Amazon.
>>> 
>>> Can you post traces with the older kernels?
>> 
>> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
>> 
>> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
>> 
>> The transfer shows lots of DUPACKs and retransmits, but this does not
>> seem to have as bad an effect as it did with the failing transfer we
>> saw on newer kernels:
>> 
>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>> 
>> One big difference, which Rick touched on earlier, is that the newer
>> kernels keep sending TCP window updates as it’s going through the
>> retransmits. The older kernel does not do this.
> 
> The new kernel is the receiver : It does no retransmits.
> 
> Increasing window in ACK packets should not prevent sender into
> retransmitting missing packets.
> 
> Sender is not a linux host and is very buggy IMO : If receiver
> advertises a too big window, sender decides to not retransmit in some
> cases.

I agree. I have contacted Amazon about this, but am not too hopeful for a quick fix; they have been promising SACK-support on their loadbalancers since 2006, for example.

That said, since this change breaks a service as popular as S3, it might be worth reconsidering.

> You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
> to work around the sender bug.
> 
> ( Or use SO_RCVBUF in receiver application)

Thanks, setting SO_RCVBUF seems like a reasonable workaround.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-07 20:37                             ` Erik Grinaker
@ 2015-01-07 21:33                               ` Eric Dumazet
  2015-01-08 17:47                                 ` Erik Grinaker
  2015-01-07 21:33                               ` Yuchung Cheng
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2015-01-07 21:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On Wed, 2015-01-07 at 20:37 +0000, Erik Grinaker wrote:

> I agree. I have contacted Amazon about this, but am not too hopeful
> for a quick fix; they have been promising SACK-support on their
> loadbalancers since 2006, for example.
> 
> That said, since this change breaks a service as popular as S3, it
> might be worth reconsidering.
> 

Which change are you talking about? Have you done a bisection to
clearly identify the patch exposing this sender bug?

We are not going to tie the TCP stack to the 20th century and to buggy
peers or middleboxes, sorry.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: TCP connection issues against Amazon S3
  2015-01-07 20:37                             ` Erik Grinaker
  2015-01-07 21:33                               ` Eric Dumazet
@ 2015-01-07 21:33                               ` Yuchung Cheng
  1 sibling, 0 replies; 26+ messages in thread
From: Yuchung Cheng @ 2015-01-07 21:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On Wed, Jan 7, 2015 at 12:37 PM, Erik Grinaker <erik@bengler.no> wrote:
> On 07 Jan 2015, at 15:58, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
>>> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
>>>> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>>>>>
>>>>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>>>> This still doesn’t explain why it works with older kernels, but not
>>>>> newer ones. I’m thinking it’s probably some minor change, which gets
>>>>> amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll
>>>>> bring it up with Amazon.
>>>> can you post traces with the older kernels?
>>>
>>> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
>>>
>>> The transfer shows lots of DUPACKs and retransmits, but this does not
>>> seem to have as bad an effect as it did with the failing transfer we
>>> saw on newer kernels:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>>
>>> One big difference, which Rick touched on earlier, is that the newer
>>> kernels keep sending TCP window updates as it’s going through the
>>> retransmits. The older kernel does not do this.
>>
>> The new kernel is the receiver : It does no retransmits.
>>
>> Increasing window in ACK packets should not prevent sender from
>> retransmitting missing packets.
>>
>> Sender is not a linux host and is very buggy IMO : If receiver
>> advertises a too big window, sender decides to not retransmit in some
>> cases.
>
> I agree. I have contacted Amazon about this, but am not too hopeful for a quick fix; they have been promising SACK-support on their loadbalancers since 2006, for example.
>
> That said, since this change breaks a service as popular as S3, it might be worth reconsidering.
With the newer kernel and bigger receive window, the sender skips (the
already slow NewReno) fast recovery and falls back to (exp backoff)
timeout recovery. Reducing rwin to accommodate the sender's bug seems
backward to me.


>
>> You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
>> to work around the sender bug.
>>
>> ( Or use SO_RCVBUF in receiver application)
>
> Thanks, setting SO_RCVBUF seems like a reasonable workaround.
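
For the system-wide variant of the workaround quoted above, a small sketch of inspecting /proc/sys/net/ipv4/tcp_rmem from Python. The file holds three integers (minimum, default and maximum receive buffer size in bytes). The helper names and the sample string are assumptions made for illustration.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class TcpRmem(NamedTuple):
    min_bytes: int
    default_bytes: int
    max_bytes: int


def parse_tcp_rmem(text: str) -> TcpRmem:
    """Parse the 'min default max' triple used by the tcp_rmem sysctl."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return TcpRmem(*(int(f) for f in fields))


def read_tcp_rmem(path: str = "/proc/sys/net/ipv4/tcp_rmem") -> Optional[TcpRmem]:
    """Return the current setting, or None if the file cannot be read."""
    try:
        return parse_tcp_rmem(Path(path).read_text())
    except OSError:
        return None


# Illustrative sample only; real values differ per system.
sample = parse_tcp_rmem("4096 87380 6291456")
print("sample max receive buffer:", sample.max_bytes)
print("this host:", read_tcp_rmem())
```

Lowering the third field (as root) caps how far autotuning can grow the receive buffer for every TCP socket on the host, so the per-socket SO_RCVBUF route is the narrower of the two workarounds.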


* Re: TCP connection issues against Amazon S3
  2015-01-07 21:33                               ` Eric Dumazet
@ 2015-01-08 17:47                                 ` Erik Grinaker
  2015-01-08 18:15                                   ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Erik Grinaker @ 2015-01-08 17:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On 07 Jan 2015, at 21:33, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-01-07 at 20:37 +0000, Erik Grinaker wrote:
>> I agree. I have contacted Amazon about this, but am not too hopeful
>> for a quick fix; they have been promising SACK-support on their
>> loadbalancers since 2006, for example.
>> 
>> That said, since this change breaks a service as popular as S3, it
>> might be worth reconsidering.
> 
> Which change are you talking about ? Have you done a bisection to
> clearly identify the patch exposing this sender bug ?

FWIW, I've done a bisection, and it’s triggered by this change:

https://github.com/torvalds/linux/commit/4e4f1fc226816905c937f9b29dabe351075dfe0f

> We are not going to stick TCP stack to 20th century and buggy peers or
> middleboxes, sorry.

That’s fair enough.
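
For readers unfamiliar with the bisection mentioned above, a rough sketch of the idea in Python: a binary search over an ordered history for the first revision where the problem appears. The commit labels and the is_bad predicate are hypothetical stand-ins; in practice the test is building and booting each kernel and rerunning the S3 transfers.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in a history ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad, the
    same starting point a bisection needs.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history of 16 revisions where the regression lands in c11.
history = [f"c{i}" for i in range(16)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 11)
print("first bad revision:", culprit)
```

Each step halves the remaining range, so even the thousands of commits between two kernel releases need only a dozen or so test runs.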


* Re: TCP connection issues against Amazon S3
  2015-01-08 17:47                                 ` Erik Grinaker
@ 2015-01-08 18:15                                   ` Eric Dumazet
  2015-01-08 18:52                                     ` Rick Jones
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2015-01-08 18:15 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

On Thu, 2015-01-08 at 17:47 +0000, Erik Grinaker wrote:

> FWIW, I've done a bisection, and it’s triggered by this change:
> 
> https://github.com/torvalds/linux/commit/4e4f1fc226816905c937f9b29dabe351075dfe0f


This totally makes sense, thanks for doing the bisection !

> 
> > We are not going to stick TCP stack to 20th century and buggy peers or
> > middleboxes, sorry.
> 
> That’s fair enough.

Strange thing is that sender does not misbehave at the beginning when
receiver window is still small. Only after a while.

It would be nice to know more details about sender OS/version.

Thanks.


* Re: TCP connection issues against Amazon S3
  2015-01-08 18:15                                   ` Eric Dumazet
@ 2015-01-08 18:52                                     ` Rick Jones
  0 siblings, 0 replies; 26+ messages in thread
From: Rick Jones @ 2015-01-08 18:52 UTC (permalink / raw)
  To: Eric Dumazet, Erik Grinaker
  Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev

> Strange thing is that sender does not misbehave at the beginning when
> receiver window is still small. Only after a while.

Just guessing, but when the receiver window is small, the sender cannot 
get a large quantity of data out there at once, so any string of lost 
packets will tend to be smaller.  If the sender is relying on the RTO to 
trigger the retransmits, and is not resetting his RTO until the clean 
ACK of a segment sent after snd_nxt when the loss is detected, the 
smaller loss strings will not get to the rather large RTO values seen in 
the trace before curl gives-up.  It may be that the sender is indeed 
misbehaving at the beginning, just that it isn't noticeable?

Different but perhaps related observation/question - without timestamps 
(which we don't have in this case), isn't there a certain ambiguity 
about arriving out-of-order segments? One doesn't really know if they 
are out-of-order because the network is re-ordering, or because they are 
retransmissions of segments we've not yet seen at the receiver.

rick
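
Rick's guess about loss-string length can be put in numbers with a small sketch. It assumes a sender that recovers one lost segment per retransmission timeout and doubles its timer each time without resetting it. The 1 s initial RTO and 120 s cap are illustrative assumptions; the actual sender's parameters are not known from this thread.

```python
def stall_seconds(lost_segments: int,
                  initial_rto: float = 1.0,
                  max_rto: float = 120.0) -> float:
    """Total time spent waiting on timeouts to recover a string of losses.

    Models exponential backoff with no RTO reset between retransmits:
    each lost segment costs one full timeout, and the timer doubles
    (up to max_rto) after every timeout.
    """
    rto = initial_rto
    total = 0.0
    for _ in range(lost_segments):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


# A short loss string (small window) versus a longer one (large window).
print(stall_seconds(3))  # 1 + 2 + 4 = 7.0
print(stall_seconds(6))  # 1 + 2 + 4 + 8 + 16 + 32 = 63.0
```

Under these assumptions, doubling the number of consecutive losses raises the stall from seconds to about a minute, which would fit the sender looking healthy while the window is small and timing out only once the window has grown.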


Thread overview: 26+ messages
     [not found] <5DCDADEF-FF9C-4844-8A2C-62E2D3B3B8CE@bengler.no>
2015-01-06 16:04 ` TCP connection issues against Amazon S3 Eric Dumazet
2015-01-06 16:11   ` Erik Grinaker
2015-01-06 17:20     ` Eric Dumazet
2015-01-06 18:17       ` Erik Grinaker
2015-01-06 18:32         ` Eric Dumazet
2015-01-06 18:33         ` Yuchung Cheng
2015-01-06 19:01           ` Erik Grinaker
2015-01-06 19:18             ` Yuchung Cheng
2015-01-06 19:42               ` Erik Grinaker
2015-01-06 20:13                 ` Eric Dumazet
2015-01-06 20:26                   ` Erik Grinaker
2015-01-06 21:04                     ` Erik Grinaker
2015-01-06 22:00                       ` Yuchung Cheng
2015-01-07 13:31                         ` Erik Grinaker
2015-01-07 15:58                           ` Eric Dumazet
2015-01-07 20:37                             ` Erik Grinaker
2015-01-07 21:33                               ` Eric Dumazet
2015-01-08 17:47                                 ` Erik Grinaker
2015-01-08 18:15                                   ` Eric Dumazet
2015-01-08 18:52                                     ` Rick Jones
2015-01-07 21:33                               ` Yuchung Cheng
2015-01-07  1:23                       ` Lukas Tribus
2015-01-07 13:06                         ` Erik Grinaker
2015-01-06 19:16           ` Rick Jones
2015-01-06 19:48             ` Rick Jones
2015-01-06 19:50             ` Erik Grinaker
