From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lawrence Brakmo
Subject: Re: [PATCH net-next v2] tcp: force cwnd at least 2 in tcp_cwnd_reduction
Date: Fri, 29 Jun 2018 04:32:20 +0000
References: <20180627023403.3395818-1-brakmo@fb.com> <9745C5DD-BB86-4A7D-B607-4B8AA0E07245@fb.com>
To: Neal Cardwell
Cc: Yuchung Cheng, Matt Mathis, Netdev, Kernel Team, "Blake Matheny", Alexei Starovoitov, Eric Dumazet, Wei Wang, Steve Ibanez, Yousuk Seung
Sender: netdev-owner@vger.kernel.org

On 6/28/18, 1:48 PM, "netdev-owner@vger.kernel.org on behalf of Neal Cardwell" <netdev-owner@vger.kernel.org on behalf of ncardwell@google.com> wrote:

    On Thu, Jun 28, 2018 at 4:20 PM Lawrence Brakmo <brakmo@fb.com> wrote:
    >
    > I just looked at 4.18 traces and the behavior is as follows:
    >
    > Host A sends the last packets of the request
    >
    > Host B receives them, and the last packet is marked with congestion (CE)
    >
    > Host B sends ACKs for packets not marked with congestion
    >
    > Host B sends data packet with reply and ACK for packet marked with congestion (TCP flag ECE)
    >
    > Host A receives ACKs with no ECE flag
    >
    > Host A receives data packet with ACK for the last packet of request and has TCP ECE bit set
    >
    > Host A sends 1st data packet of the next request with TCP flag CWR
    >
    > Host B receives the packet (as seen in tcpdump at B), no CE flag
    >
    > Host B sends a dup ACK that also has the TCP ECE flag
    >
    > Host A RTO timer fires!
    >
    > Host A to send the next packet
    >
    > Host A receives an ACK for everything it has sent (i.e. Host B did receive 1st packet of request)
    >
    > Host A send more packets…

    Thanks, Larry! This is very interesting. I don't know the cause, but
    this reminds me of an issue Steve Ibanez raised on the netdev list
    last December, where he was seeing cases with DCTCP where a CWR packet
    would be received and buffered by Host B but not ACKed by Host B. This
    was the thread "Re: Linux ECN Handling", starting around December 5. I
    have cc-ed Steve.

    I wonder if this may somehow be related to the DCTCP logic to rewind
    tp->rcv_nxt and call tcp_send_ack(), and then restore tp->rcv_nxt, if
    DCTCP notices that the incoming CE bits have been changed while the
    receiver thinks it is holding on to a delayed ACK (in
    dctcp_ce_state_0_to_1() and dctcp_ce_state_1_to_0()). I wonder if the
    "synthetic" call to tcp_send_ack() somehow has side effects in the
    delayed ACK state machine that can cause the connection to forget that
    it still needs to fire a delayed ACK, even though it just sent an ACK
    just now.

    neal

Here is a packetdrill script that reproduces the problem:

// Repro bug that does not ack data, not even with delayed-ack

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 5>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect0] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect0] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect0] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect0] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect0] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
+0 > [ect0] E. 4:4(0) ack 4501 // dup ack sent

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // Long RTO
+0 > [ect0] . 4:4(0) ack 6501 // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257
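
For anyone following along, Neal's hypothesis can be sketched as a small toy model. This is not the kernel code, only an illustration of the suspected interaction. ToyReceiver, ack_owed, and the order of operations in receive() are assumptions made for the sketch, and it starts from the state just before the CWR segment 4501:5501 arrives, assuming the receiver believes it is holding a delayed ACK at that point. The one behavior being tested is the guess that sending any ACK, including the synthetic one with the rewound rcv_nxt, clears the "ACK still owed" state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReceiver:
    """Toy stand-in for a DCTCP receiver; not the kernel's data structures."""
    rcv_nxt: int = 4501                # next sequence number expected
    prior_rcv_nxt: int = 4501          # rcv_nxt before the current segment
    ce_state: int = 1                  # CE bit seen on the previous data segment
    delayed_ack_reserved: bool = True  # assumption: receiver thinks it holds a delayed ACK
    ack_owed: bool = False             # hypothetical "an ACK must still go out" flag
    acks_sent: List[int] = field(default_factory=list)

    def send_ack(self) -> None:
        # Assumed side effect under test: sending ANY ACK clears ack_owed.
        self.acks_sent.append(self.rcv_nxt)
        self.ack_owed = False

    def on_ce_change(self, ce: int) -> None:
        # Rewind/send/restore idea described for dctcp_ce_state_0_to_1()
        # and dctcp_ce_state_1_to_0(), reduced to a single helper.
        if ce != self.ce_state and self.delayed_ack_reserved:
            saved = self.rcv_nxt
            self.rcv_nxt = self.prior_rcv_nxt  # rewind to the older ACK point
            self.send_ack()                    # "synthetic" ACK
            self.rcv_nxt = saved               # restore
        self.prior_rcv_nxt = self.rcv_nxt
        self.ce_state = ce

    def receive(self, length: int, ce: int) -> None:
        # Assumed ordering: the new data is accepted and an ACK is marked
        # as owed BEFORE the CE change is processed.
        self.rcv_nxt += length
        self.ack_owed = True
        self.on_ce_change(ce)


rx = ToyReceiver()
rx.receive(1000, ce=0)  # the CWR segment 4501:5501, not CE-marked
print(rx.acks_sent, rx.rcv_nxt, rx.ack_owed)
```

In this model the only ACK sent carries 4501 (the dup ACK in the script), rcv_nxt has moved to 5501, and nothing is left owed, so no delayed ACK would ever fire for the new data. Whether the real delayed ACK state machine behaves this way is exactly the open question.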