From: Steve Ibanez
Subject: Re: Linux ECN Handling
Date: Tue, 5 Dec 2017 11:36:44 -0800
References: <20171019124312.GE16796@breakpoint.cc> <5A006CF6.1020608@iogearbox.net>
To: Neal Cardwell
Cc: Eric Dumazet, Yuchung Cheng, Daniel Borkmann, Netdev, Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo
Sender: netdev-owner@vger.kernel.org

Hi Neal,

I've included a link to a small trace of 13 packets which is different
from the screenshot I attached in my last email, but shows the same
sequence of events. It's a bit hard to read the tcptrace due to the
300ms timeout, so I figured this was the best approach.

slice.pcap:
https://drive.google.com/open?id=1hYXbUClHGbQv1hWG1HZWDO2WYf30N6G8

Thanks for the help!
-Steve

On Tue, Dec 5, 2017 at 7:23 AM, Neal Cardwell wrote:
> On Tue, Dec 5, 2017 at 12:22 AM, Steve Ibanez wrote:
>> Hi Neal,
>>
>> Happy to help out :) And thanks for the tip!
>>
>> I was able to track down where the missing bytes that you pointed out
>> are being lost. It turns out the destination host seems to be
>> misbehaving. I performed a packet capture at the destination host
>> interface (a snapshot of the trace is attached). I see the following
>> sequence of events when a timeout occurs (note that I have NIC
>> offloading enabled so wireshark captures packets larger than the MTU):
>>
>> 1. The destination receives a data packet of length X with seqNo = Y
>> from the src with the CWR bit set and does not send back a
>> corresponding ACK.
>> 2. The source times out and sends a retransmission packet of length Z
>> (where Z < X) with seqNo = Y
>> 3. The destination sends back an ACK with AckNo = Y + X
>>
>> So in other words, the packet which the destination host does not
>> initially ACK (causing the timeout) does not actually get lost because
>> after receiving the retransmission the AckNo moves forward all the way
>> past the bytes in the initial unACKed CWR packet. In the attached
>> screenshot, I've marked the unACKed CWR packet with a red box.
>>
>> Have you seen this behavior before? And do you know what might be
>> causing the destination host not to ACK the CWR packet? In most cases
>> the CWR marked packets are ACKed properly, it's just occasionally they
>> are not.
>
> Thanks for the detailed report!
>
> I have not heard of an incoming CWR causing the receiver to fail to
> ACK. And in re-reading the code, I don't see an obvious way in which a
> CWR bit should cause the receiver to fail to ACK.
>
> That screen shot is a bit hard to parse. Would you be able to post a
> tcpdump .pcap of that particular section, or post a screen shot of a
> time-sequence plot of that section?
>
> To extract that segment and take screen shot, you could use something like:
>
>   editcap -A "2017-12-04 11:22:27" -B "2017-12-04 11:22:30" all.pcap slice.pcap
>   tcptrace -S -xy -zy slice.pcap
>   xplot.org a2b_tsg.xpl &
>   # take screenshot
>
> Or, alternatively, would you be able to post the slice.pcap on a web
> server or public drive?
>
> thanks,
> neal
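
For reference, the three-step sequence from the quoted report (a CWR data
packet at seqNo = Y of length X that is not ACKed, a shorter retransmission
of length Z at the same seqNo, then an ACK with AckNo = Y + X) could be
checked mechanically. The following is only a minimal sketch under stated
assumptions: the Segment record and find_unacked_cwr() are hypothetical names
invented for illustration, the trace is assumed to have already been reduced
to one record per packet in capture order, and nothing here parses a real
pcap or reflects kernel code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical, simplified view of one captured TCP packet."""
    from_src: bool      # True: data from the source; False: ACK from the destination
    seq: int = 0        # seqNo of the first payload byte (data packets)
    length: int = 0     # payload length in bytes (data packets)
    cwr: bool = False   # CWR flag set on the packet
    ack: int = 0        # cumulative AckNo (ACK packets)


def find_unacked_cwr(trace: List[Segment]) -> List[Tuple[int, int, int]]:
    """Return (Y, X, Z) for every occurrence of the reported pattern."""
    hits: List[Tuple[int, int, int]] = []
    pending: Optional[Segment] = None   # CWR data packet still waiting for an ACK
    retrans_len: Optional[int] = None   # Z, once a shorter retransmission is seen

    for seg in trace:
        if seg.from_src:
            if (pending is not None and seg.seq == pending.seq
                    and seg.length < pending.length):
                # Step 2: retransmission at the same seqNo with Z < X.
                retrans_len = seg.length
            elif seg.cwr and seg.length > 0:
                # Step 1: a CWR-marked data packet; start watching for its ACK.
                pending, retrans_len = seg, None
        elif pending is not None and seg.ack > pending.seq:
            # First ACK that moves past Y. It only counts as a hit if a
            # retransmission came first and the ACK covers exactly Y + X.
            if retrans_len is not None and seg.ack == pending.seq + pending.length:
                hits.append((pending.seq, pending.length, retrans_len))
            pending, retrans_len = None, None
    return hits


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the function.
    example = [
        Segment(True, seq=1000, length=3000, cwr=True),   # step 1
        Segment(True, seq=1000, length=1448),             # step 2
        Segment(False, ack=4000),                         # step 3
    ]
    print(find_unacked_cwr(example))
```

A CWR packet that is ACKed promptly produces no hit, so the function only
reports the occasional cases where the ACK arrives after a retransmission.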