From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH] tcp: ack when we get an OOO/lost packet
To: Andrei Borzenkov, The development of GNU GRUB
References: <1439392582-3172342-1-git-send-email-jbacik@fb.com> <5665C901.3040605@gmail.com>
From: Josef Bacik
Message-ID: <5665CFC6.80102@fb.com>
Date: Mon, 7 Dec 2015 13:28:22 -0500
In-Reply-To: <5665C901.3040605@gmail.com>
Content-Type: text/plain; charset="utf-8"; format=flowed

On 12/07/2015 12:59 PM, Andrei Borzenkov wrote:
> 12.08.2015 18:16, Josef Bacik wrote:
>> While adding tcp window scaling support I was finding that I'd get some packet
>> loss or reordering when transferring from large distances and grub would just
>> timeout.  This is because we weren't ack'ing when we got our OOO packet, so the
>> sender didn't know it needed to retransmit anything, so eventually it would fill
>> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing when
>> we don't find our next sequence numbered packet.  With this fix I no longer time
>> out.  Thanks,
>>
>
> Applied. Sorry, it somehow slipped through.
>
> More ideas in the same direction.
>
> 1. GRUB timeout for receiving currently is ~33 seconds. It is too small
> comparing with anything else. I am pretty sure in situation from tcpdump
> you sent me we could recover if timeout was in order of several minutes :)
>

Yeah I jacked up the receive timeout in one of my iterations and that
helped as well.  Could probably make it configurable.

> 2. We may consider sending ACK in grub_net_tcp_retransmit()
> additionally, although it probably needs proper rate-limiting based on RTT.
>
> 3. Using timestamp option may improve RTT detection for partner and is
> pretty cheap to implement.
>

I'm trying to get some standard testing set up internally so I can test
all of our hardware types whenever I make changes.  Once I get that
stuff set up I'll look at adding this and some other features.  Thanks,

Josef
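
For readers following the thread, the behaviour the patch description refers to (ACK when the next expected sequence number is not the one that arrived) can be modelled roughly as below. This is only a toy sketch in Python, not GRUB's code: the Receiver class, its fields and on_segment() are hypothetical names made up for illustration, sequence numbers start at 0 and never wrap, and segments that partially overlap already-received data are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy TCP receiver; every name here is hypothetical, not GRUB's."""
    rcv_nxt: int = 0                              # next in-order byte expected
    pending: dict = field(default_factory=dict)   # seq -> payload held out of order
    delivered: bytes = b""                        # in-order data handed to the reader
    acks: list = field(default_factory=list)      # ACK numbers "sent" so far

    def on_segment(self, seq: int, payload: bytes) -> None:
        if seq > self.rcv_nxt:
            # Gap: hold the segment. The ACK appended below repeats rcv_nxt,
            # so the sender sees a duplicate ACK and can retransmit.
            self.pending.setdefault(seq, payload)
        elif seq == self.rcv_nxt:
            self._deliver(payload)
            # Drain queued segments that have become contiguous.
            while self.rcv_nxt in self.pending:
                self._deliver(self.pending.pop(self.rcv_nxt))
        # seq < rcv_nxt is treated as an old duplicate: nothing to store.
        # In every case, tell the sender which byte we expect next.
        self.acks.append(self.rcv_nxt)

    def _deliver(self, payload: bytes) -> None:
        self.delivered += payload
        self.rcv_nxt += len(payload)


rx = Receiver()
rx.on_segment(0, b"aaaa")   # in order
rx.on_segment(8, b"cccc")   # bytes 4..7 were lost or reordered
rx.on_segment(4, b"bbbb")   # retransmission fills the gap
print(rx.acks, rx.delivered)
```

Without the ACK in the out-of-order branch, the sender in this model would get no feedback after the first segment, which matches the stall described in the original patch message.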