From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr0-f170.google.com ([209.85.128.170]:33467 "EHLO
        mail-wr0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753185AbeBSSGE (ORCPT );
        Mon, 19 Feb 2018 13:06:04 -0500
Received: by mail-wr0-f170.google.com with SMTP id s5so10574674wra.0 for ;
        Mon, 19 Feb 2018 10:06:03 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <33316bd2-da03-0dbb-bd41-4ff44eb81402@del.bg>
References: <20170407184205.168217-1-ycheng@google.com>
        <33316bd2-da03-0dbb-bd41-4ff44eb81402@del.bg>
From: Neal Cardwell
Date: Mon, 19 Feb 2018 13:05:41 -0500
Message-ID:
Subject: Re: [PATCH net] tcp: restrict F-RTO to work-around broken middle-boxes
To: Teodor Milkov
Cc: Netdev , Yuchung Cheng
Content-Type: text/plain; charset="UTF-8"
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, Feb 19, 2018 at 11:17 AM, Teodor Milkov wrote:
> On 19.02.2018 15:38, Neal Cardwell wrote:
>>
>> On Sun, Feb 18, 2018 at 4:02 PM, Teodor Milkov wrote:
>>>
>>> Hello,
>>>
>>> I've numerous reports from Windows users that after kernel upgrade from
>>> 4.9 to 4.14 they experienced major slow downs and transfer stalls.
>>>
>>> After some digging, I found that the slowness starts with this commit:
>>>
>>> tcp: extend F-RTO to catch more spurious timeouts (89fe18e44)
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89fe18e44f7ee5ab1c90d0dff5835acee7751427
>>>
>>> Which is partially reverted later with this one:
>>>
>>> tcp: restrict F-RTO to work-around broken middle-boxes (cc663f4d4)
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc663f4d4c97b7297fb45135ab23cfd508b35a77
>>>
>>> But, still, we had stalls until I fully reverted 89fe18e44.
>>
>> Thanks for the report. Do you have any other details that might help
>> evaluate this issue?
>
>
> I'm sorry I didn't provide more info. It was long session.
>
>> Any packet traces, by any chance?
>
>
> I'll try and obtain one.

Great! Yes, if you could obtain a sender-side tcpdump that captured
one of these slow-downs/stalls, that would be fantastic. It would be
great to be able to understand what's going on. We only need headers,
so something like the following would be fine:

  tcpdump -c2000000 -w /tmp/test.pcap -s 120 -i $ETH_DEVICE port $PORT

Then if you could post it on a web server or Google Drive, etc., that'd
be great.

Thanks for all the additional details!

neal
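
The capture command above leaves $ETH_DEVICE and $PORT for the reporter
to fill in. As a rough sketch only, the snippet below shows one way to
set them and print the resulting command for review before running it.
The defaults eth0 and 443 and the helper name capture_cmd are
hypothetical placeholders, not values from this thread; substitute the
sender-side interface and the port of the affected service.

```shell
#!/bin/sh
# Hypothetical defaults (assumptions, not from the thread): override
# them in the environment with the real interface and service port.
ETH_DEVICE="${ETH_DEVICE:-eth0}"
PORT="${PORT:-443}"

# capture_cmd is an illustrative helper: it only prints the tcpdump
# command quoted in the email with the interface and port filled in.
#   -c2000000  stop after 2,000,000 packets
#   -w FILE    write raw packets to FILE
#   -s 120     keep the first 120 bytes of each packet (headers only)
#   -i DEV     capture on interface DEV
capture_cmd() {
    printf 'tcpdump -c2000000 -w /tmp/test.pcap -s 120 -i %s port %s\n' "$1" "$2"
}

capture_cmd "$ETH_DEVICE" "$PORT"
```

Printing the command rather than executing it keeps the sketch harmless
to run; the printed line still has to be run with sufficient privileges
to capture on the interface.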