From mboxrd@z Thu Jan 1 00:00:00 1970
From: Heiner Kallweit
Subject: Re: [PATCH net] r8169: fix NAPI handling under high load
Date: Thu, 18 Oct 2018 08:03:32 +0200
Message-ID: <8beda4fa-5d04-49e6-eb3e-5656897a301f@gmail.com>
References: <8f84fe39-3d8d-396d-3b97-027e0a83f8cb@gmail.com>
 <62974f0f-1938-3635-69d4-204ed8c587b3@gmail.com>
 <20181017233051.GB8478@electric-eye.fr.zoreil.com>
 <20181018055835.GE2487@marvin.atrad.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: Francois Romieu, Holger Hoffstätte, David Miller,
 Realtek linux nic maintainers, "netdev@vger.kernel.org"
To: Jonathan Woithe
Return-path:
Received: from mail-wr1-f68.google.com ([209.85.221.68]:38153 "EHLO
 mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1727483AbeJROC7 (ORCPT ); Thu, 18 Oct 2018 10:02:59 -0400
Received: by mail-wr1-f68.google.com with SMTP id a13-v6so32190533wrt.5
 for ; Wed, 17 Oct 2018 23:03:38 -0700 (PDT)
In-Reply-To: <20181018055835.GE2487@marvin.atrad.com.au>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 18.10.2018 07:58, Jonathan Woithe wrote:
> On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
>> Holger Hoffstätte :
>> [...]
>>> I continued to use the BQL patch in my private tree after it was reverted
>>> and also had occasional timeouts, but *only* after I started playing
>>> with ethtool to change offload settings. Without offloads or the BQL patch
>>> everything has been rock-solid since then.
>>> The other weird problem was that timeouts would occur on an otherwise
>>> *completely idle* system. Since that occasionally borked my NFS server
>>> over night I ultimately removed BQL as well. Rock-solid since then.
>>
>> The bug will induce delayed rx processing when a spike of "load" is
>> followed by an idle period.
>
> If this is the case, I wonder whether this bug might also be the cause of
> the long reception delays we've observed at times when a period of high
> network load is followed by almost nothing[1]. That thread[2] details the
> investigations subsequently done. A git bisect showed that commit
> da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
> we were observing.
>
> We still see the problem when we test with recent kernels. It would be
> great if the underlying problem has now been identified.
>
> I can possibly scrape some hardware together to test any proposed fix under
> our workload if there was interest.
>
Proposed fix is here:
https://patchwork.ozlabs.org/patch/985014/

Would be good if you could test it. Thanks!

Heiner

> Regards
> jonathan
>
> [1] https://marc.info/?l=linux-netdev&m=136281333207734&w=2
> [2] https://marc.info/?t=136281339500002&r=1&w=2
>