From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Jarosch
Subject: Re: Re: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
Date: Mon, 19 Jan 2015 17:49:58 +0100
Message-ID: <6534595.xliyeOG6D7@storm>
References: <1719052.SGOfRAJhfQ@storm> <8088599.PZmG8U31O2@storm>
 <1421335532.11734.73.camel@edumazet-glaptop2.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7Bit
Cc: 'Linux Netdev List' , Eric Dumazet , Jeff Kirsher , e1000-devel
To: Eric Dumazet
Return-path:
Received: from rs04.intra2net.com ([85.214.66.2]:40592 "EHLO
 rs04.intra2net.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751596AbbASQuC (ORCPT );
 Mon, 19 Jan 2015 11:50:02 -0500
In-Reply-To: <1421335532.11734.73.camel@edumazet-glaptop2.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thursday, 15. January 2015 07:25:32 Eric Dumazet wrote:
> On Thu, 2015-01-15 at 15:58 +0100, Thomas Jarosch wrote:
> > A colleague mentioned to me he saw the "Hardware Unit Hang" message
> > every few days even running on kernel 3.4 (without your patch).
> > Basically I'm testing now if that's still the case with 3.19-rc4+ or not.
> >
> > I'm all for fixing the root cause. I'm just interested if the e1000e
> > hang can even be triggered when using a max frag page size of 4096.
> > So far it transferred 751.6 GiB without a hiccup.
>
> You told it was forwarding setup.
>
> 1) What is the NIC receiving traffic.
> 2) What happens if you disable GRO on it ?

One more interesting thing happened: on one production machine, again
an Intel DH61CR board, the issue was triggered even with TSO disabled.
My colleague tried disabling GRO + GSO on the e1000e adapter, too,
though not on the other interfaces.

It's strange that the issue appears with TSO disabled; that setting
worked for three other production-level machines.
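For anyone who wants to repeat the test with the offloads switched off on every interface rather than only on the e1000e one, here is a minimal sketch that builds the matching `ethtool -K` command lines. The interface names are placeholders and were not taken from the affected machines, and the helper name `offload_off_cmd` is made up for illustration. It only prints the commands, so nothing changes until they are run by hand as root.

```python
# Offload features discussed in this thread, using ethtool -K short names.
OFFLOADS = ("tso", "gso", "gro")


def offload_off_cmd(iface, features=OFFLOADS):
    """Return the ethtool argv that turns the given offloads off on iface."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


if __name__ == "__main__":
    # Hypothetical interface names; replace with the real ones
    # (receiving NIC and sending NIC of the forwarding setup).
    for iface in ("eth0", "eth1"):
        print(" ".join(offload_off_cmd(iface)))
```

The resulting state can be checked afterwards with `ethtool -k <iface>`.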
We've emergency-installed the "4096" max frag page size workaround for now
as fifty people were a bit unhappy without network access... :D

Cheers,
Thomas