From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael Chan"
Subject: Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
Date: Thu, 14 Feb 2008 14:48:09 -0800
Message-ID: <1203029289.13495.38.camel@dell>
References: <20080214102425.0fc8e3c1.akpm@linux-foundation.org> <20080214185627.GK856@gospo.usersys.redhat.com> <1203024327.13495.21.camel@dell> <20080214221234.GL856@gospo.usersys.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: "Andrew Morton" , "Matt Carlson" , bugme-daemon@bugzilla.kernel.org, netdev , ralf.hildebrandt@charite.de
To: "Andy Gospodarek"
Return-path:
Received: from mms2.broadcom.com ([216.31.210.18]:4553 "EHLO mms2.broadcom.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757550AbYBNWqN (ORCPT ); Thu, 14 Feb 2008 17:46:13 -0500
In-Reply-To: <20080214221234.GL856@gospo.usersys.redhat.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, 2008-02-14 at 17:12 -0500, Andy Gospodarek wrote:
> On Thu, Feb 14, 2008 at 01:25:27PM -0800, Michael Chan wrote:
> > On Thu, 2008-02-14 at 13:56 -0500, Andy Gospodarek wrote:
> > > That should be a simple matter of adding the right pci-ids to
> > > tg3_get_invariants -- hopefully Ralf will respond and we can get that
> > > knocked out quickly.
> > >
> >
> > It doesn't look like it was re-ordered IO. If it was, it should have
> > self-recovered without hitting the BUG().
> >
>
> Good catch, Michael! I missed that it paniced since I expect to see
> some sort of backtrace when that happens. We should try and get that
> bridge added to the list though, to avoid repeated complaints that there
> is a tg3 bug.
>

Andy, I think you still missed my point. I don't believe this problem
was caused by the bridge or the chipset at all. Some corruption caused
us to not find the SKB in the TX ring where it was expected.
So the driver assumed it was the bridge re-ordering I/O, printed that
warning message, and took recovery action. The recovery action had no
effect in this case, since the corruption was apparently caused by
something else, and it happened again later. This second time we hit the
BUG_ON(), because the recovery action had not worked.
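
To make the sequence concrete, here is a rough userspace sketch of the
logic described above. It is not the tg3 code: the names tx_ring,
tx_reclaim, reorder_workaround and RING_SIZE are made up for
illustration, and it returns TX_FATAL at the point where the driver
would hit BUG_ON() instead of halting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 8 /* hypothetical size; power of two so masking works */

enum tx_status { TX_OK, TX_RECOVERY_STARTED, TX_FATAL };

struct tx_ring {
    void *skb[RING_SIZE];    /* packet pointer per entry, NULL when free */
    unsigned int cons;       /* next entry we expect hardware to complete */
    bool reorder_workaround; /* set once the recovery action has been taken */
};

/*
 * Reclaim completed TX entries up to hw_cons (assumed < RING_SIZE).
 * A missing packet pointer is first blamed on re-ordered I/O: warn and
 * switch on the workaround. If it happens again with the workaround
 * already active, the cause must be something else, so give up.
 */
enum tx_status tx_reclaim(struct tx_ring *r, unsigned int hw_cons)
{
    while (r->cons != hw_cons) {
        void *skb = r->skb[r->cons];

        if (skb == NULL) {
            if (r->reorder_workaround)
                return TX_FATAL; /* the driver would BUG_ON() here */

            fprintf(stderr, "warning: system may be re-ordering "
                            "memory-mapped I/O, attempting to recover\n");
            r->reorder_workaround = true;
            return TX_RECOVERY_STARTED;
        }

        r->skb[r->cons] = NULL; /* a real driver would free the skb */
        r->cons = (r->cons + 1) & (RING_SIZE - 1);
    }
    return TX_OK;
}
```

With a ring whose expected entry is empty, the first call returns
TX_RECOVERY_STARTED and the second returns TX_FATAL, which matches the
warning followed later by the BUG_ON() in the report.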