From mboxrd@z Thu Jan 1 00:00:00 1970
From: linas@austin.ibm.com (Linas Vepstas)
Subject: Re: [PATCH] ixgb: add PCI Error recovery callbacks
Date: Thu, 6 Jul 2006 11:16:40 -0500
Message-ID: <20060706161640.GT29526@austin.ibm.com>
References: <20060629162634.GC5472@austin.ibm.com>
 <1151905766.28493.129.camel@ymzhang-perf.sh.intel.com>
 <44ABDF87.8000801@intel.com>
 <20060705194437.GJ29526@austin.ibm.com>
 <1152148899.28493.168.camel@ymzhang-perf.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Auke Kok, Jesse Brandeburg, "Ronciak, John", "bibo,mao",
 Rajesh Shah, Grant Grundler, akpm@osdl.org, LKML,
 linux-pci maillist, netdev@vger.kernel.org, wenxiong@us.ibm.com
Return-path:
Received: from e5.ny.us.ibm.com ([32.97.182.145]:58814 "EHLO e5.ny.us.ibm.com")
 by vger.kernel.org with ESMTP id S1030274AbWGFQQs (ORCPT );
 Thu, 6 Jul 2006 12:16:48 -0400
To: "Zhang, Yanmin"
Content-Disposition: inline
In-Reply-To: <1152148899.28493.168.camel@ymzhang-perf.sh.intel.com>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Thu, Jul 06, 2006 at 09:21:39AM +0800, Zhang, Yanmin wrote:
> On Thu, 2006-07-06 at 03:44, Linas Vepstas wrote:
> > On Wed, Jul 05, 2006 at 08:49:27AM -0700, Auke Kok wrote:
> > > Zhang, Yanmin wrote:
> > > >On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
> > > >>Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
> > > >>ixgb device driver. Lightly tested, works.
> > > >
> > > >Both pci_disable_device and ixgb_down would access the device. It doesn't
> > > >follow Documentation/pci-error-recovery.txt that error_detected shouldn't
> > > >do any access to the device.
> > >
> > > Moreover, it was Linas who wrote this documentation in the first place :)
> >
> > On the pSeries, it's harmless to try to do i/o; the i/o will be blocked.
> In the future, we might move the pci error recovery codes to generic to
> support other platforms which might not block I/O.
> So it's better to follow
> Documentation/pci-error-recovery.txt when adding error recovery codes into driver.

Or we could change the documentation. The point was that doing
unexpected i/o after the adapter reset is likely to wedge the adapter
again, leading to an infinite loop of resets. As a practical matter,
I found, while developing this patch and the other related patches,
that this was indeed the usual failure mode: incorrect bringup
just led to more errors.

What I really want to do is to perform as clean a shut-down as
possible, reset the adapter, and then bring it back up. I'm
concerned that changing the order to "reset"-"shutdown"-"bringup"
would be inappropriate.

Perhaps the right fix is to figure out what parts of the driver
do i/o during shutdown, and then add a line
"if(wedged) skip i/o;" to those places?

--linas