From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Miller"
Subject: Re: [PATCH] Prevent netpoll hanging when link is down
Date: Wed, 6 Oct 2004 23:49:12 -0700
Sender: netdev-bounce@oss.sgi.com
Message-ID: <20041006234912.66bfbdcc.davem@davemloft.net>
References: <20041006232544.53615761@jack.colino.net>
	<20041006214322.GG31237@waste.org>
	<20041007075319.6b31430d@jack.colino.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: mpm@selenic.com, akpm@osdl.org, netdev@oss.sgi.com
Return-path:
To: Colin Leroy
In-Reply-To: <20041007075319.6b31430d@jack.colino.net>
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

On Thu, 7 Oct 2004 07:53:19 +0200
Colin Leroy wrote:

> On 06 Oct 2004 at 16h10, Matt Mackall wrote:
>
> > On Wed, Oct 06, 2004 at 11:25:44PM +0200, Colin Leroy wrote:
> > Well this doesn't look unreasonable, but I haven't run into it with
> > the NICs I've tested. Nor have I seen this reported before. Which NICs
> > is this with?
>
> Sungem. I didn't find anything strange in sungem, but it may be...

I think it is very strange that only sungem behaves this way.

I don't think netpoll is doing anything different than what
would happen, f.e., when bringing an interface up using dhcp.
That should cause the same kind of hang.

The only thing that should make the thing spin is if
gp->hw_running is zero. This is set non-zero by gem_open()
after resetting the chip to bring it up. If gem_open() fails,
and on entry gp->hw_running was zero, the chip will be powered
back down and gp->hw_running set back to zero.

gem_suspend()/gem_resume() also modify the gp->hw_running
state, as appropriate.

I could see it that if gp->hw_running is zero, we could
run into troubles. np->dev->poll_controller() will run, and
it won't do anything since the gem_interrupt() call is a nop
when gp->hw_running is zero.
Then we blindly call into np->dev->poll().

Folks debugging this should verify that gp->hw_running is non-zero
when the problematic case runs.
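
To make the suspected sequence concrete, here is a rough userspace
sketch of it. This is a toy model only, not the sungem or netpoll
source: struct toy_gem and every toy_* function are hypothetical names
made up for illustration, and the only behaviour modelled is what is
described above (hw_running set by open and rolled back on failure,
the interrupt path being a nop while hw_running is zero, and poll()
being entered regardless).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the driver's private state; NOT the real struct gem. */
struct toy_gem {
    int hw_running;     /* non-zero only while the chip is up */
    int irq_work_done;  /* interrupt passes that actually did work */
    int poll_calls;     /* calls into the poll routine */
};

/* Models the open path: set hw_running, roll it back if open fails. */
static int toy_gem_open(struct toy_gem *gp, bool fail)
{
    assert(gp != NULL);
    int was_running = gp->hw_running;

    gp->hw_running = 1;          /* chip reset and brought up */
    if (fail) {
        if (!was_running)
            gp->hw_running = 0;  /* chip powered back down */
        return -1;
    }
    return 0;
}

/* Models the interrupt handler: a nop while hw_running is zero. */
static void toy_gem_interrupt(struct toy_gem *gp)
{
    if (!gp->hw_running)
        return;
    gp->irq_work_done++;
}

/* Models the driver's poll routine; it only records that it was entered. */
static void toy_gem_poll(struct toy_gem *gp)
{
    gp->poll_calls++;
}

/*
 * Models one netpoll pass: the poll_controller() step first, then poll()
 * unconditionally. Returns true when poll() was entered with the chip
 * down, which is the case worth checking for when debugging.
 */
static bool toy_netpoll_pass(struct toy_gem *gp)
{
    bool suspect = (gp->hw_running == 0);

    toy_gem_interrupt(gp);  /* what the poll_controller() step boils down to */
    toy_gem_poll(gp);       /* called blindly, whatever hw_running says */
    return suspect;
}
```

In this model, a failed toy_gem_open() on a device that was down leaves
hw_running at zero, and a following toy_netpoll_pass() returns true
while irq_work_done stays at zero. In the real driver the equivalent
check would be to print gp->hw_running at the point where the hang is
observed.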