From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neil Horman
Subject: Re: netpoll + xmit_lock == deadlock
Date: Fri, 31 Jul 2009 14:09:21 -0400
Message-ID: <20090731180921.GA24491@hmsreliant.think-freely.org>
References: <20090729073523.GA4515@gondor.apana.org.au> <1248894478.4545.2822.camel@calx> <20090729194300.GB17410@hmsreliant.think-freely.org> <20090729223831.GA14066@gondor.apana.org.au> <20090730010639.GB4169@localhost.localdomain> <20090731013017.GB25895@gondor.apana.org.au> <20090731125654.GB18303@hmsreliant.think-freely.org> <20090731130243.GA31058@gondor.apana.org.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Matt Mackall , "David S. Miller" , netdev@vger.kernel.org, Matt Carlson
To: Herbert Xu
Return-path:
Received: from charlotte.tuxdriver.com ([70.61.120.58]:57936 "EHLO smtp.tuxdriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752063AbZGaSJo (ORCPT ); Fri, 31 Jul 2009 14:09:44 -0400
Content-Disposition: inline
In-Reply-To: <20090731130243.GA31058@gondor.apana.org.au>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Jul 31, 2009 at 09:02:43PM +0800, Herbert Xu wrote:
> On Fri, Jul 31, 2009 at 08:56:54AM -0400, Neil Horman wrote:
> >
> > > tg3_poll => tg3_poll_work => tg3_tx => netif_tx_lock
> >
> > Oh, goodness, thats just asking for disaster. Setting asside the netpoll issue
> > for the moment, what if we take an rx interrupt on a cpu while in the middle of
> > sending a frame? Whats to stop the NET_RX_SOFTIRQ after the hard interrupt and
> > recursively taking the _xmit_lock? With or without netpoll, that seems prone to
> > deadlock.
>
> No that can't happen because BH is disabled in the xmit function.
>
> This problem is specific to netpoll because it does things that
> normally can't happen with BH off.
>
Ugh, you're right, my bad.
In fact, looking back, we had a similar (but not identical) problem in RHEL, in
which the netpoll path was removing the device from the poll_list from a
different cpu, leading to a double free. I fixed it by adding a state bit to the
napi flags that let helper functions know that we were in a netpoll context,
which let us avoid doing stupid things. Perhaps such a solution might be useful
here: set the flag when calling the poll routine from a netpoll context, so that
netif_tx_lock would know to do a trylock and always return success, or something
of that nature.

I'd also be up for simply ripping out netpoll entirely..... ;)

Neil

> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~}
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
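
For illustration, here is a rough userspace sketch of the "netpoll context flag plus trylock" idea described above. It is not kernel code and not the RHEL patch: every name in it (sketch_napi, SKETCH_NAPI_IN_NETPOLL, sketch_tx_lock and friends) is hypothetical, and a pthread mutex stands in for the device's _xmit_lock. One deliberate difference from "always return success": the lock helper reports whether the lock was really taken, so that the caller can skip both the protected work and the unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag bit; not a real NAPI state name from the kernel. */
enum { SKETCH_NAPI_IN_NETPOLL = 1u << 0 };

struct sketch_napi {
    unsigned int state;        /* flag bits, e.g. SKETCH_NAPI_IN_NETPOLL */
    int pending_tx;            /* tx completions waiting to be reclaimed */
    pthread_mutex_t *tx_lock;  /* stands in for the device _xmit_lock */
};

/* Take the tx lock. In netpoll context never block: try once and
 * report whether the lock was actually acquired. */
bool sketch_tx_lock(struct sketch_napi *n)
{
    if (n->state & SKETCH_NAPI_IN_NETPOLL)
        return pthread_mutex_trylock(n->tx_lock) == 0;

    pthread_mutex_lock(n->tx_lock);
    return true;
}

/* Release the tx lock only if sketch_tx_lock() really took it. */
void sketch_tx_unlock(struct sketch_napi *n, bool locked)
{
    if (locked) {
        int rc = pthread_mutex_unlock(n->tx_lock);
        assert(rc == 0);
        (void)rc;
    }
}

/* Stand-in for a driver poll routine that reclaims tx completions.
 * Returns the number reclaimed; reclaims nothing if the lock is busy. */
int sketch_poll(struct sketch_napi *n)
{
    bool locked = sketch_tx_lock(n);
    int reclaimed = 0;

    if (locked) {
        reclaimed = n->pending_tx;
        n->pending_tx = 0;
    }
    sketch_tx_unlock(n, locked);
    return reclaimed;
}

/* Stand-in for the netpoll entry point: mark the context, call the
 * poll routine, then clear the mark again. */
int sketch_netpoll_poll(struct sketch_napi *n)
{
    int reclaimed;

    n->state |= SKETCH_NAPI_IN_NETPOLL;
    reclaimed = sketch_poll(n);
    n->state &= ~SKETCH_NAPI_IN_NETPOLL;
    return reclaimed;
}
```

If the transmit path already holds the lock when sketch_netpoll_poll() runs on the same thread, the trylock fails and the tx reclaim is simply deferred instead of deadlocking; a later ordinary sketch_poll() picks up the pending work.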