From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Rompf
Subject: Re: Deadlock in sungem/ip_auto_config/linkwatch
Date: Mon, 5 Jan 2004 15:50:50 +0100
Sender: netdev-bounce@oss.sgi.com
Message-ID: <200401051550.51063.srompf@isg.de>
References: <1073307882.2041.98320.camel@brick.watson.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Return-path:
To: Michal Ostrowski , netdev@oss.sgi.com
In-Reply-To: <1073307882.2041.98320.camel@brick.watson.ibm.com>
Content-Disposition: inline
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

On Monday, 5 January 2004 14:07, Michal Ostrowski wrote:

> ic_open_devs grabs rtnl_sem with an rtnl_shlock() call.
>
> The sungem driver at some point calls gem_init_one, which calls
> netif_carrier_*, which in turn calls schedule_work (linkwatch_event).
>
> linkwatch_event in turn needs rtnl_sem.

Good catch! The sungem driver shows clearly that we need some way to remove
queued work without scheduling and waiting for other events.

This week I will change the linkwatch code to use rtnl_shlock_nowait(), and
to back off and retry if that fails. Call it a workaround, but it increases
overall system stability.

Btw, what is the planned difference between rtnl_shlock() and rtnl_exlock()?
Even though the latter is a null operation right now, I don't want to hold
more locks than needed in the linkwatch code.

Stefan

-- 
"doesn't work" is not a magic word to explain everything.
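
For illustration only, the "try the lock, back off and retry" idea proposed above could look roughly like the user-space sketch below. This is not the kernel patch: a pthread mutex stands in for rtnl_sem, pthread_mutex_trylock() stands in for rtnl_shlock_nowait(), and the names rtnl_lock, linkwatch_try_run() and linkwatch_run_with_retry() are hypothetical. In the kernel, the retry would presumably be done by re-queuing the work rather than by looping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for rtnl_sem (assumption: a plain, non-recursive mutex). */
static pthread_mutex_t rtnl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many link events were actually processed. */
static int events_handled;

/*
 * Hypothetical event handler. It never blocks on the lock:
 * returns true if the event was processed, false if the lock was
 * busy and the caller has to try again later.
 */
static bool linkwatch_try_run(void)
{
    if (pthread_mutex_trylock(&rtnl_lock) != 0)
        return false;           /* lock busy: back off */

    events_handled++;           /* the real work would happen here */

    int rc = pthread_mutex_unlock(&rtnl_lock);
    assert(rc == 0);
    (void)rc;
    return true;
}

/*
 * Bounded retry loop standing in for re-queuing the work.
 * Returns the attempt number that succeeded, or -1 if every
 * attempt found the lock busy.
 */
static int linkwatch_run_with_retry(int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (linkwatch_try_run())
            return attempt;
        /* a real implementation would wait or reschedule here */
    }
    return -1;
}
```

The point of the design is that the handler cannot deadlock against a caller that already holds the lock and is waiting for queued work to finish: it gives up immediately and leaves the retry to a later invocation.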