From mboxrd@z Thu Jan 1 00:00:00 1970 From: Benjamin Thery Subject: Re: [PATCH] net: deadlock during net device unregistration Date: Mon, 06 Oct 2008 17:19:13 +0200 Message-ID: <48EA2C71.6070509@bull.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: davem@davemloft.net, netdev@vger.kernel.org, dlezcano@fr.ibm.com To: Herbert Xu Return-path: Received: from ecfrec.frec.bull.fr ([129.183.4.8]:45540 "EHLO ecfrec.frec.bull.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751117AbYJFPT1 (ORCPT ); Mon, 6 Oct 2008 11:19:27 -0400 In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: Herbert Xu wrote: > Benjamin Thery wrote: >> 1. Unregister a device, the following routines are called: >> >> -> unregister_netdev >> -> rtnl_lock >> -> unregister_netdevice >> -> rtnl_unlock >> -> netdev_run_todo >> -> netdev_wait_allrefs > > OK, this explains lots of dead-locks that people have been seeing. > > But I think we can go a step further: > > net: Fix netdev_run_todo dead-lock > > Benjamin Thery tracked down a bug that explains many instances > of the error > > unregister_netdevice: waiting for %s to become free. Usage count = %d > > It turns out that netdev_run_todo can dead-lock with itself if > a second instance of it is run in a thread that will then free > a reference to the device waited on by the first instance. > > The problem is really quite silly. We were trying to create > parallelism where none was required. As netdev_run_todo always > follows a RTNL section, and that todo tasks can only be added > with the RTNL held, by definition you should only need to wait > for the very ones that you've added and be done with it. > > There is no need for a second mutex or spinlock. > > This is exactly what the following patch does. Herbert, thank you for having looked at the issue too. When I understood how the dead lock happened, I considered playing with the locks in net_set_todo()/netdev_run_todo(), but as some comments in the code in this area sounds a bit too cryptic for my brain, I didn't dare to change them myself. :) I guess you know a lot better than me the history behind this piece of code and why it was done this way. I tested your patch on my testbed during all the afternoon. It fixes the dead lock I can easily reproduce here with net namespaces and I didn't produce any regressions on my setup. Benjamin > > Signed-off-by: Herbert Xu > > diff --git a/net/core/dev.c b/net/core/dev.c > index e8eb2b4..021f531 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net) > } > > /* Delayed registration/unregisteration */ > -static DEFINE_SPINLOCK(net_todo_list_lock); > static LIST_HEAD(net_todo_list); > > static void net_set_todo(struct net_device *dev) > { > - spin_lock(&net_todo_list_lock); > list_add_tail(&dev->todo_list, &net_todo_list); > - spin_unlock(&net_todo_list_lock); > } > > static void rollback_registered(struct net_device *dev) > @@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev) > * free_netdev(y1); > * free_netdev(y2); > * > - * We are invoked by rtnl_unlock() after it drops the semaphore. > + * We are invoked by rtnl_unlock(). > * This allows us to deal with problems: > * 1) We can delete sysfs objects which invoke hotplug > * without deadlocking with linkwatch via keventd. > * 2) Since we run with the RTNL semaphore not held, we can sleep > * safely in order to wait for the netdev refcnt to drop to zero. > + * > + * We must not return until all unregister events added during > + * the interval the lock was held have been completed. > */ > -static DEFINE_MUTEX(net_todo_run_mutex); > void netdev_run_todo(void) > { > struct list_head list; > > - /* Need to guard against multiple cpu's getting out of order. */ > - mutex_lock(&net_todo_run_mutex); > - > - /* Not safe to do outside the semaphore. We must not return > - * until all unregister events invoked by the local processor > - * have been completed (either by this todo run, or one on > - * another cpu). > - */ > - if (list_empty(&net_todo_list)) > - goto out; > - > /* Snapshot list, allow later requests */ > - spin_lock(&net_todo_list_lock); > list_replace_init(&net_todo_list, &list); > - spin_unlock(&net_todo_list_lock); > + > + __rtnl_unlock(); > > while (!list_empty(&list)) { > struct net_device *dev > @@ -4200,9 +4188,6 @@ void netdev_run_todo(void) > /* Free network device */ > kobject_put(&dev->dev.kobj); > } > - > -out: > - mutex_unlock(&net_todo_run_mutex); > } > > static struct net_device_stats *internal_stats(struct net_device *dev) > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index 71edb8b..d6381c2 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -73,7 +73,7 @@ void __rtnl_unlock(void) > > void rtnl_unlock(void) > { > - mutex_unlock(&rtnl_mutex); > + /* This fellow will unlock it for us. */ > netdev_run_todo(); > } > > Cheers, -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com