* [PATCH] net: deadlock during net device unregistration [not found] <20080929175412.866679567@theryb.frec.bull.fr> @ 2008-09-29 17:54 ` Benjamin Thery 2008-09-30 6:32 ` Jarek Poplawski ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Benjamin Thery @ 2008-09-29 17:54 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, Daniel Lezcano, Benjamin Thery This patch proposes to replace the rtnl_unlock() call in linkwatch_event() by __rtnl_unlock(). The difference between the two routines being that __rtnl_unlock() will not call netdev_run_todo() after it unlocks rtnl_mutex. This is to fix a "deadlock" we observed when unregistering a net device. In some circumstances, linkwatch_event() blocks the whole "events" workqueue while blocking in rtnl_unlock(). Here is what happens: 1. Unregister a device, the following routines are called: -> unregister_netdev -> rtnl_lock -> unregister_netdevice -> rtnl_unlock -> netdev_run_todo -> netdev_wait_allrefs 2. In netdev_wait_allrefs(), the device's refcount is greater than 0 because there are still some routes to be garbage collected later. 3. Also, some link watch events are pending. netdev_wait_allrefs() will run the linkwatch event queue, calls linkwatch_run_queue(). Both the route garbage collector dst_gc_task() and the linkwatch task linkwatch_event() are queued in the same generic workqueue: "events". 4. linkwatch_event() is enqueued earlier in the queue. It will grab rtnl_lock(), deliver the link watch events pending, and then call rtnl_unlock(). rtnl_unlock() will then call netdev_run_todo() and block on mutex_lock(&net_todo_run_mutex). At this point, the workqueue "events" is _blocked_ until the netdev_wait_allrefs() call above returns when the device refcount reaches 0. Problem: it will never happens if dst_gc_task() was enqueued behind linkwatch_event() in the "events" workqueue as the queue is now blocked. Thread foo Thread "events" ---------- --------------- free IPv6 routes | -> schedule_delayed_work(dst_gc_task) ... | unregister_netdev | -> rtln_unlock | -> netdev_run_todo | ... -> grab net_todo_run_mutex | -> netdev_wait_allrefs (loop) | -> run__workqueue() -> linkwatch_run_queue() | .... enqueue linkwatch_event | in "events" | | -> run_workqueue() "BLOCKED" | -> linkwatch_event() | -> rtnl_lock() | -> __linkwatch_run_queue | ->rtnl_unlock() | -> netdev_run_todo | -> try to grab net_todo_run_mutex BLOCKED Deadlock: # BLOCKED waiting for dst_gc_task() # BLOCKED waiting for netdev_wait_allrefs to decrement the device refcnt to release net_todo_run_mutex # dst_gst_task() waiting to be run by "events" This deadlock can be observed sometimes when the loopback is unregistered when a network namespaces exits. Our tests showed no regression so far with this modification done in linkwatch_event(), but we aren't sure we understood all the subtleties in net device unregistration yet. :) Benjamin Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> --- net/core/link_watch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: net-next-2.6/net/core/link_watch.c =================================================================== --- net-next-2.6.orig/net/core/link_watch.c +++ net-next-2.6/net/core/link_watch.c @@ -207,7 +207,7 @@ static void linkwatch_event(struct work_ { rtnl_lock(); __linkwatch_run_queue(time_after(linkwatch_nextevent, jiffies)); - rtnl_unlock(); + __rtnl_unlock(); } -- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery @ 2008-09-30 6:32 ` Jarek Poplawski 2008-09-30 11:52 ` Benjamin Thery 2008-10-03 0:41 ` [PATCH] net: deadlock during net device unregistration Eric W. Biederman 2008-10-05 4:26 ` Herbert Xu 2 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-09-30 6:32 UTC (permalink / raw) To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano On 29-09-2008 19:54, Benjamin Thery wrote: > This patch proposes to replace the rtnl_unlock() call in > linkwatch_event() by __rtnl_unlock(). The difference between the two > routines being that __rtnl_unlock() will not call netdev_run_todo() > after it unlocks rtnl_mutex. > > This is to fix a "deadlock" we observed when unregistering a net device. > > In some circumstances, linkwatch_event() blocks the whole "events" > workqueue while blocking in rtnl_unlock(). > > Here is what happens: > > 1. Unregister a device, the following routines are called: > > -> unregister_netdev > -> rtnl_lock > -> unregister_netdevice > -> rtnl_unlock > -> netdev_run_todo > -> netdev_wait_allrefs > > 2. In netdev_wait_allrefs(), the device's refcount is greater than 0 > because there are still some routes to be garbage collected later. > > 3. Also, some link watch events are pending. netdev_wait_allrefs() > will run the linkwatch event queue, calls linkwatch_run_queue(). > > > Both the route garbage collector dst_gc_task() and the linkwatch task > linkwatch_event() are queued in the same generic workqueue: "events". > > > 4. linkwatch_event() is enqueued earlier in the queue. It will grab > rtnl_lock(), deliver the link watch events pending, and then call > rtnl_unlock(). > rtnl_unlock() will then call netdev_run_todo() and block on > mutex_lock(&net_todo_run_mutex). > > At this point, the workqueue "events" is _blocked_ until the > netdev_wait_allrefs() call above returns when the device refcount > reaches 0. > > Problem: it will never happens if dst_gc_task() was enqueued behind > linkwatch_event() in the "events" workqueue as the queue is now > blocked. ... If it's really like this, I wonder if this can happen without linkwatch too in a non-preemptive config? So maybe this should be fixed somewhere else? According to a comment above netdev_wait_allrefs() it seems references should be rather put down on an UNREGISTER event, so this dst_gc_task() scheduling shouldn't bother us, I guess. Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 6:32 ` Jarek Poplawski @ 2008-09-30 11:52 ` Benjamin Thery 2008-09-30 13:58 ` David Miller 2008-09-30 14:42 ` Jarek Poplawski 0 siblings, 2 replies; 33+ messages in thread From: Benjamin Thery @ 2008-09-30 11:52 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David S. Miller, netdev, Daniel Lezcano Jarek Poplawski wrote: > On 29-09-2008 19:54, Benjamin Thery wrote: >> This patch proposes to replace the rtnl_unlock() call in >> linkwatch_event() by __rtnl_unlock(). The difference between the two >> routines being that __rtnl_unlock() will not call netdev_run_todo() >> after it unlocks rtnl_mutex. >> >> This is to fix a "deadlock" we observed when unregistering a net device. >> >> In some circumstances, linkwatch_event() blocks the whole "events" >> workqueue while blocking in rtnl_unlock(). >> >> Here is what happens: >> >> 1. Unregister a device, the following routines are called: >> >> -> unregister_netdev >> -> rtnl_lock >> -> unregister_netdevice >> -> rtnl_unlock >> -> netdev_run_todo >> -> netdev_wait_allrefs >> >> 2. In netdev_wait_allrefs(), the device's refcount is greater than 0 >> because there are still some routes to be garbage collected later. >> >> 3. Also, some link watch events are pending. netdev_wait_allrefs() >> will run the linkwatch event queue, calls linkwatch_run_queue(). >> >> >> Both the route garbage collector dst_gc_task() and the linkwatch task >> linkwatch_event() are queued in the same generic workqueue: "events". >> >> >> 4. linkwatch_event() is enqueued earlier in the queue. It will grab >> rtnl_lock(), deliver the link watch events pending, and then call >> rtnl_unlock(). >> rtnl_unlock() will then call netdev_run_todo() and block on >> mutex_lock(&net_todo_run_mutex). >> >> At this point, the workqueue "events" is _blocked_ until the >> netdev_wait_allrefs() call above returns when the device refcount >> reaches 0. >> >> Problem: it will never happens if dst_gc_task() was enqueued behind >> linkwatch_event() in the "events" workqueue as the queue is now >> blocked. > ... > > If it's really like this, I wonder if this can happen without linkwatch > too in a non-preemptive config? Um, not sure I fully understand what you mean... do you mean with CONFIG_PREEMPT_NONE=y? > So maybe this should be fixed somewhere > else? According to a comment above netdev_wait_allrefs() it seems > references should be rather put down on an UNREGISTER event, so this > dst_gc_task() scheduling shouldn't bother us, I guess. I saw this comment too. In our case, the UNREGISTER event is sent, notifications are dispatched correctly, some routes are deleted (dst_free()) but not destroyed (dst_destroy()) and the garbage collector as to run to finish the work. dst_entry's may hold a refcount on device until dst_destroy() is run on them. Unfortunately dst_gc_task() won't have a chance to run dst_destroy() on them later in this case because it is stuck in the "events" workqueue behind linkwatch_event() who is blocking everyone else in the queue. I'm still looking at why the first dst_free() on those particular routes doesn't call dst_destroy() immediately but defers it (another refcount on the route itself). BTW, in the past (last year) dst_gc_task() was run as a timer handler and this situation (deadlock with linkwatch_event()) couldn't occur. Thanks, Benjamin > > Jarek P. > > -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 11:52 ` Benjamin Thery @ 2008-09-30 13:58 ` David Miller 2008-09-30 14:07 ` Benjamin Thery 2008-09-30 14:42 ` Jarek Poplawski 1 sibling, 1 reply; 33+ messages in thread From: David Miller @ 2008-09-30 13:58 UTC (permalink / raw) To: benjamin.thery; +Cc: jarkao2, netdev, dlezcano, dada1 From: Benjamin Thery <benjamin.thery@bull.net> Date: Tue, 30 Sep 2008 13:52:31 +0200 > BTW, in the past (last year) dst_gc_task() was run as a timer > handler and this situation (deadlock with linkwatch_event()) > couldn't occur. September 12th to be exact :-) That's when Eric Dumazet's change went in: commit 86bba269d08f0c545ae76c90b56727f65d62d57f Author: Eric Dumazet <dada1@cosmosbay.com> Date: Wed Sep 12 14:29:01 2007 +0200 [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue But this change went into 2.6.24, so I wonder why it's taken so long to hit this. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 13:58 ` David Miller @ 2008-09-30 14:07 ` Benjamin Thery 0 siblings, 0 replies; 33+ messages in thread From: Benjamin Thery @ 2008-09-30 14:07 UTC (permalink / raw) To: David Miller; +Cc: jarkao2, netdev, dlezcano, dada1 David Miller wrote: > From: Benjamin Thery <benjamin.thery@bull.net> > Date: Tue, 30 Sep 2008 13:52:31 +0200 > >> BTW, in the past (last year) dst_gc_task() was run as a timer >> handler and this situation (deadlock with linkwatch_event()) >> couldn't occur. > > September 12th to be exact :-) That's when Eric Dumazet's > change went in: > > commit 86bba269d08f0c545ae76c90b56727f65d62d57f > Author: Eric Dumazet <dada1@cosmosbay.com> > Date: Wed Sep 12 14:29:01 2007 +0200 > > [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue > > > But this change went into 2.6.24, so I wonder why it's taken > so long to hit this. Maybe because we manage to reproduce it only with network namespaces so far and there isn't much netns users at the moment (and netns is still incomplete in mainline). Benjamin > > -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 11:52 ` Benjamin Thery 2008-09-30 13:58 ` David Miller @ 2008-09-30 14:42 ` Jarek Poplawski 2008-09-30 14:57 ` Jarek Poplawski 1 sibling, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-09-30 14:42 UTC (permalink / raw) To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano Benjamin Thery wrote, On 09/30/2008 01:52 PM: > Jarek Poplawski wrote: >> On 29-09-2008 19:54, Benjamin Thery wrote: ... >>> Problem: it will never happens if dst_gc_task() was enqueued behind >>> linkwatch_event() in the "events" workqueue as the queue is now >>> blocked. >> ... >> >> If it's really like this, I wonder if this can happen without linkwatch >> too in a non-preemptive config? > > Um, not sure I fully understand what you mean... do you mean with > CONFIG_PREEMPT_NONE=y? Yes, but after rethinking I see this is irrelevant. Anyway, my main concern is that it seems similar dependency might happen between other than linkwatch work functions or processes waiting for each other. >> So maybe this should be fixed somewhere >> else? According to a comment above netdev_wait_allrefs() it seems >> references should be rather put down on an UNREGISTER event, so this >> dst_gc_task() scheduling shouldn't bother us, I guess. > > I saw this comment too. In our case, the UNREGISTER event is sent, > notifications are dispatched correctly, some routes are deleted > (dst_free()) but not destroyed (dst_destroy()) and the garbage collector > as to run to finish the work. > > dst_entry's may hold a refcount on device until dst_destroy() is run on > them. Unfortunately dst_gc_task() won't have a chance to run > dst_destroy() on them later in this case because it is stuck in the > "events" workqueue behind linkwatch_event() who is blocking everyone > else in the queue. > > I'm still looking at why the first dst_free() on those particular routes > doesn't call dst_destroy() immediately but defers it (another refcount > on the route itself). Yes, finding/fixing this, if possible, in this place looks like the most consistent with the way netdev_wait_allrefs() is handling this. Thanks, Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 14:42 ` Jarek Poplawski @ 2008-09-30 14:57 ` Jarek Poplawski 2008-09-30 15:18 ` Benjamin Thery 0 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-09-30 14:57 UTC (permalink / raw) To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote: > Benjamin Thery wrote, On 09/30/2008 01:52 PM: ... > > I'm still looking at why the first dst_free() on those particular routes > > doesn't call dst_destroy() immediately but defers it (another refcount > > on the route itself). > > Yes, finding/fixing this, if possible, in this place looks like the > most consistent with the way netdev_wait_allrefs() is handling this. Actually, I wonder, why we can't simply run this dst_gc_task() from dst_dev_event() (after cancelling the work) when needed. Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 14:57 ` Jarek Poplawski @ 2008-09-30 15:18 ` Benjamin Thery 2008-10-01 9:59 ` David Miller 0 siblings, 1 reply; 33+ messages in thread From: Benjamin Thery @ 2008-09-30 15:18 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David S. Miller, netdev, Daniel Lezcano Jarek Poplawski wrote: > On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote: >> Benjamin Thery wrote, On 09/30/2008 01:52 PM: > ... >>> I'm still looking at why the first dst_free() on those particular routes >>> doesn't call dst_destroy() immediately but defers it (another refcount >>> on the route itself). >> Yes, finding/fixing this, if possible, in this place looks like the >> most consistent with the way netdev_wait_allrefs() is handling this. > > Actually, I wonder, why we can't simply run this dst_gc_task() from > dst_dev_event() (after cancelling the work) when needed. > Um... I haven't thought about this. I'll have a look to see if it can solve our issue. Benjamin > Jarek P. > > -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-30 15:18 ` Benjamin Thery @ 2008-10-01 9:59 ` David Miller 2008-10-01 10:10 ` Daniel Lezcano 0 siblings, 1 reply; 33+ messages in thread From: David Miller @ 2008-10-01 9:59 UTC (permalink / raw) To: benjamin.thery; +Cc: jarkao2, netdev, dlezcano From: Benjamin Thery <benjamin.thery@bull.net> Date: Tue, 30 Sep 2008 17:18:25 +0200 > Jarek Poplawski wrote: > > On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote: > >> Benjamin Thery wrote, On 09/30/2008 01:52 PM: > > ... > >>> I'm still looking at why the first dst_free() on those particular routes doesn't call dst_destroy() immediately but defers it (another refcount > >>> on the route itself). > >> Yes, finding/fixing this, if possible, in this place looks like the > >> most consistent with the way netdev_wait_allrefs() is handling this. > > Actually, I wonder, why we can't simply run this dst_gc_task() from > > dst_dev_event() (after cancelling the work) when needed. > > > > Um... I haven't thought about this. I'll have a look to see if it can > solve our issue. Let me know what happens, I'd like to apply some fix soon. So just report the patch implementing the final approach you feel the most comfortable with. Thanks. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-01 9:59 ` David Miller @ 2008-10-01 10:10 ` Daniel Lezcano 2008-10-01 10:12 ` David Miller 0 siblings, 1 reply; 33+ messages in thread From: Daniel Lezcano @ 2008-10-01 10:10 UTC (permalink / raw) To: David Miller; +Cc: benjamin.thery, jarkao2, netdev David Miller wrote: > From: Benjamin Thery <benjamin.thery@bull.net> > Date: Tue, 30 Sep 2008 17:18:25 +0200 > >> Jarek Poplawski wrote: >>> On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote: >>>> Benjamin Thery wrote, On 09/30/2008 01:52 PM: >>> ... >>>>> I'm still looking at why the first dst_free() on those particular routes doesn't call dst_destroy() immediately but defers it (another refcount >>>>> on the route itself). >>>> Yes, finding/fixing this, if possible, in this place looks like the >>>> most consistent with the way netdev_wait_allrefs() is handling this. >>> Actually, I wonder, why we can't simply run this dst_gc_task() from >>> dst_dev_event() (after cancelling the work) when needed. >>> >> Um... I haven't thought about this. I'll have a look to see if it can >> solve our issue. > > Let me know what happens, I'd like to apply some fix soon. > So just report the patch implementing the final approach you > feel the most comfortable with. We did the modification suggested by Jarek and that fix the problem :) We are playing a bit with this patch to check if we didn't missed something. We will certainly send it in the next hours. Thanks. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-01 10:10 ` Daniel Lezcano @ 2008-10-01 10:12 ` David Miller 2008-10-01 14:14 ` [PATCH] net: deadlock during net device unregistration - V2 Benjamin Thery 0 siblings, 1 reply; 33+ messages in thread From: David Miller @ 2008-10-01 10:12 UTC (permalink / raw) To: dlezcano; +Cc: benjamin.thery, jarkao2, netdev From: Daniel Lezcano <dlezcano@fr.ibm.com> Date: Wed, 01 Oct 2008 12:10:10 +0200 > We are playing a bit with this patch to check if we didn't missed > something. We will certainly send it in the next hours. Nice. It is quite efficient for us to work together live like this when we are in the same time zone, and even the same country :-) ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH] net: deadlock during net device unregistration - V2 2008-10-01 10:12 ` David Miller @ 2008-10-01 14:14 ` Benjamin Thery 2008-10-01 19:48 ` Jarek Poplawski 0 siblings, 1 reply; 33+ messages in thread From: Benjamin Thery @ 2008-10-01 14:14 UTC (permalink / raw) To: David S. Miller, Jarek Poplawski; +Cc: netdev, Daniel Lezcano, Benjamin Thery This is the second version of a patch aimed at fixing a deadlock that can occur when unregistering net devices. This new version of the patch ensures the garbage collector dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling it in the netdevice notifier dst_dev_event() when receiving a NETDEV_UNREGISTER event. Thanks to Jarek Poplawski for proposing this fix. (The previous version proposed to replace in linkwatch_event() the call to rntl_unlock() by __rtnl_lock()) Short description of the deadlock - - - - - - - - - - - - - - - - - When reaching netdev_wait_allrefs() some routes may still hold the device refcount. dst_gc_task() should 'garbage collect' them soon (and decrease the refcount). dst_gc_task() is scheduled with a delay in the "events" workqueue. But, as some linkwatch events are also pending, netdev_wait_allrefs() schedules (without delay) a linkwatch_event() in the same workqueue. linkwatch_event runs before dst_gc_task() and blocks in netdev_run_todo (called by rtnl_unlock()), blocking the whole "events" workqueue. netdev_wait_allrefs() waiting for dt_gc_task() waiting dst_gc_task() to decrease the device -----> linkwatch_event() to return refcount. It holds the mutex and unblock the "events" netdev_run_todo_mutex workqueue ^ | | | | | | | | linkwatch_event() waiting for <-+ +------- net_dev_run_todo (who called netdev_wait_allrefs()) to release netdev_run_todo_mutex Long description: - - - - - - - - - In some circumstances, linkwatch_event() blocks the whole "events" workqueue while blocking in rtnl_unlock(), and prevent dst_gc_task() (enqueued in the same workqueue) to be run: 1. Unregister a device, the following routines are called: -> unregister_netdev -> rtnl_lock -> unregister_netdevice -> rtnl_unlock -> netdev_run_todo -> netdev_wait_allrefs 2. In netdev_wait_allrefs(), the device's refcount is greater than 0 because there are still some routes to be garbage collected later. 3. Also, some link watch events are pending. netdev_wait_allrefs() will run the linkwatch event queue, calls linkwatch_run_queue(). Both the route garbage collector dst_gc_task() and the linkwatch task linkwatch_event() are queued in the same generic workqueue: "events". 4. linkwatch_event() is enqueued earlier in the queue. It will grab rtnl_lock(), deliver the link watch events pending, and then call rtnl_unlock(). rtnl_unlock() will then call netdev_run_todo() and block on mutex_lock(&net_todo_run_mutex). At this point, the workqueue "events" is _blocked_ until the netdev_wait_allrefs() call above returns when the device refcount reaches 0. Problem: it will never happens if dst_gc_task() was enqueued behind linkwatch_event() in the "events" workqueue as the queue is now blocked. Thread foo Thread "events" ---------- --------------- free IPv6 routes | -> schedule_delayed_work(dst_gc_task) ... | unregister_netdev | -> rtln_unlock | -> netdev_run_todo | ... -> grab net_todo_run_mutex | -> netdev_wait_allrefs (loop) | -> run__workqueue() -> linkwatch_run_queue() | .... enqueue linkwatch_event | in "events" | | -> run_workqueue() "BLOCKED" | -> linkwatch_event() | -> rtnl_lock() | -> __linkwatch_run_queue | ->rtnl_unlock() | -> netdev_run_todo | -> try to grab net_todo_run_mutex BLOCKED Deadlock: # BLOCKED waiting for dst_gc_task() # BLOCKED waiting for netdev_wait_allrefs to decrement the device refcnt to release net_todo_run_mutex # dst_gst_task() waiting to be run by "events" This deadlock can be observed sometimes when the loopback is unregistered when a network namespaces exits. Benjamin Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> --- net/core/dst.c | 4 ++++ 1 file changed, 4 insertions(+) Index: net-next-2.6/net/core/dst.c =================================================================== --- net-next-2.6.orig/net/core/dst.c +++ net-next-2.6/net/core/dst.c @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier dst_ifdown(dst, dev, event != NETDEV_DOWN); } mutex_unlock(&dst_gc_mutex); + + if (event == NETDEV_UNREGISTER && + cancel_delayed_work(&dst_gc_work)) + dst_gc_task(&dst_gc_work.work); break; } return NOTIFY_DONE; -- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-01 14:14 ` [PATCH] net: deadlock during net device unregistration - V2 Benjamin Thery @ 2008-10-01 19:48 ` Jarek Poplawski 2008-10-01 21:06 ` Daniel Lezcano 0 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-10-01 19:48 UTC (permalink / raw) To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: > This is the second version of a patch aimed at fixing a deadlock that > can occur when unregistering net devices. > > This new version of the patch ensures the garbage collector > dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling > it in the netdevice notifier dst_dev_event() when receiving a > NETDEV_UNREGISTER event. > > Thanks to Jarek Poplawski for proposing this fix. Not at all! Actually, I'm sorry for this mess... (Read below.) > (The previous version proposed to replace in linkwatch_event() > the call to rntl_unlock() by __rtnl_lock()) ... > --- net-next-2.6.orig/net/core/dst.c > +++ net-next-2.6/net/core/dst.c > @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier > dst_ifdown(dst, dev, event != NETDEV_DOWN); > } > mutex_unlock(&dst_gc_mutex); > + > + if (event == NETDEV_UNREGISTER && > + cancel_delayed_work(&dst_gc_work)) > + dst_gc_task(&dst_gc_work.work); Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only kill this while on timer, but not when queued and maybe blocked already. Probably cancel_delayed_work_sync() should be considered instead, but since this needs more checking, and David is waiting for this, I think it's safer to use this previous (__rtnl_unlock) patch for now (especially for -stable). Thanks, Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-01 19:48 ` Jarek Poplawski @ 2008-10-01 21:06 ` Daniel Lezcano 2008-10-01 21:52 ` Jarek Poplawski 2008-10-01 23:31 ` Jarek Poplawski 0 siblings, 2 replies; 33+ messages in thread From: Daniel Lezcano @ 2008-10-01 21:06 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Benjamin Thery, David S. Miller, netdev Jarek Poplawski wrote: > On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: >> This is the second version of a patch aimed at fixing a deadlock that >> can occur when unregistering net devices. >> >> This new version of the patch ensures the garbage collector >> dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling >> it in the netdevice notifier dst_dev_event() when receiving a >> NETDEV_UNREGISTER event. >> >> Thanks to Jarek Poplawski for proposing this fix. > > Not at all! Actually, I'm sorry for this mess... (Read below.) > >> (The previous version proposed to replace in linkwatch_event() >> the call to rntl_unlock() by __rtnl_lock()) > ... >> --- net-next-2.6.orig/net/core/dst.c >> +++ net-next-2.6/net/core/dst.c >> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier >> dst_ifdown(dst, dev, event != NETDEV_DOWN); >> } >> mutex_unlock(&dst_gc_mutex); >> + >> + if (event == NETDEV_UNREGISTER && >> + cancel_delayed_work(&dst_gc_work)) >> + dst_gc_task(&dst_gc_work.work); > > Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only > kill this while on timer, but not when queued and maybe blocked already. Perhaps, I am misunderstanding but, dst_gc_work is always called with schedule_delayed_work. If the task is not running, we can cancel it and if the task is running, no need to trigger the gc, no ? > Probably cancel_delayed_work_sync() should be considered instead, but > since this needs more checking, and David is waiting for this, I think > it's safer to use this previous (__rtnl_unlock) patch for now > (especially for -stable). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-01 21:06 ` Daniel Lezcano @ 2008-10-01 21:52 ` Jarek Poplawski 2008-10-01 23:31 ` Jarek Poplawski 1 sibling, 0 replies; 33+ messages in thread From: Jarek Poplawski @ 2008-10-01 21:52 UTC (permalink / raw) To: Daniel Lezcano; +Cc: Benjamin Thery, David S. Miller, netdev On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote: > Jarek Poplawski wrote: >> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: ... >>> --- net-next-2.6.orig/net/core/dst.c >>> +++ net-next-2.6/net/core/dst.c >>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier >>> dst_ifdown(dst, dev, event != NETDEV_DOWN); >>> } >>> mutex_unlock(&dst_gc_mutex); >>> + >>> + if (event == NETDEV_UNREGISTER && >>> + cancel_delayed_work(&dst_gc_work)) >>> + dst_gc_task(&dst_gc_work.work); >> >> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only >> kill this while on timer, but not when queued and maybe blocked already. > > Perhaps, I am misunderstanding but, dst_gc_work is always called with > schedule_delayed_work. If the task is not running, we can cancel it and > if the task is running, no need to trigger the gc, no ? Right, but cancel_delayed_work() can't cancel it reliably even if the task is not running (but e.g. queued already just after us - linkwatch). Of course, this should be OK most of the time, especially with such long schedule delays, but, on the other hand, not as sure as the first patch with __rtnl_unlock. So doing this right could take more time. Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-01 21:06 ` Daniel Lezcano 2008-10-01 21:52 ` Jarek Poplawski @ 2008-10-01 23:31 ` Jarek Poplawski 2008-10-02 15:23 ` Benjamin Thery 1 sibling, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-10-01 23:31 UTC (permalink / raw) To: Daniel Lezcano; +Cc: Benjamin Thery, David S. Miller, netdev On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote: > Jarek Poplawski wrote: >> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: ... >>> --- net-next-2.6.orig/net/core/dst.c >>> +++ net-next-2.6/net/core/dst.c >>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier >>> dst_ifdown(dst, dev, event != NETDEV_DOWN); >>> } >>> mutex_unlock(&dst_gc_mutex); >>> + >>> + if (event == NETDEV_UNREGISTER && >>> + cancel_delayed_work(&dst_gc_work)) >>> + dst_gc_task(&dst_gc_work.work); >> >> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only >> kill this while on timer, but not when queued and maybe blocked already. Hmm#2... Then maybe something like this?: if (event == NETDEV_UNREGISTER && (cancel_delayed_work(&dst_gc_work) || delayed_work_pending(&dst_gc_work))) dst_gc_task(&dst_gc_work.work); Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-01 23:31 ` Jarek Poplawski @ 2008-10-02 15:23 ` Benjamin Thery 2008-10-02 18:38 ` Jarek Poplawski 0 siblings, 1 reply; 33+ messages in thread From: Benjamin Thery @ 2008-10-02 15:23 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Daniel Lezcano, David S. Miller, netdev Jarek Poplawski wrote: > On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote: >> Jarek Poplawski wrote: >>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: > ... >>>> --- net-next-2.6.orig/net/core/dst.c >>>> +++ net-next-2.6/net/core/dst.c >>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier >>>> dst_ifdown(dst, dev, event != NETDEV_DOWN); >>>> } >>>> mutex_unlock(&dst_gc_mutex); >>>> + >>>> + if (event == NETDEV_UNREGISTER && >>>> + cancel_delayed_work(&dst_gc_work)) >>>> + dst_gc_task(&dst_gc_work.work); >>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only >>> kill this while on timer, but not when queued and maybe blocked already. You're right. > > Hmm#2... Then maybe something like this?: > > if (event == NETDEV_UNREGISTER && > (cancel_delayed_work(&dst_gc_work) || > delayed_work_pending(&dst_gc_work))) > dst_gc_task(&dst_gc_work.work); Hmmm... I'm not sure I understand what this change do? OK, I see this ensure we will run dst_gc_task() even if cancel_delayed_work() failed and the work is still pending (ie. the timer has expired and dst_gc_work is already in the queue). But what if the work was not pending at all at beginning? We still need to run dst_gc_task(). Is something like this better? (code expanded to be more readable) if (event == NETDEV_UNREGISTER) { if (!delayed_work_pending(&dst_gc_work)) /* work is not scheduled (no timer, not in queue) */ dst_gc_task(&dst_gc_work.work); else if (cancel_delayed_work(&dst_gc_work) || delayed_work_pending(&dst_gc_work))) /* work was scheduled (and may be blocked) */ dst_gc_task(&dst_gc_work.work); else /* dst_gc_task() is running, do nothing */ } Benjamin > > > Jarek P. > > -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-02 15:23 ` Benjamin Thery @ 2008-10-02 18:38 ` Jarek Poplawski 2008-10-02 19:55 ` Benjamin Thery 0 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-10-02 18:38 UTC (permalink / raw) To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev On Thu, Oct 02, 2008 at 05:23:30PM +0200, Benjamin Thery wrote: > Jarek Poplawski wrote: >> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote: >>> Jarek Poplawski wrote: >>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: >> ... >>>>> --- net-next-2.6.orig/net/core/dst.c >>>>> +++ net-next-2.6/net/core/dst.c >>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier >>>>> dst_ifdown(dst, dev, event != NETDEV_DOWN); >>>>> } >>>>> mutex_unlock(&dst_gc_mutex); >>>>> + >>>>> + if (event == NETDEV_UNREGISTER && >>>>> + cancel_delayed_work(&dst_gc_work)) >>>>> + dst_gc_task(&dst_gc_work.work); >>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only >>>> kill this while on timer, but not when queued and maybe blocked already. > > You're right. > >> >> Hmm#2... Then maybe something like this?: >> >> if (event == NETDEV_UNREGISTER && >> (cancel_delayed_work(&dst_gc_work) || >> delayed_work_pending(&dst_gc_work))) >> dst_gc_task(&dst_gc_work.work); > > Hmmm... I'm not sure I understand what this change do? > > OK, I see this ensure we will run dst_gc_task() even if > cancel_delayed_work() failed and the work is still pending (ie. the > timer has expired and dst_gc_work is already in the queue). I think, this covers exactly the case of blocking you described, plus more... (the work is queued but not blocked). > > But what if the work was not pending at all at beginning? > We still need to run dst_gc_task(). Maybe I miss something, but if this is really needed here then it looks like we are fixing more than "the blocking" bug BTW. > > Is something like this better? > (code expanded to be more readable) > > if (event == NETDEV_UNREGISTER) { > if (!delayed_work_pending(&dst_gc_work)) > /* work is not scheduled (no timer, not in queue) */ /* may be running too */ > dst_gc_task(&dst_gc_work.work); > else if (cancel_delayed_work(&dst_gc_work) || > delayed_work_pending(&dst_gc_work))) > /* work was scheduled (and may be blocked) */ /* actually could be both running and pending here: * if it's after rearming */ > dst_gc_task(&dst_gc_work.work); > else > /* dst_gc_task() is running, do nothing */ So again !delayed_work_pending() - there could be the change of state while checking - but then looks a bit inconsistent. I think this should be OK too. As a matter of fact I've thought about something even simpler, which probably should help for all above concerns too: if (event == NETDEV_UNREGISTER) dst_gc_task(&dst_gc_work.work); dst_gc_task() locking allows for this, and running this two times in a row could be even faster than trying to cancel the unnecessary run. Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-02 18:38 ` Jarek Poplawski @ 2008-10-02 19:55 ` Benjamin Thery 2008-10-02 20:34 ` Jarek Poplawski 0 siblings, 1 reply; 33+ messages in thread From: Benjamin Thery @ 2008-10-02 19:55 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Daniel Lezcano, David S. Miller, netdev Quoting Jarek Poplawski <jarkao2@gmail.com>: > On Thu, Oct 02, 2008 at 05:23:30PM +0200, Benjamin Thery wrote: >> Jarek Poplawski wrote: >>> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote: >>>> Jarek Poplawski wrote: >>>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote: >>> ... >>>>>> --- net-next-2.6.orig/net/core/dst.c >>>>>> +++ net-next-2.6/net/core/dst.c >>>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier >>>>>> dst_ifdown(dst, dev, event != NETDEV_DOWN); >>>>>> } >>>>>> mutex_unlock(&dst_gc_mutex); >>>>>> + >>>>>> + if (event == NETDEV_UNREGISTER && >>>>>> + cancel_delayed_work(&dst_gc_work)) >>>>>> + dst_gc_task(&dst_gc_work.work); >>>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only >>>>> kill this while on timer, but not when queued and maybe blocked already. >> >> You're right. >> >>> >>> Hmm#2... Then maybe something like this?: >>> >>> if (event == NETDEV_UNREGISTER && >>> (cancel_delayed_work(&dst_gc_work) || >>> delayed_work_pending(&dst_gc_work))) >>> dst_gc_task(&dst_gc_work.work); >> >> Hmmm... I'm not sure I understand what this change do? >> >> OK, I see this ensure we will run dst_gc_task() even if >> cancel_delayed_work() failed and the work is still pending (ie. the >> timer has expired and dst_gc_work is already in the queue). > > I think, this covers exactly the case of blocking you described, plus > more... (the work is queued but not blocked). > >> >> But what if the work was not pending at all at beginning? >> We still need to run dst_gc_task(). > > Maybe I miss something, but if this is really needed here then it > looks like we are fixing more than "the blocking" bug BTW. > >> >> Is something like this better? >> (code expanded to be more readable) >> >> if (event == NETDEV_UNREGISTER) { >> if (!delayed_work_pending(&dst_gc_work)) >> /* work is not scheduled (no timer, not in queue) */ > > /* may be running too */ > >> dst_gc_task(&dst_gc_work.work); >> else if (cancel_delayed_work(&dst_gc_work) || >> delayed_work_pending(&dst_gc_work))) >> /* work was scheduled (and may be blocked) */ > > /* actually could be both running and pending here: > * if it's after rearming > */ > >> dst_gc_task(&dst_gc_work.work); >> else >> /* dst_gc_task() is running, do nothing */ > > So again !delayed_work_pending() - there could be the change of state > while checking - but then looks a bit inconsistent. I think this should > be OK too. > > As a matter of fact I've thought about something even simpler, which > probably should help for all above concerns too: > > if (event == NETDEV_UNREGISTER) > dst_gc_task(&dst_gc_work.work); > > dst_gc_task() locking allows for this, and running this two times in > a row could be even faster than trying to cancel the unnecessary run. I've thought a bit more about my last proposal and come to the same conclusion as you, hmm, almost. I thought we could call cancel_delayed_work() unconditionally and then dst_gc_task(). if (event == NETDEV_UNREGISTER) { cancel_delayed_work(&dst_gc_work); dst_gc_task(&dst_gc_work.work); } But, you're right, calling only dst_gc_task() seems fine to me. I'll run some tests tomorrow and send a new patch. Do we agree that this fix (calling dst_gc_task() in dst_dev_event()) is better/more appropriate than the first one (replacing rtnl_unlock() by the non-blocking __rtnl_unlock() in linkwatch_event())? Thanks, Benjamin > > Jarek P. > > ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-02 19:55 ` Benjamin Thery @ 2008-10-02 20:34 ` Jarek Poplawski 2008-10-04 7:42 ` Jarek Poplawski 0 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-10-02 20:34 UTC (permalink / raw) To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery wrote: > Quoting Jarek Poplawski <jarkao2@gmail.com>: ... >> As a matter of fact I've thought about something even simpler, which >> probably should help for all above concerns too: >> >> if (event == NETDEV_UNREGISTER) >> dst_gc_task(&dst_gc_work.work); >> >> dst_gc_task() locking allows for this, and running this two times in >> a row could be even faster than trying to cancel the unnecessary run. > > I've thought a bit more about my last proposal and come to the same > conclusion as you, hmm, almost. I thought we could call > cancel_delayed_work() unconditionally and then dst_gc_task(). > > if (event == NETDEV_UNREGISTER) { > cancel_delayed_work(&dst_gc_work); > dst_gc_task(&dst_gc_work.work); > } > > But, you're right, calling only dst_gc_task() seems fine to me. I'm OK with any of these versions. > > I'll run some tests tomorrow and send a new patch. > > Do we agree that this fix (calling dst_gc_task() in dst_dev_event()) > is better/more appropriate than the first one (replacing rtnl_unlock() > by the non-blocking __rtnl_unlock() in linkwatch_event())? I agree it's more appropriate, because: a) the job is done according to the rules (in the comment) during the notification and not in some future, b) it removes some hidden dependencies between processes/works, which, even if they currently don't exist except this one in the linkwatch, could be extremly hard to diagnose, if added accidentally in the future. On the other hand, until we are not sure of the reasons, why something like this (the full destroying) was avoided in the past, and until it's heavily tested (with lockdep), your first patch looks to me much safer if applying to any -stable is considered. Thanks, Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-02 20:34 ` Jarek Poplawski @ 2008-10-04 7:42 ` Jarek Poplawski 2008-10-04 7:52 ` Jarek Poplawski 0 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-10-04 7:42 UTC (permalink / raw) To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev On Thu, Oct 02, 2008 at 10:34:35PM +0200, Jarek Poplawski wrote: > On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery wrote: > > Quoting Jarek Poplawski <jarkao2@gmail.com>: > ... > >> As a matter of fact I've thought about something even simpler, which > >> probably should help for all above concerns too: > >> > >> if (event == NETDEV_UNREGISTER) > >> dst_gc_task(&dst_gc_work.work); > >> > >> dst_gc_task() locking allows for this, and running this two times in > >> a row could be even faster than trying to cancel the unnecessary run. > > > > I've thought a bit more about my last proposal and come to the same > > conclusion as you, hmm, almost. I thought we could call > > cancel_delayed_work() unconditionally and then dst_gc_task(). > > > > if (event == NETDEV_UNREGISTER) { > > cancel_delayed_work(&dst_gc_work); > > dst_gc_task(&dst_gc_work.work); > > } > > > > But, you're right, calling only dst_gc_task() seems fine to me. > > I'm OK with any of these versions. > > > > > I'll run some tests tomorrow and send a new patch. > > > > Do we agree that this fix (calling dst_gc_task() in dst_dev_event()) > > is better/more appropriate than the first one (replacing rtnl_unlock() > > by the non-blocking __rtnl_unlock() in linkwatch_event())? Hmm... You can kill me, but after more looking at this I've changed my mind. > I agree it's more appropriate, because: > a) the job is done according to the rules (in the comment) during > the notification and not in some future, But, since the rules are wrong or protocols buggy, the solution isn't effective: there should be no deadlock after this, but it's also hard to predict how many dst_gc_task() runs are needed to free all refs. During this all other works would be unnecessarily blocked by the linkwatch, what looks a bit too hoggish. My current idea is we should move linkwatch or dst_gc_task() (or both) to it's own workqueue. > b) it removes some hidden dependencies between processes/works, which, > even if they currently don't exist except this one in the linkwatch, > could be extremly hard to diagnose, if added accidentally in the > future. > > On the other hand, until we are not sure of the reasons, why something BTW, sorry, of course: "On the other hand, until we are sure of the reasons, why something" > like this (the full destroying) was avoided in the past, and until it's > heavily tested (with lockdep), your first patch looks to me much safer > if applying to any -stable is considered. And this first patch looks better and better each time... Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration - V2 2008-10-04 7:42 ` Jarek Poplawski @ 2008-10-04 7:52 ` Jarek Poplawski 0 siblings, 0 replies; 33+ messages in thread From: Jarek Poplawski @ 2008-10-04 7:52 UTC (permalink / raw) To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev On Sat, Oct 04, 2008 at 09:42:48AM +0200, Jarek Poplawski wrote: ... > During this all other works would be unnecessarily blocked by the > linkwatch, what looks a bit too hoggish. Or even more: can't some of these blockeds task hold our refs too? Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery 2008-09-30 6:32 ` Jarek Poplawski @ 2008-10-03 0:41 ` Eric W. Biederman 2008-10-05 4:26 ` Herbert Xu 2 siblings, 0 replies; 33+ messages in thread From: Eric W. Biederman @ 2008-10-03 0:41 UTC (permalink / raw) To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano Benjamin Thery <benjamin.thery@bull.net> writes: > This patch proposes to replace the rtnl_unlock() call in > linkwatch_event() by __rtnl_unlock(). The difference between the two > routines being that __rtnl_unlock() will not call netdev_run_todo() > after it unlocks rtnl_mutex. > > This is to fix a "deadlock" we observed when unregistering a net device. Thanks for tracking this I just started seeing this a few days ago, and before I reach the point of tracking it down you are in the process of fixing it ;) Eric ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery 2008-09-30 6:32 ` Jarek Poplawski 2008-10-03 0:41 ` [PATCH] net: deadlock during net device unregistration Eric W. Biederman @ 2008-10-05 4:26 ` Herbert Xu 2008-10-05 6:55 ` Jarek Poplawski ` (2 more replies) 2 siblings, 3 replies; 33+ messages in thread From: Herbert Xu @ 2008-10-05 4:26 UTC (permalink / raw) To: Benjamin Thery; +Cc: davem, netdev, dlezcano, benjamin.thery Benjamin Thery <benjamin.thery@bull.net> wrote: > > 1. Unregister a device, the following routines are called: > > -> unregister_netdev > -> rtnl_lock > -> unregister_netdevice > -> rtnl_unlock > -> netdev_run_todo > -> netdev_wait_allrefs OK, this explains lots of dead-locks that people have been seeing. But I think we can go a step further: net: Fix netdev_run_todo dead-lock Benjamin Thery tracked down a bug that explains many instances of the error unregister_netdevice: waiting for %s to become free. Usage count = %d It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows a RTNL section, and that todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/net/core/dev.c b/net/core/dev.c index e8eb2b4..021f531 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net) } /* Delayed registration/unregisteration */ -static DEFINE_SPINLOCK(net_todo_list_lock); static LIST_HEAD(net_todo_list); static void net_set_todo(struct net_device *dev) { - spin_lock(&net_todo_list_lock); list_add_tail(&dev->todo_list, &net_todo_list); - spin_unlock(&net_todo_list_lock); } static void rollback_registered(struct net_device *dev) @@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev) * free_netdev(y1); * free_netdev(y2); * - * We are invoked by rtnl_unlock() after it drops the semaphore. + * We are invoked by rtnl_unlock(). * This allows us to deal with problems: * 1) We can delete sysfs objects which invoke hotplug * without deadlocking with linkwatch via keventd. * 2) Since we run with the RTNL semaphore not held, we can sleep * safely in order to wait for the netdev refcnt to drop to zero. + * + * We must not return until all unregister events added during + * the interval the lock was held have been completed. */ -static DEFINE_MUTEX(net_todo_run_mutex); void netdev_run_todo(void) { struct list_head list; - /* Need to guard against multiple cpu's getting out of order. */ - mutex_lock(&net_todo_run_mutex); - - /* Not safe to do outside the semaphore. We must not return - * until all unregister events invoked by the local processor - * have been completed (either by this todo run, or one on - * another cpu). - */ - if (list_empty(&net_todo_list)) - goto out; - /* Snapshot list, allow later requests */ - spin_lock(&net_todo_list_lock); list_replace_init(&net_todo_list, &list); - spin_unlock(&net_todo_list_lock); + + __rtnl_unlock(); while (!list_empty(&list)) { struct net_device *dev @@ -4200,9 +4188,6 @@ void netdev_run_todo(void) /* Free network device */ kobject_put(&dev->dev.kobj); } - -out: - mutex_unlock(&net_todo_run_mutex); } static struct net_device_stats *internal_stats(struct net_device *dev) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 71edb8b..d6381c2 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -73,7 +73,7 @@ void __rtnl_unlock(void) void rtnl_unlock(void) { - mutex_unlock(&rtnl_mutex); + /* This fellow will unlock it for us. */ netdev_run_todo(); } Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 4:26 ` Herbert Xu @ 2008-10-05 6:55 ` Jarek Poplawski 2008-10-05 6:56 ` Herbert Xu 2008-10-06 15:19 ` Benjamin Thery 2008-10-07 22:50 ` David Miller 2 siblings, 1 reply; 33+ messages in thread From: Jarek Poplawski @ 2008-10-05 6:55 UTC (permalink / raw) To: Herbert Xu; +Cc: Benjamin Thery, davem, netdev, dlezcano Herbert Xu wrote, On 10/05/2008 06:26 AM: ... > The problem is really quite silly. We were trying to create > parallelism where none was required. As netdev_run_todo always > follows a RTNL section, and that todo tasks can only be added > with the RTNL held, by definition you should only need to wait > for the very ones that you've added and be done with it. > > There is no need for a second mutex or spinlock. Is it for sure? (Read below.) > > This is exactly what the following patch does. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/net/core/dev.c b/net/core/dev.c > index e8eb2b4..021f531 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net) > } > > /* Delayed registration/unregisteration */ > -static DEFINE_SPINLOCK(net_todo_list_lock); > static LIST_HEAD(net_todo_list); > > static void net_set_todo(struct net_device *dev) > { > - spin_lock(&net_todo_list_lock); > list_add_tail(&dev->todo_list, &net_todo_list); > - spin_unlock(&net_todo_list_lock); > } > > static void rollback_registered(struct net_device *dev) > @@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev) > * free_netdev(y1); > * free_netdev(y2); > * > - * We are invoked by rtnl_unlock() after it drops the semaphore. > + * We are invoked by rtnl_unlock(). > * This allows us to deal with problems: > * 1) We can delete sysfs objects which invoke hotplug > * without deadlocking with linkwatch via keventd. > * 2) Since we run with the RTNL semaphore not held, we can sleep > * safely in order to wait for the netdev refcnt to drop to zero. > + * > + * We must not return until all unregister events added during > + * the interval the lock was held have been completed. > */ > -static DEFINE_MUTEX(net_todo_run_mutex); > void netdev_run_todo(void) > { > struct list_head list; > > - /* Need to guard against multiple cpu's getting out of order. */ > - mutex_lock(&net_todo_run_mutex); > - > - /* Not safe to do outside the semaphore. We must not return > - * until all unregister events invoked by the local processor > - * have been completed (either by this todo run, or one on > - * another cpu). > - */ I think, it's about not to let others run this for devices unregistered within later rtnl_locks before completing previous tasks. So, it would be nice to have some comment why it's not necessary anymore. Cheers, Jarek P. > - if (list_empty(&net_todo_list)) > - goto out; > - > /* Snapshot list, allow later requests */ > - spin_lock(&net_todo_list_lock); > list_replace_init(&net_todo_list, &list); > - spin_unlock(&net_todo_list_lock); > + > + __rtnl_unlock(); > > while (!list_empty(&list)) { > struct net_device *dev > @@ -4200,9 +4188,6 @@ void netdev_run_todo(void) > /* Free network device */ > kobject_put(&dev->dev.kobj); > } > - > -out: > - mutex_unlock(&net_todo_run_mutex); > } > > static struct net_device_stats *internal_stats(struct net_device *dev) > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index 71edb8b..d6381c2 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -73,7 +73,7 @@ void __rtnl_unlock(void) > > void rtnl_unlock(void) > { > - mutex_unlock(&rtnl_mutex); > + /* This fellow will unlock it for us. */ > netdev_run_todo(); > } > > Cheers, ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 6:55 ` Jarek Poplawski @ 2008-10-05 6:56 ` Herbert Xu 2008-10-05 7:12 ` Jarek Poplawski 0 siblings, 1 reply; 33+ messages in thread From: Herbert Xu @ 2008-10-05 6:56 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Benjamin Thery, davem, netdev, dlezcano On Sun, Oct 05, 2008 at 08:55:10AM +0200, Jarek Poplawski wrote: > > > - /* Not safe to do outside the semaphore. We must not return > > - * until all unregister events invoked by the local processor > > - * have been completed (either by this todo run, or one on > > - * another cpu). > > - */ > > I think, it's about not to let others run this for devices unregistered > within later rtnl_locks before completing previous tasks. So, it would > be nice to have some comment why it's not necessary anymore. Where did you get that idea? This was there because people did (and still do) stuff like: unregister_netdev(dev); free_netdev(dev); Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 6:56 ` Herbert Xu @ 2008-10-05 7:12 ` Jarek Poplawski 2008-10-05 7:28 ` Stephen Hemminger 2008-10-05 7:39 ` Herbert Xu 0 siblings, 2 replies; 33+ messages in thread From: Jarek Poplawski @ 2008-10-05 7:12 UTC (permalink / raw) To: Herbert Xu; +Cc: Benjamin Thery, davem, netdev, dlezcano On Sun, Oct 05, 2008 at 02:56:48PM +0800, Herbert Xu wrote: > On Sun, Oct 05, 2008 at 08:55:10AM +0200, Jarek Poplawski wrote: > > > > > - /* Not safe to do outside the semaphore. We must not return > > > - * until all unregister events invoked by the local processor > > > - * have been completed (either by this todo run, or one on > > > - * another cpu). > > > - */ > > > > I think, it's about not to let others run this for devices unregistered > > within later rtnl_locks before completing previous tasks. So, it would > > be nice to have some comment why it's not necessary anymore. > > Where did you get that idea? Just reading this code (plus the comment). Why would anybody bother with something so complex like this if something like your idea is rather straightforward? But, needed or not, my point is it would be nice to comment that this patch changes this behavior btw. Jarek P. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 7:12 ` Jarek Poplawski @ 2008-10-05 7:28 ` Stephen Hemminger 2008-10-05 7:38 ` Herbert Xu 2008-10-05 7:39 ` Herbert Xu 1 sibling, 1 reply; 33+ messages in thread From: Stephen Hemminger @ 2008-10-05 7:28 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Herbert Xu, Benjamin Thery, davem, netdev, dlezcano On Sun, 5 Oct 2008 09:12:38 +0200 Jarek Poplawski <jarkao2@gmail.com> wrote: > On Sun, Oct 05, 2008 at 02:56:48PM +0800, Herbert Xu wrote: > > On Sun, Oct 05, 2008 at 08:55:10AM +0200, Jarek Poplawski wrote: > > > > > > > - /* Not safe to do outside the semaphore. We must not return > > > > - * until all unregister events invoked by the local processor > > > > - * have been completed (either by this todo run, or one on > > > > - * another cpu). > > > > - */ > > > > > > I think, it's about not to let others run this for devices unregistered > > > within later rtnl_locks before completing previous tasks. So, it would > > > be nice to have some comment why it's not necessary anymore. > > > > Where did you get that idea? > > Just reading this code (plus the comment). Why would anybody bother > with something so complex like this if something like your idea is > rather straightforward? But, needed or not, my point is it would be > nice to comment that this patch changes this behavior btw. I think there were issues with unregister triggering hotplug udev events, but that may have been long ago when rtnl_lock was a semaphore not a mutex. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 7:28 ` Stephen Hemminger @ 2008-10-05 7:38 ` Herbert Xu 0 siblings, 0 replies; 33+ messages in thread From: Herbert Xu @ 2008-10-05 7:38 UTC (permalink / raw) To: Stephen Hemminger Cc: Jarek Poplawski, Benjamin Thery, davem, netdev, dlezcano On Sun, Oct 05, 2008 at 09:28:23AM +0200, Stephen Hemminger wrote: > I think there were issues with unregister triggering hotplug udev > events, but that may have been long ago when rtnl_lock was > a semaphore not a mutex. Well as it is we have no hotplug events in netdev_run_todo at all. They're all in unregister_netdevice which is where they should be and that runs under the RTNL. In fact it would be a bug if we had anything in netdev_run_todo tied to sysfs because once we get here we've relinquished our ownership of the device name so someone else is free to readd a device with the same name. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 7:12 ` Jarek Poplawski 2008-10-05 7:28 ` Stephen Hemminger @ 2008-10-05 7:39 ` Herbert Xu 1 sibling, 0 replies; 33+ messages in thread From: Herbert Xu @ 2008-10-05 7:39 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Benjamin Thery, davem, netdev, dlezcano On Sun, Oct 05, 2008 at 09:12:38AM +0200, Jarek Poplawski wrote: > > Just reading this code (plus the comment). Why would anybody bother > with something so complex like this if something like your idea is > rather straightforward? But, needed or not, my point is it would be > nice to comment that this patch changes this behavior btw. I'm sorry but I don't have time to consider such hypotheticals. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 4:26 ` Herbert Xu 2008-10-05 6:55 ` Jarek Poplawski @ 2008-10-06 15:19 ` Benjamin Thery 2008-10-07 22:46 ` David Miller 2008-10-07 22:50 ` David Miller 2 siblings, 1 reply; 33+ messages in thread From: Benjamin Thery @ 2008-10-06 15:19 UTC (permalink / raw) To: Herbert Xu; +Cc: davem, netdev, dlezcano Herbert Xu wrote: > Benjamin Thery <benjamin.thery@bull.net> wrote: >> 1. Unregister a device, the following routines are called: >> >> -> unregister_netdev >> -> rtnl_lock >> -> unregister_netdevice >> -> rtnl_unlock >> -> netdev_run_todo >> -> netdev_wait_allrefs > > OK, this explains lots of dead-locks that people have been seeing. > > But I think we can go a step further: > > net: Fix netdev_run_todo dead-lock > > Benjamin Thery tracked down a bug that explains many instances > of the error > > unregister_netdevice: waiting for %s to become free. Usage count = %d > > It turns out that netdev_run_todo can dead-lock with itself if > a second instance of it is run in a thread that will then free > a reference to the device waited on by the first instance. > > The problem is really quite silly. We were trying to create > parallelism where none was required. As netdev_run_todo always > follows a RTNL section, and that todo tasks can only be added > with the RTNL held, by definition you should only need to wait > for the very ones that you've added and be done with it. > > There is no need for a second mutex or spinlock. > > This is exactly what the following patch does. Herbert, thank you for having looked at the issue too. When I understood how the dead lock happened, I considered playing with the locks in net_set_todo()/netdev_run_todo(), but as some comments in the code in this area sounds a bit too cryptic for my brain, I didn't dare to change them myself. :) I guess you know a lot better than me the history behind this piece of code and why it was done this way. I tested your patch on my testbed during all the afternoon. It fixes the dead lock I can easily reproduce here with net namespaces and I didn't produce any regressions on my setup. Benjamin > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/net/core/dev.c b/net/core/dev.c > index e8eb2b4..021f531 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net) > } > > /* Delayed registration/unregisteration */ > -static DEFINE_SPINLOCK(net_todo_list_lock); > static LIST_HEAD(net_todo_list); > > static void net_set_todo(struct net_device *dev) > { > - spin_lock(&net_todo_list_lock); > list_add_tail(&dev->todo_list, &net_todo_list); > - spin_unlock(&net_todo_list_lock); > } > > static void rollback_registered(struct net_device *dev) > @@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev) > * free_netdev(y1); > * free_netdev(y2); > * > - * We are invoked by rtnl_unlock() after it drops the semaphore. > + * We are invoked by rtnl_unlock(). > * This allows us to deal with problems: > * 1) We can delete sysfs objects which invoke hotplug > * without deadlocking with linkwatch via keventd. > * 2) Since we run with the RTNL semaphore not held, we can sleep > * safely in order to wait for the netdev refcnt to drop to zero. > + * > + * We must not return until all unregister events added during > + * the interval the lock was held have been completed. > */ > -static DEFINE_MUTEX(net_todo_run_mutex); > void netdev_run_todo(void) > { > struct list_head list; > > - /* Need to guard against multiple cpu's getting out of order. */ > - mutex_lock(&net_todo_run_mutex); > - > - /* Not safe to do outside the semaphore. We must not return > - * until all unregister events invoked by the local processor > - * have been completed (either by this todo run, or one on > - * another cpu). > - */ > - if (list_empty(&net_todo_list)) > - goto out; > - > /* Snapshot list, allow later requests */ > - spin_lock(&net_todo_list_lock); > list_replace_init(&net_todo_list, &list); > - spin_unlock(&net_todo_list_lock); > + > + __rtnl_unlock(); > > while (!list_empty(&list)) { > struct net_device *dev > @@ -4200,9 +4188,6 @@ void netdev_run_todo(void) > /* Free network device */ > kobject_put(&dev->dev.kobj); > } > - > -out: > - mutex_unlock(&net_todo_run_mutex); > } > > static struct net_device_stats *internal_stats(struct net_device *dev) > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index 71edb8b..d6381c2 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -73,7 +73,7 @@ void __rtnl_unlock(void) > > void rtnl_unlock(void) > { > - mutex_unlock(&rtnl_mutex); > + /* This fellow will unlock it for us. */ > netdev_run_todo(); > } > > Cheers, -- B e n j a m i n T h e r y - BULL/DT/Open Software R&D http://www.bull.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-06 15:19 ` Benjamin Thery @ 2008-10-07 22:46 ` David Miller 0 siblings, 0 replies; 33+ messages in thread From: David Miller @ 2008-10-07 22:46 UTC (permalink / raw) To: benjamin.thery; +Cc: herbert, netdev, dlezcano From: Benjamin Thery <benjamin.thery@bull.net> Date: Mon, 06 Oct 2008 17:19:13 +0200 > I guess you know a lot better than me the history behind this piece of > code and why it was done this way. Or maybe Herbert doesn't know :-) I wrote this crap, and I can't remember why I decided to drop the RTNL semaphore so early and go directly to that todo lock. Maybe it was purely because of how the interfaces are arranged and I considered it ugly to drop the RTNL semaphore in the todo list running code. Who knows... I'm going to scrutinize Herbert's patch some more, but this is really a borderline change to put into 2.6.27 so late in the RC cycle. I want to fix this bug, but... it's risky. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] net: deadlock during net device unregistration 2008-10-05 4:26 ` Herbert Xu 2008-10-05 6:55 ` Jarek Poplawski 2008-10-06 15:19 ` Benjamin Thery @ 2008-10-07 22:50 ` David Miller 2 siblings, 0 replies; 33+ messages in thread From: David Miller @ 2008-10-07 22:50 UTC (permalink / raw) To: herbert; +Cc: benjamin.thery, netdev, dlezcano From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sun, 05 Oct 2008 12:26:21 +0800 > net: Fix netdev_run_todo dead-lock > > Benjamin Thery tracked down a bug that explains many instances > of the error > > unregister_netdevice: waiting for %s to become free. Usage count = %d > > It turns out that netdev_run_todo can dead-lock with itself if > a second instance of it is run in a thread that will then free > a reference to the device waited on by the first instance. > > The problem is really quite silly. We were trying to create > parallelism where none was required. As netdev_run_todo always > follows a RTNL section, and that todo tasks can only be added > with the RTNL held, by definition you should only need to wait > for the very ones that you've added and be done with it. > > There is no need for a second mutex or spinlock. > > This is exactly what the following patch does. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Ok, this looks safe. I've applied this to net-2.6, thanks! ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2008-10-07 22:51 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20080929175412.866679567@theryb.frec.bull.fr>
2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery
2008-09-30 6:32 ` Jarek Poplawski
2008-09-30 11:52 ` Benjamin Thery
2008-09-30 13:58 ` David Miller
2008-09-30 14:07 ` Benjamin Thery
2008-09-30 14:42 ` Jarek Poplawski
2008-09-30 14:57 ` Jarek Poplawski
2008-09-30 15:18 ` Benjamin Thery
2008-10-01 9:59 ` David Miller
2008-10-01 10:10 ` Daniel Lezcano
2008-10-01 10:12 ` David Miller
2008-10-01 14:14 ` [PATCH] net: deadlock during net device unregistration - V2 Benjamin Thery
2008-10-01 19:48 ` Jarek Poplawski
2008-10-01 21:06 ` Daniel Lezcano
2008-10-01 21:52 ` Jarek Poplawski
2008-10-01 23:31 ` Jarek Poplawski
2008-10-02 15:23 ` Benjamin Thery
2008-10-02 18:38 ` Jarek Poplawski
2008-10-02 19:55 ` Benjamin Thery
2008-10-02 20:34 ` Jarek Poplawski
2008-10-04 7:42 ` Jarek Poplawski
2008-10-04 7:52 ` Jarek Poplawski
2008-10-03 0:41 ` [PATCH] net: deadlock during net device unregistration Eric W. Biederman
2008-10-05 4:26 ` Herbert Xu
2008-10-05 6:55 ` Jarek Poplawski
2008-10-05 6:56 ` Herbert Xu
2008-10-05 7:12 ` Jarek Poplawski
2008-10-05 7:28 ` Stephen Hemminger
2008-10-05 7:38 ` Herbert Xu
2008-10-05 7:39 ` Herbert Xu
2008-10-06 15:19 ` Benjamin Thery
2008-10-07 22:46 ` David Miller
2008-10-07 22:50 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).