[PATCH] net: deadlock during net device unregistration

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] net: deadlock during net device unregistration
       [not found] <20080929175412.866679567@theryb.frec.bull.fr>
@ 2008-09-29 17:54 ` Benjamin Thery
  2008-09-30  6:32   ` Jarek Poplawski
                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Benjamin Thery @ 2008-09-29 17:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Daniel Lezcano, Benjamin Thery

This patch proposes to replace the rtnl_unlock() call in 
linkwatch_event() by __rtnl_unlock(). The difference between the two
routines being that __rtnl_unlock() will not call netdev_run_todo()
after it unlocks rtnl_mutex.

This is to fix a "deadlock" we observed when unregistering a net device.

In some circumstances, linkwatch_event() blocks the whole "events" 
workqueue while blocking in rtnl_unlock().

Here is what happens:

1. Unregister a device, the following routines are called:

-> unregister_netdev
  -> rtnl_lock
  -> unregister_netdevice
  -> rtnl_unlock
    -> netdev_run_todo
      -> netdev_wait_allrefs

2. In netdev_wait_allrefs(), the device's refcount is greater than 0
  because there are still some routes to be garbage collected later.

3. Also, some link watch events are pending. netdev_wait_allrefs()
  will run the linkwatch event queue, calls linkwatch_run_queue().

Both the route garbage collector dst_gc_task() and the linkwatch task
linkwatch_event() are queued in the same generic workqueue: "events".

4. linkwatch_event() is enqueued earlier in the queue. It will grab
  rtnl_lock(), deliver the link watch events pending, and then call
  rtnl_unlock(). 
  rtnl_unlock() will then call netdev_run_todo() and block on
  mutex_lock(&net_todo_run_mutex).

  At this point, the workqueue "events" is _blocked_ until the
  netdev_wait_allrefs() call above returns when the device refcount 
  reaches 0.

  Problem: it will never happens if dst_gc_task() was enqueued behind
  linkwatch_event() in the "events" workqueue as the queue is now 
  blocked.

Thread foo                            Thread "events"
----------                            ---------------

free IPv6 routes                    |
-> schedule_delayed_work(dst_gc_task)
...                                 |
unregister_netdev                   |
 -> rtln_unlock                     | 
   -> netdev_run_todo               | ...
     -> grab net_todo_run_mutex     |
     -> netdev_wait_allrefs (loop)  | -> run__workqueue()
        ->  linkwatch_run_queue()   | ....
            enqueue linkwatch_event |
            in "events"             |
                                    | -> run_workqueue()
        "BLOCKED"                   |   -> linkwatch_event()
                                    |     -> rtnl_lock()
                                    |     -> __linkwatch_run_queue 
                                    |     ->rtnl_unlock()
                                    |       -> netdev_run_todo
                                    |          -> try to grab net_todo_run_mutex
                                               BLOCKED

Deadlock: 

# BLOCKED waiting for dst_gc_task()  # BLOCKED waiting for netdev_wait_allrefs
to decrement the device refcnt         to release net_todo_run_mutex

                  # dst_gst_task() waiting to be run by "events"

This deadlock can be observed sometimes when the loopback is 
unregistered when a network namespaces exits.

Our tests showed no regression so far with this modification done in 
linkwatch_event(), but we aren't sure we understood all the subtleties
in net device unregistration yet. :)

Benjamin

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
---
 net/core/link_watch.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: net-next-2.6/net/core/link_watch.c
===================================================================
--- net-next-2.6.orig/net/core/link_watch.c
+++ net-next-2.6/net/core/link_watch.c
@@ -207,7 +207,7 @@ static void linkwatch_event(struct work_
 {
 	rtnl_lock();
 	__linkwatch_run_queue(time_after(linkwatch_nextevent, jiffies));
-	rtnl_unlock();
+	__rtnl_unlock();
 }

-- 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery
@ 2008-09-30  6:32   ` Jarek Poplawski
  2008-09-30 11:52     ` Benjamin Thery
  2008-10-03  0:41   ` [PATCH] net: deadlock during net device unregistration Eric W. Biederman
  2008-10-05  4:26   ` Herbert Xu
  2 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-09-30  6:32 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano

On 29-09-2008 19:54, Benjamin Thery wrote:
> This patch proposes to replace the rtnl_unlock() call in 
> linkwatch_event() by __rtnl_unlock(). The difference between the two
> routines being that __rtnl_unlock() will not call netdev_run_todo()
> after it unlocks rtnl_mutex.
> 
> This is to fix a "deadlock" we observed when unregistering a net device.
> 
> In some circumstances, linkwatch_event() blocks the whole "events" 
> workqueue while blocking in rtnl_unlock().
> 
> Here is what happens:
> 
> 1. Unregister a device, the following routines are called:
> 
> -> unregister_netdev
>   -> rtnl_lock
>   -> unregister_netdevice
>   -> rtnl_unlock
>     -> netdev_run_todo
>       -> netdev_wait_allrefs
> 
> 2. In netdev_wait_allrefs(), the device's refcount is greater than 0
>   because there are still some routes to be garbage collected later.
> 
> 3. Also, some link watch events are pending. netdev_wait_allrefs()
>   will run the linkwatch event queue, calls linkwatch_run_queue().
> 
> 
> Both the route garbage collector dst_gc_task() and the linkwatch task
> linkwatch_event() are queued in the same generic workqueue: "events".
> 
> 
> 4. linkwatch_event() is enqueued earlier in the queue. It will grab
>   rtnl_lock(), deliver the link watch events pending, and then call
>   rtnl_unlock(). 
>   rtnl_unlock() will then call netdev_run_todo() and block on
>   mutex_lock(&net_todo_run_mutex).
> 
>   At this point, the workqueue "events" is _blocked_ until the
>   netdev_wait_allrefs() call above returns when the device refcount 
>   reaches 0.
> 
>   Problem: it will never happens if dst_gc_task() was enqueued behind
>   linkwatch_event() in the "events" workqueue as the queue is now 
>   blocked.
...

If it's really like this, I wonder if this can happen without linkwatch
too in a non-preemptive config? So maybe this should be fixed somewhere
else? According to a comment above netdev_wait_allrefs() it seems
references should be rather put down on an UNREGISTER event, so this
dst_gc_task() scheduling shouldn't bother us, I guess.

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30  6:32   ` Jarek Poplawski
@ 2008-09-30 11:52     ` Benjamin Thery
  2008-09-30 13:58       ` David Miller
  2008-09-30 14:42       ` Jarek Poplawski
  0 siblings, 2 replies; 33+ messages in thread
From: Benjamin Thery @ 2008-09-30 11:52 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David S. Miller, netdev, Daniel Lezcano

Jarek Poplawski wrote:
> On 29-09-2008 19:54, Benjamin Thery wrote:
>> This patch proposes to replace the rtnl_unlock() call in 
>> linkwatch_event() by __rtnl_unlock(). The difference between the two
>> routines being that __rtnl_unlock() will not call netdev_run_todo()
>> after it unlocks rtnl_mutex.
>>
>> This is to fix a "deadlock" we observed when unregistering a net device.
>>
>> In some circumstances, linkwatch_event() blocks the whole "events" 
>> workqueue while blocking in rtnl_unlock().
>>
>> Here is what happens:
>>
>> 1. Unregister a device, the following routines are called:
>>
>> -> unregister_netdev
>>   -> rtnl_lock
>>   -> unregister_netdevice
>>   -> rtnl_unlock
>>     -> netdev_run_todo
>>       -> netdev_wait_allrefs
>>
>> 2. In netdev_wait_allrefs(), the device's refcount is greater than 0
>>   because there are still some routes to be garbage collected later.
>>
>> 3. Also, some link watch events are pending. netdev_wait_allrefs()
>>   will run the linkwatch event queue, calls linkwatch_run_queue().
>>
>>
>> Both the route garbage collector dst_gc_task() and the linkwatch task
>> linkwatch_event() are queued in the same generic workqueue: "events".
>>
>>
>> 4. linkwatch_event() is enqueued earlier in the queue. It will grab
>>   rtnl_lock(), deliver the link watch events pending, and then call
>>   rtnl_unlock(). 
>>   rtnl_unlock() will then call netdev_run_todo() and block on
>>   mutex_lock(&net_todo_run_mutex).
>>
>>   At this point, the workqueue "events" is _blocked_ until the
>>   netdev_wait_allrefs() call above returns when the device refcount 
>>   reaches 0.
>>
>>   Problem: it will never happens if dst_gc_task() was enqueued behind
>>   linkwatch_event() in the "events" workqueue as the queue is now 
>>   blocked.
> ...
> 
> If it's really like this, I wonder if this can happen without linkwatch
> too in a non-preemptive config?

Um, not sure I fully understand what you mean... do you mean with
CONFIG_PREEMPT_NONE=y?

> So maybe this should be fixed somewhere
> else? According to a comment above netdev_wait_allrefs() it seems
> references should be rather put down on an UNREGISTER event, so this
> dst_gc_task() scheduling shouldn't bother us, I guess.

I saw this comment too. In our case, the UNREGISTER event is sent,
notifications are dispatched correctly, some routes are deleted
(dst_free()) but not destroyed (dst_destroy()) and the garbage collector
as to run to finish the work.

dst_entry's may hold a refcount on device until dst_destroy() is run on
them. Unfortunately dst_gc_task() won't have a chance to run
dst_destroy() on them later in this case because it is stuck in the
"events" workqueue behind linkwatch_event() who is blocking everyone 
else in the queue.

I'm still looking at why the first dst_free() on those particular routes 
doesn't call dst_destroy() immediately but defers it (another refcount
on the route itself).

BTW, in the past (last year) dst_gc_task() was run as a timer handler 
and this situation (deadlock with linkwatch_event()) couldn't occur.

Thanks,
Benjamin

> 
> Jarek P.
> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30 11:52     ` Benjamin Thery
@ 2008-09-30 13:58       ` David Miller
  2008-09-30 14:07         ` Benjamin Thery
  2008-09-30 14:42       ` Jarek Poplawski
  1 sibling, 1 reply; 33+ messages in thread
From: David Miller @ 2008-09-30 13:58 UTC (permalink / raw)
  To: benjamin.thery; +Cc: jarkao2, netdev, dlezcano, dada1

From: Benjamin Thery <benjamin.thery@bull.net>
Date: Tue, 30 Sep 2008 13:52:31 +0200

> BTW, in the past (last year) dst_gc_task() was run as a timer
> handler and this situation (deadlock with linkwatch_event())
> couldn't occur.

September 12th to be exact :-)  That's when Eric Dumazet's
change went in:

commit 86bba269d08f0c545ae76c90b56727f65d62d57f
Author: Eric Dumazet <dada1@cosmosbay.com>
Date:   Wed Sep 12 14:29:01 2007 +0200

    [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue


But this change went into 2.6.24, so I wonder why it's taken
so long to hit this.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30 13:58       ` David Miller
@ 2008-09-30 14:07         ` Benjamin Thery
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Thery @ 2008-09-30 14:07 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, netdev, dlezcano, dada1

David Miller wrote:
> From: Benjamin Thery <benjamin.thery@bull.net>
> Date: Tue, 30 Sep 2008 13:52:31 +0200
> 
>> BTW, in the past (last year) dst_gc_task() was run as a timer
>> handler and this situation (deadlock with linkwatch_event())
>> couldn't occur.
> 
> September 12th to be exact :-)  That's when Eric Dumazet's
> change went in:
> 
> commit 86bba269d08f0c545ae76c90b56727f65d62d57f
> Author: Eric Dumazet <dada1@cosmosbay.com>
> Date:   Wed Sep 12 14:29:01 2007 +0200
> 
>     [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue
> 
> 
> But this change went into 2.6.24, so I wonder why it's taken
> so long to hit this.

Maybe because we manage to reproduce it only with network namespaces so
far and there isn't much netns users at the moment (and netns is still
incomplete in mainline).

Benjamin

> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30 11:52     ` Benjamin Thery
  2008-09-30 13:58       ` David Miller
@ 2008-09-30 14:42       ` Jarek Poplawski
  2008-09-30 14:57         ` Jarek Poplawski
  1 sibling, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-09-30 14:42 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano

Benjamin Thery wrote, On 09/30/2008 01:52 PM:

> Jarek Poplawski wrote:
>> On 29-09-2008 19:54, Benjamin Thery wrote:
...
>>>   Problem: it will never happens if dst_gc_task() was enqueued behind
>>>   linkwatch_event() in the "events" workqueue as the queue is now 
>>>   blocked.
>> ...
>>
>> If it's really like this, I wonder if this can happen without linkwatch
>> too in a non-preemptive config?
> 
> Um, not sure I fully understand what you mean... do you mean with
> CONFIG_PREEMPT_NONE=y?

Yes, but after rethinking I see this is irrelevant. Anyway, my main
concern is that it seems similar dependency might happen between other
than linkwatch work functions or processes waiting for each other.

>> So maybe this should be fixed somewhere
>> else? According to a comment above netdev_wait_allrefs() it seems
>> references should be rather put down on an UNREGISTER event, so this
>> dst_gc_task() scheduling shouldn't bother us, I guess.
> 
> I saw this comment too. In our case, the UNREGISTER event is sent,
> notifications are dispatched correctly, some routes are deleted
> (dst_free()) but not destroyed (dst_destroy()) and the garbage collector
> as to run to finish the work.
> 
> dst_entry's may hold a refcount on device until dst_destroy() is run on
> them. Unfortunately dst_gc_task() won't have a chance to run
> dst_destroy() on them later in this case because it is stuck in the
> "events" workqueue behind linkwatch_event() who is blocking everyone 
> else in the queue.
> 
> I'm still looking at why the first dst_free() on those particular routes 
> doesn't call dst_destroy() immediately but defers it (another refcount
> on the route itself).

Yes, finding/fixing this, if possible, in this place looks like the
most consistent with the way netdev_wait_allrefs() is handling this.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30 14:42       ` Jarek Poplawski
@ 2008-09-30 14:57         ` Jarek Poplawski
  2008-09-30 15:18           ` Benjamin Thery
  0 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-09-30 14:57 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano

On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote:
> Benjamin Thery wrote, On 09/30/2008 01:52 PM:
...
> > I'm still looking at why the first dst_free() on those particular routes 
> > doesn't call dst_destroy() immediately but defers it (another refcount
> > on the route itself).
> 
> Yes, finding/fixing this, if possible, in this place looks like the
> most consistent with the way netdev_wait_allrefs() is handling this.

Actually, I wonder, why we can't simply run this dst_gc_task() from
dst_dev_event() (after cancelling the work) when needed.

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30 14:57         ` Jarek Poplawski
@ 2008-09-30 15:18           ` Benjamin Thery
  2008-10-01  9:59             ` David Miller
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Thery @ 2008-09-30 15:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David S. Miller, netdev, Daniel Lezcano

Jarek Poplawski wrote:
> On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote:
>> Benjamin Thery wrote, On 09/30/2008 01:52 PM:
> ...
>>> I'm still looking at why the first dst_free() on those particular routes 
>>> doesn't call dst_destroy() immediately but defers it (another refcount
>>> on the route itself).
>> Yes, finding/fixing this, if possible, in this place looks like the
>> most consistent with the way netdev_wait_allrefs() is handling this.
> 
> Actually, I wonder, why we can't simply run this dst_gc_task() from
> dst_dev_event() (after cancelling the work) when needed.
> 

Um... I haven't thought about this. I'll have a look to see if it can
solve our issue.


Benjamin

> Jarek P.
> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-30 15:18           ` Benjamin Thery
@ 2008-10-01  9:59             ` David Miller
  2008-10-01 10:10               ` Daniel Lezcano
  0 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2008-10-01  9:59 UTC (permalink / raw)
  To: benjamin.thery; +Cc: jarkao2, netdev, dlezcano

From: Benjamin Thery <benjamin.thery@bull.net>
Date: Tue, 30 Sep 2008 17:18:25 +0200

> Jarek Poplawski wrote:
> > On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote:
> >> Benjamin Thery wrote, On 09/30/2008 01:52 PM:
> > ...
> >>> I'm still looking at why the first dst_free() on those particular routes doesn't call dst_destroy() immediately but defers it (another refcount
> >>> on the route itself).
> >> Yes, finding/fixing this, if possible, in this place looks like the
> >> most consistent with the way netdev_wait_allrefs() is handling this.
> > Actually, I wonder, why we can't simply run this dst_gc_task() from
> > dst_dev_event() (after cancelling the work) when needed.
> > 
> 
> Um... I haven't thought about this. I'll have a look to see if it can
> solve our issue.

Let me know what happens, I'd like to apply some fix soon.
So just report the patch implementing the final approach you
feel the most comfortable with.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-01  9:59             ` David Miller
@ 2008-10-01 10:10               ` Daniel Lezcano
  2008-10-01 10:12                 ` David Miller
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2008-10-01 10:10 UTC (permalink / raw)
  To: David Miller; +Cc: benjamin.thery, jarkao2, netdev

David Miller wrote:
> From: Benjamin Thery <benjamin.thery@bull.net>
> Date: Tue, 30 Sep 2008 17:18:25 +0200
> 
>> Jarek Poplawski wrote:
>>> On Tue, Sep 30, 2008 at 04:42:04PM +0200, Jarek Poplawski wrote:
>>>> Benjamin Thery wrote, On 09/30/2008 01:52 PM:
>>> ...
>>>>> I'm still looking at why the first dst_free() on those particular routes doesn't call dst_destroy() immediately but defers it (another refcount
>>>>> on the route itself).
>>>> Yes, finding/fixing this, if possible, in this place looks like the
>>>> most consistent with the way netdev_wait_allrefs() is handling this.
>>> Actually, I wonder, why we can't simply run this dst_gc_task() from
>>> dst_dev_event() (after cancelling the work) when needed.
>>>
>> Um... I haven't thought about this. I'll have a look to see if it can
>> solve our issue.
> 
> Let me know what happens, I'd like to apply some fix soon.
> So just report the patch implementing the final approach you
> feel the most comfortable with.

We did the modification suggested by Jarek and that fix the problem :)
We are playing a bit with this patch to check if we didn't missed 
something. We will certainly send it in the next hours.

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-01 10:10               ` Daniel Lezcano
@ 2008-10-01 10:12                 ` David Miller
  2008-10-01 14:14                   ` [PATCH] net: deadlock during net device unregistration - V2 Benjamin Thery
  0 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2008-10-01 10:12 UTC (permalink / raw)
  To: dlezcano; +Cc: benjamin.thery, jarkao2, netdev

From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Wed, 01 Oct 2008 12:10:10 +0200

> We are playing a bit with this patch to check if we didn't missed
> something. We will certainly send it in the next hours.

Nice.

It is quite efficient for us to work together live like this when we
are in the same time zone, and even the same country :-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] net: deadlock during net device unregistration - V2
  2008-10-01 10:12                 ` David Miller
@ 2008-10-01 14:14                   ` Benjamin Thery
  2008-10-01 19:48                     ` Jarek Poplawski
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Thery @ 2008-10-01 14:14 UTC (permalink / raw)
  To: David S. Miller, Jarek Poplawski; +Cc: netdev, Daniel Lezcano, Benjamin Thery

This is the second version of a patch aimed at fixing a deadlock that
can occur when unregistering net devices. 

This new version of the patch ensures the garbage collector 
dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
it in the netdevice notifier dst_dev_event() when receiving a 
NETDEV_UNREGISTER event.

Thanks to Jarek Poplawski for proposing this fix.

(The previous version proposed to replace in linkwatch_event()
the call to rntl_unlock() by __rtnl_lock())

Short description of the deadlock
- - - - - - - - - - - - - - - - -

When reaching netdev_wait_allrefs() some routes may still hold the 
device refcount. dst_gc_task() should 'garbage collect' them soon 
(and decrease the refcount). dst_gc_task() is scheduled with a delay in
the "events" workqueue.
But, as some linkwatch events are also pending, netdev_wait_allrefs()
schedules (without delay) a linkwatch_event() in the same workqueue.
linkwatch_event runs before dst_gc_task() and blocks in  netdev_run_todo
(called by rtnl_unlock()), blocking the whole "events" workqueue.

netdev_wait_allrefs() waiting for           dt_gc_task() waiting
dst_gc_task() to decrease the device -----> linkwatch_event() to return
refcount. It holds the mutex                and unblock the "events" 
netdev_run_todo_mutex                       workqueue
       ^                                        |
       |                                        |
       |                                        |
       |                                        |
       |        linkwatch_event() waiting for <-+
       +------- net_dev_run_todo (who called 
                netdev_wait_allrefs()) to release 
                netdev_run_todo_mutex

Long description:
- - - - - - - - -

In some circumstances, linkwatch_event() blocks the whole "events" 
workqueue while blocking in rtnl_unlock(), and prevent dst_gc_task() 
(enqueued in the same workqueue) to be run:

1. Unregister a device, the following routines are called:

-> unregister_netdev
  -> rtnl_lock
  -> unregister_netdevice
  -> rtnl_unlock
    -> netdev_run_todo
      -> netdev_wait_allrefs

2. In netdev_wait_allrefs(), the device's refcount is greater than 0
  because there are still some routes to be garbage collected later.

3. Also, some link watch events are pending. netdev_wait_allrefs()
  will run the linkwatch event queue, calls linkwatch_run_queue().

Both the route garbage collector dst_gc_task() and the linkwatch task
linkwatch_event() are queued in the same generic workqueue: "events".

4. linkwatch_event() is enqueued earlier in the queue. It will grab
  rtnl_lock(), deliver the link watch events pending, and then call
  rtnl_unlock(). 
  rtnl_unlock() will then call netdev_run_todo() and block on
  mutex_lock(&net_todo_run_mutex).

  At this point, the workqueue "events" is _blocked_ until the
  netdev_wait_allrefs() call above returns when the device refcount 
  reaches 0.

  Problem: it will never happens if dst_gc_task() was enqueued behind
  linkwatch_event() in the "events" workqueue as the queue is now 
  blocked.

Thread foo                            Thread "events"
----------                            ---------------

free IPv6 routes                    |
-> schedule_delayed_work(dst_gc_task)
...                                 |
unregister_netdev                   |
 -> rtln_unlock                     | 
   -> netdev_run_todo               | ...
     -> grab net_todo_run_mutex     |
     -> netdev_wait_allrefs (loop)  | -> run__workqueue()
        ->  linkwatch_run_queue()   | ....
            enqueue linkwatch_event |
            in "events"             |
                                    | -> run_workqueue()
        "BLOCKED"                   |   -> linkwatch_event()
                                    |     -> rtnl_lock()
                                    |     -> __linkwatch_run_queue 
                                    |     ->rtnl_unlock()
                                    |       -> netdev_run_todo
                                    |          -> try to grab net_todo_run_mutex
                                               BLOCKED

Deadlock: 

# BLOCKED waiting for dst_gc_task()  # BLOCKED waiting for netdev_wait_allrefs
to decrement the device refcnt         to release net_todo_run_mutex

                  # dst_gst_task() waiting to be run by "events"

This deadlock can be observed sometimes when the loopback is 
unregistered when a network namespaces exits.

Benjamin

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/core/dst.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: net-next-2.6/net/core/dst.c
===================================================================
--- net-next-2.6.orig/net/core/dst.c
+++ net-next-2.6/net/core/dst.c
@@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
 			dst_ifdown(dst, dev, event != NETDEV_DOWN);
 		}
 		mutex_unlock(&dst_gc_mutex);
+
+		if (event == NETDEV_UNREGISTER &&
+		    cancel_delayed_work(&dst_gc_work))
+			dst_gc_task(&dst_gc_work.work);
 		break;
 	}
 	return NOTIFY_DONE;

-- 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-01 14:14                   ` [PATCH] net: deadlock during net device unregistration - V2 Benjamin Thery
@ 2008-10-01 19:48                     ` Jarek Poplawski
  2008-10-01 21:06                       ` Daniel Lezcano
  0 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-01 19:48 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano

On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
> This is the second version of a patch aimed at fixing a deadlock that
> can occur when unregistering net devices. 
> 
> This new version of the patch ensures the garbage collector 
> dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
> it in the netdevice notifier dst_dev_event() when receiving a 
> NETDEV_UNREGISTER event.
> 
> Thanks to Jarek Poplawski for proposing this fix.

Not at all! Actually, I'm sorry for this mess... (Read below.)

> (The previous version proposed to replace in linkwatch_event()
> the call to rntl_unlock() by __rtnl_lock())
...
> --- net-next-2.6.orig/net/core/dst.c
> +++ net-next-2.6/net/core/dst.c
> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>  		}
>  		mutex_unlock(&dst_gc_mutex);
> +
> +		if (event == NETDEV_UNREGISTER &&
> +		    cancel_delayed_work(&dst_gc_work))
> +			dst_gc_task(&dst_gc_work.work);

Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
kill this while on timer, but not when queued and maybe blocked already.
Probably cancel_delayed_work_sync() should be considered instead, but
since this needs more checking, and David is waiting for this, I think
it's safer to use this previous (__rtnl_unlock) patch for now
(especially for -stable).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-01 19:48                     ` Jarek Poplawski
@ 2008-10-01 21:06                       ` Daniel Lezcano
  2008-10-01 21:52                         ` Jarek Poplawski
  2008-10-01 23:31                         ` Jarek Poplawski
  0 siblings, 2 replies; 33+ messages in thread
From: Daniel Lezcano @ 2008-10-01 21:06 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Benjamin Thery, David S. Miller, netdev

Jarek Poplawski wrote:
> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
>> This is the second version of a patch aimed at fixing a deadlock that
>> can occur when unregistering net devices. 
>>
>> This new version of the patch ensures the garbage collector 
>> dst_gc_task() is run when waiting in netdev_wait_allrefs(), by calling
>> it in the netdevice notifier dst_dev_event() when receiving a 
>> NETDEV_UNREGISTER event.
>>
>> Thanks to Jarek Poplawski for proposing this fix.
> 
> Not at all! Actually, I'm sorry for this mess... (Read below.)
> 
>> (The previous version proposed to replace in linkwatch_event()
>> the call to rntl_unlock() by __rtnl_lock())
> ...
>> --- net-next-2.6.orig/net/core/dst.c
>> +++ net-next-2.6/net/core/dst.c
>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>  		}
>>  		mutex_unlock(&dst_gc_mutex);
>> +
>> +		if (event == NETDEV_UNREGISTER &&
>> +		    cancel_delayed_work(&dst_gc_work))
>> +			dst_gc_task(&dst_gc_work.work);
> 
> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
> kill this while on timer, but not when queued and maybe blocked already.

Perhaps, I am misunderstanding but, dst_gc_work is always called with 
schedule_delayed_work. If the task is not running, we can cancel it and 
if the task is running, no need to trigger the gc, no ?

> Probably cancel_delayed_work_sync() should be considered instead, but
> since this needs more checking, and David is waiting for this, I think
> it's safer to use this previous (__rtnl_unlock) patch for now
> (especially for -stable).




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-01 21:06                       ` Daniel Lezcano
@ 2008-10-01 21:52                         ` Jarek Poplawski
  2008-10-01 23:31                         ` Jarek Poplawski
  1 sibling, 0 replies; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-01 21:52 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Benjamin Thery, David S. Miller, netdev

On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
> Jarek Poplawski wrote:
>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
...
>>> --- net-next-2.6.orig/net/core/dst.c
>>> +++ net-next-2.6/net/core/dst.c
>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>  		}
>>>  		mutex_unlock(&dst_gc_mutex);
>>> +
>>> +		if (event == NETDEV_UNREGISTER &&
>>> +		    cancel_delayed_work(&dst_gc_work))
>>> +			dst_gc_task(&dst_gc_work.work);
>>
>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>> kill this while on timer, but not when queued and maybe blocked already.
>
> Perhaps, I am misunderstanding but, dst_gc_work is always called with  
> schedule_delayed_work. If the task is not running, we can cancel it and  
> if the task is running, no need to trigger the gc, no ?

Right, but cancel_delayed_work() can't cancel it reliably even if the
task is not running (but e.g. queued already just after us - linkwatch).
Of course, this should be OK most of the time, especially with such long
schedule delays, but, on the other hand, not as sure as the first patch
with __rtnl_unlock. So doing this right could take more time.

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-01 21:06                       ` Daniel Lezcano
  2008-10-01 21:52                         ` Jarek Poplawski
@ 2008-10-01 23:31                         ` Jarek Poplawski
  2008-10-02 15:23                           ` Benjamin Thery
  1 sibling, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-01 23:31 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Benjamin Thery, David S. Miller, netdev

On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
> Jarek Poplawski wrote:
>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
...
>>> --- net-next-2.6.orig/net/core/dst.c
>>> +++ net-next-2.6/net/core/dst.c
>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>  		}
>>>  		mutex_unlock(&dst_gc_mutex);
>>> +
>>> +		if (event == NETDEV_UNREGISTER &&
>>> +		    cancel_delayed_work(&dst_gc_work))
>>> +			dst_gc_task(&dst_gc_work.work);
>>
>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>> kill this while on timer, but not when queued and maybe blocked already.

Hmm#2... Then maybe something like this?:

		if (event == NETDEV_UNREGISTER &&
	    	    (cancel_delayed_work(&dst_gc_work) ||
		     delayed_work_pending(&dst_gc_work)))
			dst_gc_task(&dst_gc_work.work);


Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-01 23:31                         ` Jarek Poplawski
@ 2008-10-02 15:23                           ` Benjamin Thery
  2008-10-02 18:38                             ` Jarek Poplawski
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Thery @ 2008-10-02 15:23 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Daniel Lezcano, David S. Miller, netdev

Jarek Poplawski wrote:
> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
>> Jarek Poplawski wrote:
>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
> ...
>>>> --- net-next-2.6.orig/net/core/dst.c
>>>> +++ net-next-2.6/net/core/dst.c
>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>>  		}
>>>>  		mutex_unlock(&dst_gc_mutex);
>>>> +
>>>> +		if (event == NETDEV_UNREGISTER &&
>>>> +		    cancel_delayed_work(&dst_gc_work))
>>>> +			dst_gc_task(&dst_gc_work.work);
>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>>> kill this while on timer, but not when queued and maybe blocked already.

You're right.

> 
> Hmm#2... Then maybe something like this?:
> 
> 		if (event == NETDEV_UNREGISTER &&
> 	    	    (cancel_delayed_work(&dst_gc_work) ||
> 		     delayed_work_pending(&dst_gc_work)))
> 			dst_gc_task(&dst_gc_work.work);

Hmmm... I'm not sure I understand what this change do?

OK, I see this ensure we will run dst_gc_task() even if
cancel_delayed_work() failed and the work is still pending (ie. the 
timer has expired and dst_gc_work is already in the queue).

But what if the work was not pending at all at beginning?
We still need to run dst_gc_task().

Is something like this better?
(code expanded to be more readable)

	if (event == NETDEV_UNREGISTER) {
	    if (!delayed_work_pending(&dst_gc_work))
	        /* work is not scheduled (no timer, not in queue) */
		dst_gc_task(&dst_gc_work.work);
	    else if (cancel_delayed_work(&dst_gc_work) ||
  	             delayed_work_pending(&dst_gc_work)))
                 /* work was scheduled (and may be blocked) */
  	        dst_gc_task(&dst_gc_work.work);
	    else
                 /* dst_gc_task() is running, do nothing */
	}

Benjamin



> 
> 
> Jarek P.
> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-02 15:23                           ` Benjamin Thery
@ 2008-10-02 18:38                             ` Jarek Poplawski
  2008-10-02 19:55                               ` Benjamin Thery 
  0 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-02 18:38 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev

On Thu, Oct 02, 2008 at 05:23:30PM +0200, Benjamin Thery wrote:
> Jarek Poplawski wrote:
>> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
>>> Jarek Poplawski wrote:
>>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
>> ...
>>>>> --- net-next-2.6.orig/net/core/dst.c
>>>>> +++ net-next-2.6/net/core/dst.c
>>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>>>  		}
>>>>>  		mutex_unlock(&dst_gc_mutex);
>>>>> +
>>>>> +		if (event == NETDEV_UNREGISTER &&
>>>>> +		    cancel_delayed_work(&dst_gc_work))
>>>>> +			dst_gc_task(&dst_gc_work.work);
>>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>>>> kill this while on timer, but not when queued and maybe blocked already.
>
> You're right.
>
>>
>> Hmm#2... Then maybe something like this?:
>>
>> 		if (event == NETDEV_UNREGISTER &&
>> 	    	    (cancel_delayed_work(&dst_gc_work) ||
>> 		     delayed_work_pending(&dst_gc_work)))
>> 			dst_gc_task(&dst_gc_work.work);
>
> Hmmm... I'm not sure I understand what this change do?
>
> OK, I see this ensure we will run dst_gc_task() even if
> cancel_delayed_work() failed and the work is still pending (ie. the  
> timer has expired and dst_gc_work is already in the queue).

I think, this covers exactly the case of blocking you described, plus
more... (the work is queued but not blocked). 

>
> But what if the work was not pending at all at beginning?
> We still need to run dst_gc_task().

Maybe I miss something, but if this is really needed here then it
looks like we are fixing more than "the blocking" bug BTW.

>
> Is something like this better?
> (code expanded to be more readable)
>
> 	if (event == NETDEV_UNREGISTER) {
> 	    if (!delayed_work_pending(&dst_gc_work))
> 	        /* work is not scheduled (no timer, not in queue) */

		/* may be running too */

> 		dst_gc_task(&dst_gc_work.work);
> 	    else if (cancel_delayed_work(&dst_gc_work) ||
>  	             delayed_work_pending(&dst_gc_work)))
>                 /* work was scheduled (and may be blocked) */

		/* actually could be both running and pending here:
		 * if it's after rearming
		 */

>  	        dst_gc_task(&dst_gc_work.work);
> 	    else
>                 /* dst_gc_task() is running, do nothing */

So again !delayed_work_pending() - there could be the change of state
while checking - but then looks a bit inconsistent. I think this should
be OK too.

As a matter of fact I've thought about something even simpler, which
probably should help for all above concerns too:

	if (event == NETDEV_UNREGISTER)
  	        dst_gc_task(&dst_gc_work.work);

dst_gc_task() locking allows for this, and running this two times in
a row could be even faster than trying to cancel the unnecessary run.

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-02 18:38                             ` Jarek Poplawski
@ 2008-10-02 19:55                               ` Benjamin Thery 
  2008-10-02 20:34                                 ` Jarek Poplawski
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Thery  @ 2008-10-02 19:55 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Daniel Lezcano, David S. Miller, netdev

Quoting Jarek Poplawski <jarkao2@gmail.com>:

> On Thu, Oct 02, 2008 at 05:23:30PM +0200, Benjamin Thery wrote:
>> Jarek Poplawski wrote:
>>> On Wed, Oct 01, 2008 at 11:06:22PM +0200, Daniel Lezcano wrote:
>>>> Jarek Poplawski wrote:
>>>>> On Wed, Oct 01, 2008 at 04:14:35PM +0200, Benjamin Thery wrote:
>>> ...
>>>>>> --- net-next-2.6.orig/net/core/dst.c
>>>>>> +++ net-next-2.6/net/core/dst.c
>>>>>> @@ -328,6 +328,10 @@ static int dst_dev_event(struct notifier
>>>>>>  			dst_ifdown(dst, dev, event != NETDEV_DOWN);
>>>>>>  		}
>>>>>>  		mutex_unlock(&dst_gc_mutex);
>>>>>> +
>>>>>> +		if (event == NETDEV_UNREGISTER &&
>>>>>> +		    cancel_delayed_work(&dst_gc_work))
>>>>>> +			dst_gc_task(&dst_gc_work.work);
>>>>> Hmm... It seems this shouldn't work yet: cancel_delayed_work() can only
>>>>> kill this while on timer, but not when queued and maybe blocked already.
>>
>> You're right.
>>
>>>
>>> Hmm#2... Then maybe something like this?:
>>>
>>> 		if (event == NETDEV_UNREGISTER &&
>>> 	    	    (cancel_delayed_work(&dst_gc_work) ||
>>> 		     delayed_work_pending(&dst_gc_work)))
>>> 			dst_gc_task(&dst_gc_work.work);
>>
>> Hmmm... I'm not sure I understand what this change do?
>>
>> OK, I see this ensure we will run dst_gc_task() even if
>> cancel_delayed_work() failed and the work is still pending (ie. the
>> timer has expired and dst_gc_work is already in the queue).
>
> I think, this covers exactly the case of blocking you described, plus
> more... (the work is queued but not blocked).
>
>>
>> But what if the work was not pending at all at beginning?
>> We still need to run dst_gc_task().
>
> Maybe I miss something, but if this is really needed here then it
> looks like we are fixing more than "the blocking" bug BTW.
>
>>
>> Is something like this better?
>> (code expanded to be more readable)
>>
>> 	if (event == NETDEV_UNREGISTER) {
>> 	    if (!delayed_work_pending(&dst_gc_work))
>> 	        /* work is not scheduled (no timer, not in queue) */
>
> 		/* may be running too */
>
>> 		dst_gc_task(&dst_gc_work.work);
>> 	    else if (cancel_delayed_work(&dst_gc_work) ||
>>  	             delayed_work_pending(&dst_gc_work)))
>>                 /* work was scheduled (and may be blocked) */
>
> 		/* actually could be both running and pending here:
> 		 * if it's after rearming
> 		 */
>
>>  	        dst_gc_task(&dst_gc_work.work);
>> 	    else
>>                 /* dst_gc_task() is running, do nothing */
>
> So again !delayed_work_pending() - there could be the change of state
> while checking - but then looks a bit inconsistent. I think this should
> be OK too.
>
> As a matter of fact I've thought about something even simpler, which
> probably should help for all above concerns too:
>
> 	if (event == NETDEV_UNREGISTER)
>   	        dst_gc_task(&dst_gc_work.work);
>
> dst_gc_task() locking allows for this, and running this two times in
> a row could be even faster than trying to cancel the unnecessary run.

I've thought a bit more about my last proposal and come to the same
conclusion as you, hmm, almost. I thought we could call
cancel_delayed_work() unconditionally and then dst_gc_task().

  	if (event == NETDEV_UNREGISTER) {
                 cancel_delayed_work(&dst_gc_work);
    	        dst_gc_task(&dst_gc_work.work);
         }

But, you're right, calling only dst_gc_task() seems fine to me.

I'll run some tests tomorrow and send a new patch.

Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
is better/more appropriate than the first one (replacing rtnl_unlock()
by the non-blocking __rtnl_unlock() in linkwatch_event())?

Thanks,
Benjamin

>
> Jarek P.
>
>



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-02 19:55                               ` Benjamin Thery 
@ 2008-10-02 20:34                                 ` Jarek Poplawski
  2008-10-04  7:42                                   ` Jarek Poplawski
  0 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-02 20:34 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev

On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery  wrote:
> Quoting Jarek Poplawski <jarkao2@gmail.com>:
...
>> As a matter of fact I've thought about something even simpler, which
>> probably should help for all above concerns too:
>>
>> 	if (event == NETDEV_UNREGISTER)
>>   	        dst_gc_task(&dst_gc_work.work);
>>
>> dst_gc_task() locking allows for this, and running this two times in
>> a row could be even faster than trying to cancel the unnecessary run.
>
> I've thought a bit more about my last proposal and come to the same
> conclusion as you, hmm, almost. I thought we could call
> cancel_delayed_work() unconditionally and then dst_gc_task().
>
>  	if (event == NETDEV_UNREGISTER) {
>                 cancel_delayed_work(&dst_gc_work);
>    	        dst_gc_task(&dst_gc_work.work);
>         }
>
> But, you're right, calling only dst_gc_task() seems fine to me.

I'm OK with any of these versions.

>
> I'll run some tests tomorrow and send a new patch.
>
> Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
> is better/more appropriate than the first one (replacing rtnl_unlock()
> by the non-blocking __rtnl_unlock() in linkwatch_event())?

I agree it's more appropriate, because:
a) the job is done according to the rules (in the comment) during
   the notification and not in some future,
b) it removes some hidden dependencies between processes/works, which,
   even if they currently don't exist except this one in the linkwatch,
   could be extremly hard to diagnose, if added accidentally in the
   future.

On the other hand, until we are not sure of the reasons, why something
like this (the full destroying) was avoided in the past, and until it's
heavily tested (with lockdep), your first patch looks to me much safer
if applying to any -stable is considered.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery
  2008-09-30  6:32   ` Jarek Poplawski
@ 2008-10-03  0:41   ` Eric W. Biederman
  2008-10-05  4:26   ` Herbert Xu
  2 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2008-10-03  0:41 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: David S. Miller, netdev, Daniel Lezcano

Benjamin Thery <benjamin.thery@bull.net> writes:

> This patch proposes to replace the rtnl_unlock() call in 
> linkwatch_event() by __rtnl_unlock(). The difference between the two
> routines being that __rtnl_unlock() will not call netdev_run_todo()
> after it unlocks rtnl_mutex.
>
> This is to fix a "deadlock" we observed when unregistering a net device.

Thanks for tracking this I just started seeing this a few days ago, and
before I reach the point of tracking it down you are in the process
of fixing it ;)

Eric

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-02 20:34                                 ` Jarek Poplawski
@ 2008-10-04  7:42                                   ` Jarek Poplawski
  2008-10-04  7:52                                     ` Jarek Poplawski
  0 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-04  7:42 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev

On Thu, Oct 02, 2008 at 10:34:35PM +0200, Jarek Poplawski wrote:
> On Thu, Oct 02, 2008 at 09:55:02PM +0200, Benjamin Thery  wrote:
> > Quoting Jarek Poplawski <jarkao2@gmail.com>:
> ...
> >> As a matter of fact I've thought about something even simpler, which
> >> probably should help for all above concerns too:
> >>
> >> 	if (event == NETDEV_UNREGISTER)
> >>   	        dst_gc_task(&dst_gc_work.work);
> >>
> >> dst_gc_task() locking allows for this, and running this two times in
> >> a row could be even faster than trying to cancel the unnecessary run.
> >
> > I've thought a bit more about my last proposal and come to the same
> > conclusion as you, hmm, almost. I thought we could call
> > cancel_delayed_work() unconditionally and then dst_gc_task().
> >
> >  	if (event == NETDEV_UNREGISTER) {
> >                 cancel_delayed_work(&dst_gc_work);
> >    	        dst_gc_task(&dst_gc_work.work);
> >         }
> >
> > But, you're right, calling only dst_gc_task() seems fine to me.
> 
> I'm OK with any of these versions.
> 
> >
> > I'll run some tests tomorrow and send a new patch.
> >
> > Do we agree that this fix (calling dst_gc_task() in dst_dev_event())
> > is better/more appropriate than the first one (replacing rtnl_unlock()
> > by the non-blocking __rtnl_unlock() in linkwatch_event())?

Hmm... You can kill me, but after more looking at this I've changed my
mind.

> I agree it's more appropriate, because:
> a) the job is done according to the rules (in the comment) during
>    the notification and not in some future,

But, since the rules are wrong or protocols buggy, the solution isn't
effective: there should be no deadlock after this, but it's also hard
to predict how many dst_gc_task() runs are needed to free all refs.
During this all other works would be unnecessarily blocked by the
linkwatch, what looks a bit too hoggish.

My current idea is we should move linkwatch or dst_gc_task() (or both)
to it's own workqueue.

> b) it removes some hidden dependencies between processes/works, which,
>    even if they currently don't exist except this one in the linkwatch,
>    could be extremly hard to diagnose, if added accidentally in the
>    future.
> 
> On the other hand, until we are not sure of the reasons, why something

BTW, sorry, of course:
"On the other hand, until we are sure of the reasons, why something"

> like this (the full destroying) was avoided in the past, and until it's
> heavily tested (with lockdep), your first patch looks to me much safer
> if applying to any -stable is considered.

And this first patch looks better and better each time...

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration - V2
  2008-10-04  7:42                                   ` Jarek Poplawski
@ 2008-10-04  7:52                                     ` Jarek Poplawski
  0 siblings, 0 replies; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-04  7:52 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: Daniel Lezcano, David S. Miller, netdev

On Sat, Oct 04, 2008 at 09:42:48AM +0200, Jarek Poplawski wrote:
...
> During this all other works would be unnecessarily blocked by the
> linkwatch, what looks a bit too hoggish.

Or even more: can't some of these blockeds task hold our refs too?

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery
  2008-09-30  6:32   ` Jarek Poplawski
  2008-10-03  0:41   ` [PATCH] net: deadlock during net device unregistration Eric W. Biederman
@ 2008-10-05  4:26   ` Herbert Xu
  2008-10-05  6:55     ` Jarek Poplawski
                       ` (2 more replies)
  2 siblings, 3 replies; 33+ messages in thread
From: Herbert Xu @ 2008-10-05  4:26 UTC (permalink / raw)
  To: Benjamin Thery; +Cc: davem, netdev, dlezcano, benjamin.thery

Benjamin Thery <benjamin.thery@bull.net> wrote:
>
> 1. Unregister a device, the following routines are called:
> 
> -> unregister_netdev
>  -> rtnl_lock
>  -> unregister_netdevice
>  -> rtnl_unlock
>    -> netdev_run_todo
>      -> netdev_wait_allrefs

OK, this explains lots of dead-locks that people have been seeing.

But I think we can go a step further:

net: Fix netdev_run_todo dead-lock

Benjamin Thery tracked down a bug that explains many instances
of the error

unregister_netdevice: waiting for %s to become free. Usage count = %d

It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.

The problem is really quite silly.  We were trying to create
parallelism where none was required.  As netdev_run_todo always
follows a RTNL section, and that todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.

There is no need for a second mutex or spinlock.

This is exactly what the following patch does.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/core/dev.c b/net/core/dev.c
index e8eb2b4..021f531 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net)
 }
 
 /* Delayed registration/unregisteration */
-static DEFINE_SPINLOCK(net_todo_list_lock);
 static LIST_HEAD(net_todo_list);
 
 static void net_set_todo(struct net_device *dev)
 {
-	spin_lock(&net_todo_list_lock);
 	list_add_tail(&dev->todo_list, &net_todo_list);
-	spin_unlock(&net_todo_list_lock);
 }
 
 static void rollback_registered(struct net_device *dev)
@@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev)
  *	free_netdev(y1);
  *	free_netdev(y2);
  *
- * We are invoked by rtnl_unlock() after it drops the semaphore.
+ * We are invoked by rtnl_unlock().
  * This allows us to deal with problems:
  * 1) We can delete sysfs objects which invoke hotplug
  *    without deadlocking with linkwatch via keventd.
  * 2) Since we run with the RTNL semaphore not held, we can sleep
  *    safely in order to wait for the netdev refcnt to drop to zero.
+ *
+ * We must not return until all unregister events added during
+ * the interval the lock was held have been completed.
  */
-static DEFINE_MUTEX(net_todo_run_mutex);
 void netdev_run_todo(void)
 {
 	struct list_head list;
 
-	/* Need to guard against multiple cpu's getting out of order. */
-	mutex_lock(&net_todo_run_mutex);
-
-	/* Not safe to do outside the semaphore.  We must not return
-	 * until all unregister events invoked by the local processor
-	 * have been completed (either by this todo run, or one on
-	 * another cpu).
-	 */
-	if (list_empty(&net_todo_list))
-		goto out;
-
 	/* Snapshot list, allow later requests */
-	spin_lock(&net_todo_list_lock);
 	list_replace_init(&net_todo_list, &list);
-	spin_unlock(&net_todo_list_lock);
+
+	__rtnl_unlock();
 
 	while (!list_empty(&list)) {
 		struct net_device *dev
@@ -4200,9 +4188,6 @@ void netdev_run_todo(void)
 		/* Free network device */
 		kobject_put(&dev->dev.kobj);
 	}
-
-out:
-	mutex_unlock(&net_todo_run_mutex);
 }
 
 static struct net_device_stats *internal_stats(struct net_device *dev)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 71edb8b..d6381c2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -73,7 +73,7 @@ void __rtnl_unlock(void)
 
 void rtnl_unlock(void)
 {
-	mutex_unlock(&rtnl_mutex);
+	/* This fellow will unlock it for us. */
 	netdev_run_todo();
 }
 
Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  4:26   ` Herbert Xu
@ 2008-10-05  6:55     ` Jarek Poplawski
  2008-10-05  6:56       ` Herbert Xu
  2008-10-06 15:19     ` Benjamin Thery
  2008-10-07 22:50     ` David Miller
  2 siblings, 1 reply; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-05  6:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Benjamin Thery, davem, netdev, dlezcano

Herbert Xu wrote, On 10/05/2008 06:26 AM:
...
> The problem is really quite silly.  We were trying to create
> parallelism where none was required.  As netdev_run_todo always
> follows a RTNL section, and that todo tasks can only be added
> with the RTNL held, by definition you should only need to wait
> for the very ones that you've added and be done with it.
> 
> There is no need for a second mutex or spinlock.

Is it for sure? (Read below.)

> 
> This is exactly what the following patch does.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e8eb2b4..021f531 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net)
>  }
>  
>  /* Delayed registration/unregisteration */
> -static DEFINE_SPINLOCK(net_todo_list_lock);
>  static LIST_HEAD(net_todo_list);
>  
>  static void net_set_todo(struct net_device *dev)
>  {
> -	spin_lock(&net_todo_list_lock);
>  	list_add_tail(&dev->todo_list, &net_todo_list);
> -	spin_unlock(&net_todo_list_lock);
>  }
>  
>  static void rollback_registered(struct net_device *dev)
> @@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev)
>   *	free_netdev(y1);
>   *	free_netdev(y2);
>   *
> - * We are invoked by rtnl_unlock() after it drops the semaphore.
> + * We are invoked by rtnl_unlock().
>   * This allows us to deal with problems:
>   * 1) We can delete sysfs objects which invoke hotplug
>   *    without deadlocking with linkwatch via keventd.
>   * 2) Since we run with the RTNL semaphore not held, we can sleep
>   *    safely in order to wait for the netdev refcnt to drop to zero.
> + *
> + * We must not return until all unregister events added during
> + * the interval the lock was held have been completed.
>   */
> -static DEFINE_MUTEX(net_todo_run_mutex);
>  void netdev_run_todo(void)
>  {
>  	struct list_head list;
>  
> -	/* Need to guard against multiple cpu's getting out of order. */
> -	mutex_lock(&net_todo_run_mutex);
> -
> -	/* Not safe to do outside the semaphore.  We must not return
> -	 * until all unregister events invoked by the local processor
> -	 * have been completed (either by this todo run, or one on
> -	 * another cpu).
> -	 */

I think, it's about not to let others run this for devices unregistered
within later rtnl_locks before completing previous tasks. So, it would
be nice to have some comment why it's not necessary anymore.

Cheers,
Jarek P.

> -	if (list_empty(&net_todo_list))
> -		goto out;
> -
>  	/* Snapshot list, allow later requests */
> -	spin_lock(&net_todo_list_lock);
>  	list_replace_init(&net_todo_list, &list);
> -	spin_unlock(&net_todo_list_lock);
> +
> +	__rtnl_unlock();
>  
>  	while (!list_empty(&list)) {
>  		struct net_device *dev
> @@ -4200,9 +4188,6 @@ void netdev_run_todo(void)
>  		/* Free network device */
>  		kobject_put(&dev->dev.kobj);
>  	}
> -
> -out:
> -	mutex_unlock(&net_todo_run_mutex);
>  }
>  
>  static struct net_device_stats *internal_stats(struct net_device *dev)
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 71edb8b..d6381c2 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -73,7 +73,7 @@ void __rtnl_unlock(void)
>  
>  void rtnl_unlock(void)
>  {
> -	mutex_unlock(&rtnl_mutex);
> +	/* This fellow will unlock it for us. */
>  	netdev_run_todo();
>  }
>  
> Cheers,






^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  6:55     ` Jarek Poplawski
@ 2008-10-05  6:56       ` Herbert Xu
  2008-10-05  7:12         ` Jarek Poplawski
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2008-10-05  6:56 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Benjamin Thery, davem, netdev, dlezcano

On Sun, Oct 05, 2008 at 08:55:10AM +0200, Jarek Poplawski wrote:
>
> > -	/* Not safe to do outside the semaphore.  We must not return
> > -	 * until all unregister events invoked by the local processor
> > -	 * have been completed (either by this todo run, or one on
> > -	 * another cpu).
> > -	 */
> 
> I think, it's about not to let others run this for devices unregistered
> within later rtnl_locks before completing previous tasks. So, it would
> be nice to have some comment why it's not necessary anymore.

Where did you get that idea?

This was there because people did (and still do) stuff like:

unregister_netdev(dev);
free_netdev(dev);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  6:56       ` Herbert Xu
@ 2008-10-05  7:12         ` Jarek Poplawski
  2008-10-05  7:28           ` Stephen Hemminger
  2008-10-05  7:39           ` Herbert Xu
  0 siblings, 2 replies; 33+ messages in thread
From: Jarek Poplawski @ 2008-10-05  7:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Benjamin Thery, davem, netdev, dlezcano

On Sun, Oct 05, 2008 at 02:56:48PM +0800, Herbert Xu wrote:
> On Sun, Oct 05, 2008 at 08:55:10AM +0200, Jarek Poplawski wrote:
> >
> > > -	/* Not safe to do outside the semaphore.  We must not return
> > > -	 * until all unregister events invoked by the local processor
> > > -	 * have been completed (either by this todo run, or one on
> > > -	 * another cpu).
> > > -	 */
> > 
> > I think, it's about not to let others run this for devices unregistered
> > within later rtnl_locks before completing previous tasks. So, it would
> > be nice to have some comment why it's not necessary anymore.
> 
> Where did you get that idea?

Just reading this code (plus the comment). Why would anybody bother 
with something so complex like this if something like your idea is
rather straightforward? But, needed or not, my point is it would be
nice to comment that this patch changes this behavior btw.

Jarek P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  7:12         ` Jarek Poplawski
@ 2008-10-05  7:28           ` Stephen Hemminger
  2008-10-05  7:38             ` Herbert Xu
  2008-10-05  7:39           ` Herbert Xu
  1 sibling, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2008-10-05  7:28 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Herbert Xu, Benjamin Thery, davem, netdev, dlezcano

On Sun, 5 Oct 2008 09:12:38 +0200
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Sun, Oct 05, 2008 at 02:56:48PM +0800, Herbert Xu wrote:
> > On Sun, Oct 05, 2008 at 08:55:10AM +0200, Jarek Poplawski wrote:
> > >
> > > > -	/* Not safe to do outside the semaphore.  We must not return
> > > > -	 * until all unregister events invoked by the local processor
> > > > -	 * have been completed (either by this todo run, or one on
> > > > -	 * another cpu).
> > > > -	 */
> > > 
> > > I think, it's about not to let others run this for devices unregistered
> > > within later rtnl_locks before completing previous tasks. So, it would
> > > be nice to have some comment why it's not necessary anymore.
> > 
> > Where did you get that idea?
> 
> Just reading this code (plus the comment). Why would anybody bother 
> with something so complex like this if something like your idea is
> rather straightforward? But, needed or not, my point is it would be
> nice to comment that this patch changes this behavior btw.

I think there were issues with unregister triggering hotplug udev
events, but that may have been long ago when rtnl_lock was
a semaphore not a mutex.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  7:28           ` Stephen Hemminger
@ 2008-10-05  7:38             ` Herbert Xu
  0 siblings, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2008-10-05  7:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, Benjamin Thery, davem, netdev, dlezcano

On Sun, Oct 05, 2008 at 09:28:23AM +0200, Stephen Hemminger wrote:

> I think there were issues with unregister triggering hotplug udev
> events, but that may have been long ago when rtnl_lock was
> a semaphore not a mutex.

Well as it is we have no hotplug events in netdev_run_todo at
all.  They're all in unregister_netdevice which is where they
should be and that runs under the RTNL.

In fact it would be a bug if we had anything in netdev_run_todo
tied to sysfs because once we get here we've relinquished our
ownership of the device name so someone else is free to readd
a device with the same name.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  7:12         ` Jarek Poplawski
  2008-10-05  7:28           ` Stephen Hemminger
@ 2008-10-05  7:39           ` Herbert Xu
  1 sibling, 0 replies; 33+ messages in thread
From: Herbert Xu @ 2008-10-05  7:39 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Benjamin Thery, davem, netdev, dlezcano

On Sun, Oct 05, 2008 at 09:12:38AM +0200, Jarek Poplawski wrote:
>
> Just reading this code (plus the comment). Why would anybody bother 
> with something so complex like this if something like your idea is
> rather straightforward? But, needed or not, my point is it would be
> nice to comment that this patch changes this behavior btw.

I'm sorry but I don't have time to consider such hypotheticals.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  4:26   ` Herbert Xu
  2008-10-05  6:55     ` Jarek Poplawski
@ 2008-10-06 15:19     ` Benjamin Thery
  2008-10-07 22:46       ` David Miller
  2008-10-07 22:50     ` David Miller
  2 siblings, 1 reply; 33+ messages in thread
From: Benjamin Thery @ 2008-10-06 15:19 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, netdev, dlezcano

Herbert Xu wrote:
> Benjamin Thery <benjamin.thery@bull.net> wrote:
>> 1. Unregister a device, the following routines are called:
>>
>> -> unregister_netdev
>>  -> rtnl_lock
>>  -> unregister_netdevice
>>  -> rtnl_unlock
>>    -> netdev_run_todo
>>      -> netdev_wait_allrefs
> 
> OK, this explains lots of dead-locks that people have been seeing.
> 
> But I think we can go a step further:
> 
> net: Fix netdev_run_todo dead-lock
> 
> Benjamin Thery tracked down a bug that explains many instances
> of the error
> 
> unregister_netdevice: waiting for %s to become free. Usage count = %d
> 
> It turns out that netdev_run_todo can dead-lock with itself if
> a second instance of it is run in a thread that will then free
> a reference to the device waited on by the first instance.
> 
> The problem is really quite silly.  We were trying to create
> parallelism where none was required.  As netdev_run_todo always
> follows a RTNL section, and that todo tasks can only be added
> with the RTNL held, by definition you should only need to wait
> for the very ones that you've added and be done with it.
> 
> There is no need for a second mutex or spinlock.
> 
> This is exactly what the following patch does.

Herbert, thank you for having looked at the issue too.

When I understood how the dead lock happened, I considered playing
with the locks in net_set_todo()/netdev_run_todo(), but as some comments
in the code in this area sounds a bit too cryptic for my brain, I didn't
dare to change them myself. :)

I guess you know a lot better than me the history behind this piece of
code and why it was done this way.

I tested your patch on my testbed during all the afternoon.
It fixes the dead lock I can easily reproduce here with net namespaces
and I didn't produce any regressions on my setup.

Benjamin

> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e8eb2b4..021f531 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3808,14 +3808,11 @@ static int dev_new_index(struct net *net)
>  }
>  
>  /* Delayed registration/unregisteration */
> -static DEFINE_SPINLOCK(net_todo_list_lock);
>  static LIST_HEAD(net_todo_list);
>  
>  static void net_set_todo(struct net_device *dev)
>  {
> -	spin_lock(&net_todo_list_lock);
>  	list_add_tail(&dev->todo_list, &net_todo_list);
> -	spin_unlock(&net_todo_list_lock);
>  }
>  
>  static void rollback_registered(struct net_device *dev)
> @@ -4142,33 +4139,24 @@ static void netdev_wait_allrefs(struct net_device *dev)
>   *	free_netdev(y1);
>   *	free_netdev(y2);
>   *
> - * We are invoked by rtnl_unlock() after it drops the semaphore.
> + * We are invoked by rtnl_unlock().
>   * This allows us to deal with problems:
>   * 1) We can delete sysfs objects which invoke hotplug
>   *    without deadlocking with linkwatch via keventd.
>   * 2) Since we run with the RTNL semaphore not held, we can sleep
>   *    safely in order to wait for the netdev refcnt to drop to zero.
> + *
> + * We must not return until all unregister events added during
> + * the interval the lock was held have been completed.
>   */
> -static DEFINE_MUTEX(net_todo_run_mutex);
>  void netdev_run_todo(void)
>  {
>  	struct list_head list;
>  
> -	/* Need to guard against multiple cpu's getting out of order. */
> -	mutex_lock(&net_todo_run_mutex);
> -
> -	/* Not safe to do outside the semaphore.  We must not return
> -	 * until all unregister events invoked by the local processor
> -	 * have been completed (either by this todo run, or one on
> -	 * another cpu).
> -	 */
> -	if (list_empty(&net_todo_list))
> -		goto out;
> -
>  	/* Snapshot list, allow later requests */
> -	spin_lock(&net_todo_list_lock);
>  	list_replace_init(&net_todo_list, &list);
> -	spin_unlock(&net_todo_list_lock);
> +
> +	__rtnl_unlock();
>  
>  	while (!list_empty(&list)) {
>  		struct net_device *dev
> @@ -4200,9 +4188,6 @@ void netdev_run_todo(void)
>  		/* Free network device */
>  		kobject_put(&dev->dev.kobj);
>  	}
> -
> -out:
> -	mutex_unlock(&net_todo_run_mutex);
>  }
>  
>  static struct net_device_stats *internal_stats(struct net_device *dev)
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 71edb8b..d6381c2 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -73,7 +73,7 @@ void __rtnl_unlock(void)
>  
>  void rtnl_unlock(void)
>  {
> -	mutex_unlock(&rtnl_mutex);
> +	/* This fellow will unlock it for us. */
>  	netdev_run_todo();
>  }
>  
> Cheers,


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-06 15:19     ` Benjamin Thery
@ 2008-10-07 22:46       ` David Miller
  0 siblings, 0 replies; 33+ messages in thread
From: David Miller @ 2008-10-07 22:46 UTC (permalink / raw)
  To: benjamin.thery; +Cc: herbert, netdev, dlezcano

From: Benjamin Thery <benjamin.thery@bull.net>
Date: Mon, 06 Oct 2008 17:19:13 +0200

> I guess you know a lot better than me the history behind this piece of
> code and why it was done this way.

Or maybe Herbert doesn't know :-)

I wrote this crap, and I can't remember why I decided to drop
the RTNL semaphore so early and go directly to that todo lock.
Maybe it was purely because of how the interfaces are arranged
and I considered it ugly to drop the RTNL semaphore in the
todo list running code.   Who knows...

I'm going to scrutinize Herbert's patch some more, but this is
really a borderline change to put into 2.6.27 so late in the
RC cycle.

I want to fix this bug, but... it's risky.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: deadlock during net device unregistration
  2008-10-05  4:26   ` Herbert Xu
  2008-10-05  6:55     ` Jarek Poplawski
  2008-10-06 15:19     ` Benjamin Thery
@ 2008-10-07 22:50     ` David Miller
  2 siblings, 0 replies; 33+ messages in thread
From: David Miller @ 2008-10-07 22:50 UTC (permalink / raw)
  To: herbert; +Cc: benjamin.thery, netdev, dlezcano

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 05 Oct 2008 12:26:21 +0800

> net: Fix netdev_run_todo dead-lock
> 
> Benjamin Thery tracked down a bug that explains many instances
> of the error
> 
> unregister_netdevice: waiting for %s to become free. Usage count = %d
> 
> It turns out that netdev_run_todo can dead-lock with itself if
> a second instance of it is run in a thread that will then free
> a reference to the device waited on by the first instance.
> 
> The problem is really quite silly.  We were trying to create
> parallelism where none was required.  As netdev_run_todo always
> follows a RTNL section, and that todo tasks can only be added
> with the RTNL held, by definition you should only need to wait
> for the very ones that you've added and be done with it.
> 
> There is no need for a second mutex or spinlock.
> 
> This is exactly what the following patch does.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Ok, this looks safe.  I've applied this to net-2.6, thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2008-10-07 22:51 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20080929175412.866679567@theryb.frec.bull.fr>
2008-09-29 17:54 ` [PATCH] net: deadlock during net device unregistration Benjamin Thery
2008-09-30  6:32   ` Jarek Poplawski
2008-09-30 11:52     ` Benjamin Thery
2008-09-30 13:58       ` David Miller
2008-09-30 14:07         ` Benjamin Thery
2008-09-30 14:42       ` Jarek Poplawski
2008-09-30 14:57         ` Jarek Poplawski
2008-09-30 15:18           ` Benjamin Thery
2008-10-01  9:59             ` David Miller
2008-10-01 10:10               ` Daniel Lezcano
2008-10-01 10:12                 ` David Miller
2008-10-01 14:14                   ` [PATCH] net: deadlock during net device unregistration - V2 Benjamin Thery
2008-10-01 19:48                     ` Jarek Poplawski
2008-10-01 21:06                       ` Daniel Lezcano
2008-10-01 21:52                         ` Jarek Poplawski
2008-10-01 23:31                         ` Jarek Poplawski
2008-10-02 15:23                           ` Benjamin Thery
2008-10-02 18:38                             ` Jarek Poplawski
2008-10-02 19:55                               ` Benjamin Thery 
2008-10-02 20:34                                 ` Jarek Poplawski
2008-10-04  7:42                                   ` Jarek Poplawski
2008-10-04  7:52                                     ` Jarek Poplawski
2008-10-03  0:41   ` [PATCH] net: deadlock during net device unregistration Eric W. Biederman
2008-10-05  4:26   ` Herbert Xu
2008-10-05  6:55     ` Jarek Poplawski
2008-10-05  6:56       ` Herbert Xu
2008-10-05  7:12         ` Jarek Poplawski
2008-10-05  7:28           ` Stephen Hemminger
2008-10-05  7:38             ` Herbert Xu
2008-10-05  7:39           ` Herbert Xu
2008-10-06 15:19     ` Benjamin Thery
2008-10-07 22:46       ` David Miller
2008-10-07 22:50     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).