2.6.25rc7 lockdep trace

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* 2.6.25rc7 lockdep trace
@ 2008-03-28  0:00 Dave Jones
  2008-03-28  1:55 ` Dave Jones
  2008-03-29  0:34 ` David Miller
  0 siblings, 2 replies; 28+ messages in thread
From: Dave Jones @ 2008-03-28  0:00 UTC (permalink / raw)
  To: netdev

I see this every time I shut down.

	Dave


 
 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.25-0.161.rc7.fc9.i686 #1
 -------------------------------------------------------
 NetworkManager/2308 is trying to acquire lock:
  (events){--..}, at: [flush_workqueue+0/133] flush_workqueue+0x0/0x85
 
 but task is already holding lock:
  (rtnl_mutex){--..}, at: [rtnetlink_rcv+18/38] rtnetlink_rcv+0x12/0x26
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #2 (rtnl_mutex){--..}:
        [__lock_acquire+2713/3089] __lock_acquire+0xa99/0xc11
        [lock_acquire+106/144] lock_acquire+0x6a/0x90
        [mutex_lock_nested+219/625] mutex_lock_nested+0xdb/0x271
        [rtnl_lock+15/17] rtnl_lock+0xf/0x11
        [linkwatch_event+8/34] linkwatch_event+0x8/0x22
        [run_workqueue+211/417] run_workqueue+0xd3/0x1a1
        [worker_thread+182/194] worker_thread+0xb6/0xc2
        [kthread+59/97] kthread+0x3b/0x61
        [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
        [<ffffffff>] 0xffffffff
 
 -> #1 ((linkwatch_work).work){--..}:
        [__lock_acquire+2713/3089] __lock_acquire+0xa99/0xc11
        [lock_acquire+106/144] lock_acquire+0x6a/0x90
        [run_workqueue+205/417] run_workqueue+0xcd/0x1a1
        [worker_thread+182/194] worker_thread+0xb6/0xc2
        [kthread+59/97] kthread+0x3b/0x61
        [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
        [<ffffffff>] 0xffffffff
 
 -> #0 (events){--..}:
        [__lock_acquire+2488/3089] __lock_acquire+0x9b8/0xc11
        [lock_acquire+106/144] lock_acquire+0x6a/0x90
        [flush_workqueue+68/133] flush_workqueue+0x44/0x85
        [flush_scheduled_work+13/15] flush_scheduled_work+0xd/0xf
        [<d096d80a>] tulip_down+0x20/0x1a3 [tulip]
        [<d096e2b5>] tulip_close+0x24/0xd6 [tulip]
        [dev_close+82/111] dev_close+0x52/0x6f
        [dev_change_flags+159/338] dev_change_flags+0x9f/0x152
        [do_setlink+586/764] do_setlink+0x24a/0x2fc
        [rtnl_setlink+226/230] rtnl_setlink+0xe2/0xe6
        [rtnetlink_rcv_msg+418/444] rtnetlink_rcv_msg+0x1a2/0x1bc
        [netlink_rcv_skb+48/134] netlink_rcv_skb+0x30/0x86
        [rtnetlink_rcv+30/38] rtnetlink_rcv+0x1e/0x26
        [netlink_unicast+439/533] netlink_unicast+0x1b7/0x215
        [netlink_sendmsg+600/613] netlink_sendmsg+0x258/0x265
        [sock_sendmsg+222/249] sock_sendmsg+0xde/0xf9
        [sys_sendmsg+319/402] sys_sendmsg+0x13f/0x192
        [sys_socketcall+363/390] sys_socketcall+0x16b/0x186
        [syscall_call+7/11] syscall_call+0x7/0xb
        [<ffffffff>] 0xffffffff
 
 other info that might help us debug this:
 
 1 lock held by NetworkManager/2308:
  #0:  (rtnl_mutex){--..}, at: [rtnetlink_rcv+18/38] rtnetlink_rcv+0x12/0x26
 
 stack backtrace:
 Pid: 2308, comm: NetworkManager Not tainted 2.6.25-0.161.rc7.fc9.i686 #1
  [print_circular_bug_tail+91/102] print_circular_bug_tail+0x5b/0x66
  [print_circular_bug_entry+57/67] ? print_circular_bug_entry+0x39/0x43
  [__lock_acquire+2488/3089] __lock_acquire+0x9b8/0xc11
  [_spin_unlock_irq+34/47] ? _spin_unlock_irq+0x22/0x2f
  [lock_acquire+106/144] lock_acquire+0x6a/0x90
  [flush_workqueue+0/133] ? flush_workqueue+0x0/0x85
  [flush_workqueue+68/133] flush_workqueue+0x44/0x85
  [flush_workqueue+0/133] ? flush_workqueue+0x0/0x85
  [flush_scheduled_work+13/15] flush_scheduled_work+0xd/0xf
  [<d096d80a>] tulip_down+0x20/0x1a3 [tulip]
  [trace_hardirqs_on+233/266] ? trace_hardirqs_on+0xe9/0x10a
  [dev_deactivate+177/222] ? dev_deactivate+0xb1/0xde
  [<d096e2b5>] tulip_close+0x24/0xd6 [tulip]
  [dev_close+82/111] dev_close+0x52/0x6f
  [dev_change_flags+159/338] dev_change_flags+0x9f/0x152
  [do_setlink+586/764] do_setlink+0x24a/0x2fc
  [_read_unlock+29/32] ? _read_unlock+0x1d/0x20
  [rtnl_setlink+226/230] rtnl_setlink+0xe2/0xe6
  [rtnl_setlink+0/230] ? rtnl_setlink+0x0/0xe6
  [rtnetlink_rcv_msg+418/444] rtnetlink_rcv_msg+0x1a2/0x1bc
  [rtnetlink_rcv_msg+0/444] ? rtnetlink_rcv_msg+0x0/0x1bc
  [netlink_rcv_skb+48/134] netlink_rcv_skb+0x30/0x86
  [rtnetlink_rcv+30/38] rtnetlink_rcv+0x1e/0x26
  [netlink_unicast+439/533] netlink_unicast+0x1b7/0x215
  [netlink_sendmsg+600/613] netlink_sendmsg+0x258/0x265
  [sock_sendmsg+222/249] sock_sendmsg+0xde/0xf9
  [autoremove_wake_function+0/51] ? autoremove_wake_function+0x0/0x33
  [native_sched_clock+181/209] ? native_sched_clock+0xb5/0xd1
  [sched_clock+8/11] ? sched_clock+0x8/0xb
  [lock_release_holdtime+26/277] ? lock_release_holdtime+0x1a/0x115
  [fget_light+142/185] ? fget_light+0x8e/0xb9
  [copy_from_user+57/289] ? copy_from_user+0x39/0x121
  [verify_iovec+64/111] ? verify_iovec+0x40/0x6f
  [sys_sendmsg+319/402] sys_sendmsg+0x13f/0x192
  [sys_recvmsg+366/379] ? sys_recvmsg+0x16e/0x17b
  [check_object+304/388] ? check_object+0x130/0x184
  [check_object+304/388] ? check_object+0x130/0x184
  [kmem_cache_free+186/207] ? kmem_cache_free+0xba/0xcf
  [trace_hardirqs_on+233/266] ? trace_hardirqs_on+0xe9/0x10a
  [d_free+59/77] ? d_free+0x3b/0x4d
  [d_free+59/77] ? d_free+0x3b/0x4d
  [sys_socketcall+363/390] sys_socketcall+0x16b/0x186
  [syscall_call+7/11] syscall_call+0x7/0xb
  =======================

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-28  0:00 2.6.25rc7 lockdep trace Dave Jones
@ 2008-03-28  1:55 ` Dave Jones
  2008-03-29  0:34 ` David Miller
  1 sibling, 0 replies; 28+ messages in thread
From: Dave Jones @ 2008-03-28  1:55 UTC (permalink / raw)
  To: netdev

On Thu, Mar 27, 2008 at 08:00:13PM -0400, Dave Jones wrote:
 > I see this every time I shut down.

Something that may be relevant.. There are two NICs in this machine.
An onboard e100, which is in use, and there's a PCI tulip card in there
which doesn't even have a cable plugged into it.

The not-in-use card seems to be the trigger.

	Dave 

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-28  0:00 2.6.25rc7 lockdep trace Dave Jones
  2008-03-28  1:55 ` Dave Jones
@ 2008-03-29  0:34 ` David Miller
  2008-03-29  0:54   ` Johannes Berg
  1 sibling, 1 reply; 28+ messages in thread
From: David Miller @ 2008-03-29  0:34 UTC (permalink / raw)
  To: davej; +Cc: netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Thu, 27 Mar 2008 20:00:13 -0400

>  stack backtrace:
>  Pid: 2308, comm: NetworkManager Not tainted 2.6.25-0.161.rc7.fc9.i686 #1
>   [print_circular_bug_tail+91/102] print_circular_bug_tail+0x5b/0x66
>   [print_circular_bug_entry+57/67] ? print_circular_bug_entry+0x39/0x43
>   [__lock_acquire+2488/3089] __lock_acquire+0x9b8/0xc11
>   [_spin_unlock_irq+34/47] ? _spin_unlock_irq+0x22/0x2f
>   [lock_acquire+106/144] lock_acquire+0x6a/0x90
>   [flush_workqueue+0/133] ? flush_workqueue+0x0/0x85
>   [flush_workqueue+68/133] flush_workqueue+0x44/0x85
>   [flush_workqueue+0/133] ? flush_workqueue+0x0/0x85
>   [flush_scheduled_work+13/15] flush_scheduled_work+0xd/0xf
>   [<d096d80a>] tulip_down+0x20/0x1a3 [tulip]
>   [trace_hardirqs_on+233/266] ? trace_hardirqs_on+0xe9/0x10a
>   [dev_deactivate+177/222] ? dev_deactivate+0xb1/0xde

Yes, see for example:

http://www.mail-archive.com/netdev@vger.kernel.org/msg31718.html

You can't flush a workqueue in the device close handler
exactly because of this locking conflict.

Nobody has come up with a suitable way to fix this yet.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29  0:34 ` David Miller
@ 2008-03-29  0:54   ` Johannes Berg
  2008-03-29  1:01     ` Johannes Berg
  2008-03-29  1:06     ` David Miller
  0 siblings, 2 replies; 28+ messages in thread
From: Johannes Berg @ 2008-03-29  0:54 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev

[-- Attachment #1: Type: text/plain, Size: 1554 bytes --]


> >  stack backtrace:
> >  Pid: 2308, comm: NetworkManager Not tainted 2.6.25-0.161.rc7.fc9.i686 #1
> >   [print_circular_bug_tail+91/102] print_circular_bug_tail+0x5b/0x66
> >   [print_circular_bug_entry+57/67] ? print_circular_bug_entry+0x39/0x43
> >   [__lock_acquire+2488/3089] __lock_acquire+0x9b8/0xc11
> >   [_spin_unlock_irq+34/47] ? _spin_unlock_irq+0x22/0x2f
> >   [lock_acquire+106/144] lock_acquire+0x6a/0x90
> >   [flush_workqueue+0/133] ? flush_workqueue+0x0/0x85
> >   [flush_workqueue+68/133] flush_workqueue+0x44/0x85
> >   [flush_workqueue+0/133] ? flush_workqueue+0x0/0x85
> >   [flush_scheduled_work+13/15] flush_scheduled_work+0xd/0xf
> >   [<d096d80a>] tulip_down+0x20/0x1a3 [tulip]
> >   [trace_hardirqs_on+233/266] ? trace_hardirqs_on+0xe9/0x10a
> >   [dev_deactivate+177/222] ? dev_deactivate+0xb1/0xde
> 
> Yes, see for example:
> 
> http://www.mail-archive.com/netdev@vger.kernel.org/msg31718.html
> 
> You can't flush a workqueue in the device close handler
> exactly because of this locking conflict.
> 
> Nobody has come up with a suitable way to fix this yet.

Maybe we should check which schedule_work users actually lock the rtnl
within the work function and move them to a uses-rtnl-in-work workqueue
so that everybody else can have rtnl around flush.

Depending on how many users there are that might not be feasible, but I
have so far only seen linkwatch_event() lock the rtnl within the work
function and everybody else seems to want to use the rtnl around the
flushing.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29  0:54   ` Johannes Berg
@ 2008-03-29  1:01     ` Johannes Berg
  2008-03-29  1:09       ` David Miller
  2008-03-29  1:06     ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Johannes Berg @ 2008-03-29  1:01 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]


> > You can't flush a workqueue in the device close handler
> > exactly because of this locking conflict.
> > 
> > Nobody has come up with a suitable way to fix this yet.
> 
> Maybe we should check which schedule_work users actually lock the rtnl
> within the work function and move them to a uses-rtnl-in-work workqueue
> so that everybody else can have rtnl around flush.

On the other hand, most drivers don't actually care that their work has
run, they just care that it won't run in the future after they give up
resources or similar, hence they can and should use cancel_work_sync()
which doesn't suffer from the deadlock. But that needs actual inspection
because it does change behaviour from "run and wait for it if scheduled"
to "cancel if scheduled".

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29  1:01     ` Johannes Berg
@ 2008-03-29  1:09       ` David Miller
  2008-04-02  8:51         ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-03-29  1:09 UTC (permalink / raw)
  To: johannes; +Cc: davej, netdev

From: Johannes Berg <johannes@sipsolutions.net>
Date: Sat, 29 Mar 2008 02:01:25 +0100

> 
> > > You can't flush a workqueue in the device close handler
> > > exactly because of this locking conflict.
> > > 
> > > Nobody has come up with a suitable way to fix this yet.
> > 
> > Maybe we should check which schedule_work users actually lock the rtnl
> > within the work function and move them to a uses-rtnl-in-work workqueue
> > so that everybody else can have rtnl around flush.
> 
> On the other hand, most drivers don't actually care that their work has
> run, they just care that it won't run in the future after they give up
> resources or similar, hence they can and should use cancel_work_sync()
> which doesn't suffer from the deadlock. But that needs actual inspection
> because it does change behaviour from "run and wait for it if scheduled"
> to "cancel if scheduled".

I don't see how you can not race with the transition from
scheduled to "executing" without taking the runqueue lock
for the testing.

And it is crucial that the workqueue function doesn't
execute "accidently" due to such a race before the module
and thus the workqueue code is about to get potentially
unloaded.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29  1:09       ` David Miller
@ 2008-04-02  8:51         ` Oleg Nesterov
  0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2008-04-02  8:51 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On 03/28, David Miller wrote:
>
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Sat, 29 Mar 2008 02:01:25 +0100
> 
> > 
> > > > You can't flush a workqueue in the device close handler
> > > > exactly because of this locking conflict.
> > > > 
> > > > Nobody has come up with a suitable way to fix this yet.
> > > 
> > > Maybe we should check which schedule_work users actually lock the rtnl
> > > within the work function and move them to a uses-rtnl-in-work workqueue
> > > so that everybody else can have rtnl around flush.
> > 
> > On the other hand, most drivers don't actually care that their work has
> > run, they just care that it won't run in the future after they give up
> > resources or similar, hence they can and should use cancel_work_sync()
> > which doesn't suffer from the deadlock. But that needs actual inspection
> > because it does change behaviour from "run and wait for it if scheduled"
> > to "cancel if scheduled".
> 
> I don't see how you can not race with the transition from
> scheduled to "executing" without taking the runqueue lock
> for the testing.

Yes, cancel_work_sync() takes cwq->lock but this is fine (unless it is buggy ;)
Please note that run_workqueue() drops this lock before calling work->func().

If the caller of cancel_work_sync(work) doesn't share locks with work->func()
we can't deadlock, even if there are other pending/running work_structs which
need the same locks as the caller (say, RTNL).

But, perhaps, you mean wq->lockdep_map? As Johannes pointed out this lock is
fake, but I think this doesn't matter, from the correctness POV it is "real"
lock. What does matter is that cancel_work_sync() doesn't use this lock at all.

(again, Johannes has already explained this all).

> And it is crucial that the workqueue function doesn't
> execute "accidently" due to such a race before the module
> and thus the workqueue code is about to get potentially
> unloaded.

Which race? Unless explicitly queued afterwards, work->func() can't execute
after return from cancel_work_sync(work).

David, I think you misunderstood Johannes, or perhaps I missed something.

Oleg.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29  0:54   ` Johannes Berg
  2008-03-29  1:01     ` Johannes Berg
@ 2008-03-29  1:06     ` David Miller
  2008-03-29 10:02       ` Johannes Berg
  1 sibling, 1 reply; 28+ messages in thread
From: David Miller @ 2008-03-29  1:06 UTC (permalink / raw)
  To: johannes; +Cc: davej, netdev

From: Johannes Berg <johannes@sipsolutions.net>
Date: Sat, 29 Mar 2008 01:54:09 +0100

> Maybe we should check which schedule_work users actually lock the rtnl
> within the work function and move them to a uses-rtnl-in-work workqueue
> so that everybody else can have rtnl around flush.

The issue is that linkwatch goes: workqueue lock --> RTNL

And thus taking workqueue lock within RTNL is deadlock
prone no matter where you do it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29  1:06     ` David Miller
@ 2008-03-29 10:02       ` Johannes Berg
  2008-03-29 12:52         ` Jarek Poplawski
  2008-04-03 20:48         ` David Miller
  0 siblings, 2 replies; 28+ messages in thread
From: Johannes Berg @ 2008-03-29 10:02 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev

[-- Attachment #1: Type: text/plain, Size: 2956 bytes --]

[merging your two mails]

> The issue is that linkwatch goes: workqueue lock --> RTNL
> 
> And thus taking workqueue lock within RTNL is deadlock
> prone no matter where you do it.

Which lock are you referring to here? There is no real runqueue lock
other than a spin-lock for the list. I think you may be misunderstanding
the lockdep output in a way I didn't anticipated when adding lockdep
debugging for workqueue stuff.

The actual work function is _not_ running with any locks held! The fact
that you see lockdep output there is because we tell lockdep we're
holding two locks, namely the "work" and the "workqueue". These are
fake, they aren't actually locks.

More importantly, those "locks" are locked and unlocked for every _run_
of the struct work function, not around the whole runqueue manipulation,
that part is done atomically without global locks.

So what you have is essentially
list_for_each_work
	lock(workqueue)
	lock(struct work)
	work->f();
	unlock(struct work)
	unlock(workqueue)

where both those locks are really only telling lockdep there are locks
to get it to warn about the deadlock scenario. Unfortunately Andrew
mangled the two patches into one when committing, so I can't point to
the changelog, but here are two links to the two patches adding this:
http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch
http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch

I think the "runqueue lock" you're referring to is the fake
"lock(workqueue)" which isn't around the whole runqueue manipulation but
only a lockdep helper to debug exactly the deadlock problem we're
talking about. I'm open to renaming it, but I don't think lockdep
supports formatting its output differently.

> I don't see how you can not race with the transition from
> scheduled to "executing" without taking the runqueue lock
> for the testing.

When you call cancel_work_sync(), the work struct will be grabbed by the
code (really __cancel_work_timer) and removed from the queue. That just
operates on bits and a spinlock, not locks held across the struct work
function execution, and ensures it is race-free without needing any such
locks.

> And it is crucial that the workqueue function doesn't
> execute "accidently" due to such a race before the module
> and thus the workqueue code is about to get potentially
> unloaded.

That is exactly what cancel_work_sync() guarantees, but not more. On
contrary, flush_workqueue() also guarantees that the work is run one
last time if it was scheduled at the time of the flush_workqueue() call.

However, as I just tried to explain, cancel_work_sync() _is_ safe to run
while holding the RTNL because it doesn't need any runqueue lock.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29 10:02       ` Johannes Berg
@ 2008-03-29 12:52         ` Jarek Poplawski
  2008-03-29 12:50           ` Johannes Berg
  2008-04-03 20:48         ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-03-29 12:52 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, davej, netdev

Johannes Berg wrote, On 03/29/2008 11:02 AM:

... 
> When you call cancel_work_sync(), the work struct will be grabbed by the
> code (really __cancel_work_timer) and removed from the queue. That just
> operates on bits and a spinlock, not locks held across the struct work
> function execution, and ensures it is race-free without needing any such
> locks

...

> However, as I just tried to explain, cancel_work_sync() _is_ safe to run
> while holding the RTNL because it doesn't need any runqueue lock.

These issues are so seldom now that I forget these details each time

"after use", so maybe I miss something again, but shouldn't this rather
read something like this?:

cancel_work_sync() _is_ safe to run while holding the RTNL against
works which don't take RTNL.

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29 12:52         ` Jarek Poplawski
@ 2008-03-29 12:50           ` Johannes Berg
  0 siblings, 0 replies; 28+ messages in thread
From: Johannes Berg @ 2008-03-29 12:50 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, davej, netdev

[-- Attachment #1: Type: text/plain, Size: 830 bytes --]


> > When you call cancel_work_sync(), the work struct will be grabbed by the
> > code (really __cancel_work_timer) and removed from the queue. That just
> > operates on bits and a spinlock, not locks held across the struct work
> > function execution, and ensures it is race-free without needing any such
> > locks
> 
> ...
> 
> > However, as I just tried to explain, cancel_work_sync() _is_ safe to run
> > while holding the RTNL because it doesn't need any runqueue lock.
> 
> These issues are so seldom now that I forget these details each time
> 
> "after use", so maybe I miss something again, but shouldn't this rather
> read something like this?:
> 
> cancel_work_sync() _is_ safe to run while holding the RTNL against
> works which don't take RTNL.

Yes, indeed, I should have said that.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-03-29 10:02       ` Johannes Berg
  2008-03-29 12:52         ` Jarek Poplawski
@ 2008-04-03 20:48         ` David Miller
  2008-04-04 14:48           ` Johannes Berg
  1 sibling, 1 reply; 28+ messages in thread
From: David Miller @ 2008-04-03 20:48 UTC (permalink / raw)
  To: johannes; +Cc: davej, netdev

From: Johannes Berg <johannes@sipsolutions.net>
Date: Sat, 29 Mar 2008 11:02:28 +0100

> However, as I just tried to explain, cancel_work_sync() _is_ safe to run
> while holding the RTNL because it doesn't need any runqueue lock.

So in theory we should be able to safely transform
flush_scheduled_work() calls in network driver close
methods into cancel_work_sync()?

Could someone prepare that patch?

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-04-03 20:48         ` David Miller
@ 2008-04-04 14:48           ` Johannes Berg
  2008-06-11  5:40             ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Berg @ 2008-04-04 14:48 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]


On Thu, 2008-04-03 at 13:48 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Sat, 29 Mar 2008 11:02:28 +0100
> 
> > However, as I just tried to explain, cancel_work_sync() _is_ safe to run
> > while holding the RTNL because it doesn't need any runqueue lock.
> 
> So in theory we should be able to safely transform
> flush_scheduled_work() calls in network driver close
> methods into cancel_work_sync()?

Yes, more precisely cancel_work_sync() for each work struct the driver
uses, unless the driver actually requires that the work runs before it
goes down.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-04-04 14:48           ` Johannes Berg
@ 2008-06-11  5:40             ` David Miller
  2008-06-11  7:08               ` Jarek Poplawski
                                 ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: David Miller @ 2008-06-11  5:40 UTC (permalink / raw)
  To: johannes; +Cc: davej, netdev

From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri, 04 Apr 2008 16:48:12 +0200

I'm attempting to bring solving this problem back from the dead.

Please help!

> On Thu, 2008-04-03 at 13:48 -0700, David Miller wrote:
> > From: Johannes Berg <johannes@sipsolutions.net>
> > Date: Sat, 29 Mar 2008 11:02:28 +0100
> > 
> > > However, as I just tried to explain, cancel_work_sync() _is_ safe to run
> > > while holding the RTNL because it doesn't need any runqueue lock.
> > 
> > So in theory we should be able to safely transform
> > flush_scheduled_work() calls in network driver close
> > methods into cancel_work_sync()?
> 
> Yes, more precisely cancel_work_sync() for each work struct the driver
> uses, unless the driver actually requires that the work runs before it
> goes down.

Ok, I did an audit of all the drivers under drivers/net that invoke
flush_scheduled_work().  Here is my audit analysis and the patch I
came up with.  The only deadlock'y case I couldn't fix right now is
the cassini driver, see below for details and the patch:

8139too: Calls flush_scheduled_work() in device remove method, OK.
libertas: Calls flush_scheduled_work() in device probe and remove methods, OK.
bnx2: Uses special polling mdelay() sequence instead of calling
      flush_scheduled_work() in close method, converted to
      cancel_work_sync().
cassini: Does flush_scheduled_work() from ->change_mtu() method, which
	 also holds RTNL semaphore.  Seems tricky to fix as it's trying
	 to schedule work and then immediately wait for it to
	 complete synchronously before returning from the ->change_mtu
	 method.  Maybe calling cas_reset_task() directly would work,
	 but this could conflict with reset tasks scheduled in other
	 contexts.  NOT FIXED
e1000e: Does flush_scheduled_work() from device remove method, OK.
ehea_main: Does flush_scheduled_work() in ->stop method, converted to
	   cancel_work_sync().
baycom_epp: Does flush_scheduled_work() from ->stop method, converted
	    to cancel_delayed_work()/cancel_work_sync() sequence.
ibm_newemac: Calls flush_scheduled_work() from device remove method, OK.
igb: Calls flush_scheduled_work() from device remove method, OK.
irda/mcs7780: Calls flush_scheduled_work() from USB disconnect, OK.
iseries_veth: Calls flush_scheduled_work() from device disconnect and module
	      exit, OK.
ixgbe: Calls flush_scheduled_work() from device remove method, OK.
mv643xx_eth: Calls flush_scheduled_work() from device remove method, OK.
myri10ge: Calls flush_scheduled_work() from device remove method, OK.
netxen: Has it's own hand-grown scheme to work around the flush_scheduled_work()
	deadlock.  Using a workqueue and flush_workqueue().
niu: Calls flush_scheduled_work() only from suspend handler, OK.
phy/phy.c: Avoids flush_scheduled_work() calls to avoid deadlock, uses
	   cancel_work_sync() instead.
r8169: Calls flush_scheduled_work() from device remove method, OK.
s2io: Calls flush_scheduled_work() from device remove method, OK.
sis190: Calls flush_scheduled_work() from device remove method, OK.
skge.c: Calls flush_scheduled_work() from device remove method, OK.
smc911x: Uses a similar polling loop like bnx2 did, converted to
	   cancel_work_sync().
smc91x: Same as smc911x
sungem: Calls flush_scheduled_work() only from suspend and device
	remove method, OK.
tg3: Calls flush_scheduled_work() only from suspend and device remove
     method, OK.
tulip_core: Calls flush_scheduled_work() from tulip_down(), converted to
	    cancel_work_sync().
kaweth: Calls flush_scheduled_work() from kaweth_kill_urbs() which can
	run from the ->stop() handler, converted to
	cancel_delayed_work/cancel_work_sync sequence.
usbnet:	Only calls flush_scheduled_work() from disconnect handler, OK.
hostap: Calls flush_scheduled_work() from ->stop, converted to
	cancel_work_sync()

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4b46e68..48ed410 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5907,12 +5907,7 @@ bnx2_close(struct net_device *dev)
 	struct bnx2 *bp = netdev_priv(dev);
 	u32 reset_code;
 
-	/* Calling flush_scheduled_work() may deadlock because
-	 * linkwatch_event() may be on the workqueue and it will try to get
-	 * the rtnl_lock which we are holding.
-	 */
-	while (bp->in_reset_task)
-		msleep(1);
+	cancel_work_sync(&bp->reset_task);
 
 	bnx2_disable_int_sync(bp);
 	bnx2_napi_disable(bp);
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index faae01d..075fd54 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2605,7 +2605,8 @@ static int ehea_stop(struct net_device *dev)
 	if (netif_msg_ifdown(port))
 		ehea_info("disabling port %s", dev->name);
 
-	flush_scheduled_work();
+	cancel_work_sync(&port->reset_task);
+
 	mutex_lock(&port->port_lock);
 	netif_stop_queue(dev);
 	port_napi_disable(port);
diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index dde9c7e..b36ae41 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -959,7 +959,8 @@ static int epp_close(struct net_device *dev)
 	unsigned char tmp[1];
 
 	bc->work_running = 0;
-	flush_scheduled_work();
+	cancel_delayed_work(&bc->run_work);
+	cancel_work_sync(&bc->run_work.work);
 	bc->stat = EPP_DCDBIT;
 	tmp[0] = 0;
 	pp->ops->epp_write_addr(pp, tmp, 1, 0);
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index 4e28002..5c71098 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -1531,16 +1531,8 @@ static int smc911x_close(struct net_device *dev)
 	if (lp->phy_type != 0) {
 		/* We need to ensure that no calls to
 		 * smc911x_phy_configure are pending.
-
-		 * flush_scheduled_work() cannot be called because we
-		 * are running with the netlink semaphore held (from
-		 * devinet_ioctl()) and the pending work queue
-		 * contains linkwatch_event() (scheduled by
-		 * netif_carrier_off() above). linkwatch_event() also
-		 * wants the netlink semaphore.
 		 */
-		while (lp->work_pending)
-			schedule();
+		cancel_work_sync(&lp->phy_configure);
 		smc911x_phy_powerdown(dev, lp->mii.phy_id);
 	}
 
diff --git a/drivers/net/smc91x.c b/drivers/net/smc91x.c
index a188e33..5dbe4d3 100644
--- a/drivers/net/smc91x.c
+++ b/drivers/net/smc91x.c
@@ -1016,15 +1016,8 @@ static void smc_phy_powerdown(struct net_device *dev)
 
 	/* We need to ensure that no calls to smc_phy_configure are
 	   pending.
-
-	   flush_scheduled_work() cannot be called because we are
-	   running with the netlink semaphore held (from
-	   devinet_ioctl()) and the pending work queue contains
-	   linkwatch_event() (scheduled by netif_carrier_off()
-	   above). linkwatch_event() also wants the netlink semaphore.
 	*/
-	while(lp->work_pending)
-		yield();
+	cancel_work_sync(&lp->phy_configure);
 
 	bmcr = smc_phy_read(dev, phy, MII_BMCR);
 	smc_phy_write(dev, phy, MII_BMCR, bmcr | BMCR_PDOWN);
diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 55670b5..af8d2c4 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -731,7 +731,7 @@ static void tulip_down (struct net_device *dev)
 	void __iomem *ioaddr = tp->base_addr;
 	unsigned long flags;
 
-	flush_scheduled_work();
+	cancel_work_sync(&tp->media_work);
 
 #ifdef CONFIG_TULIP_NAPI
 	napi_disable(&tp->napi);
diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index 0dcfc03..d776bcf 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -706,7 +706,8 @@ static void kaweth_kill_urbs(struct kaweth_device *kaweth)
 	usb_kill_urb(kaweth->rx_urb);
 	usb_kill_urb(kaweth->tx_urb);
 
-	flush_scheduled_work();
+	cancel_delayed_work(&kaweth->lowmem_work);
+	cancel_work_sync(&kaweth->lowmem_work.work);
 
 	/* a scheduled work may have resubmitted,
 	   we hit them again */
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index 20d387f..57b800f 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -682,7 +682,11 @@ static int prism2_close(struct net_device *dev)
 		netif_device_detach(dev);
 	}
 
-	flush_scheduled_work();
+	cancel_work_sync(&local->reset_queue);
+	cancel_work_sync(&local->set_multicast_list_queue);
+	cancel_work_sync(&local->set_tim_queue);
+	cancel_work_sync(&local->info_queue);
+	cancel_work_sync(&local->comms_qual_update);
 
 	module_put(local->hw_module);
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  5:40             ` David Miller
@ 2008-06-11  7:08               ` Jarek Poplawski
  2008-06-11  7:10                 ` David Miller
  2008-06-11  9:36               ` Jarek Poplawski
                                 ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-11  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On 11-06-2008 07:40, David Miller wrote:
...
> Ok, I did an audit of all the drivers under drivers/net that invoke
> flush_scheduled_work().  Here is my audit analysis and the patch I
> came up with.  The only deadlock'y case I couldn't fix right now is
> the cassini driver, see below for details and the patch:
> 
> 8139too: Calls flush_scheduled_work() in device remove method, OK.

I have some doubt here: this rtl8139_thred() work function seems to
rearm if netif_running(). Is it guaranteed a dev is down while
running this flush_scheduled_work()?

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  7:08               ` Jarek Poplawski
@ 2008-06-11  7:10                 ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2008-06-11  7:10 UTC (permalink / raw)
  To: jarkao2; +Cc: johannes, davej, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 11 Jun 2008 07:08:47 +0000

> On 11-06-2008 07:40, David Miller wrote:
> ...
> > Ok, I did an audit of all the drivers under drivers/net that invoke
> > flush_scheduled_work().  Here is my audit analysis and the patch I
> > came up with.  The only deadlock'y case I couldn't fix right now is
> > the cassini driver, see below for details and the patch:
> > 
> > 8139too: Calls flush_scheduled_work() in device remove method, OK.
> 
> I have some doubt here: this rtl8139_thred() work function seems to
> rearm if netif_running(). Is it guaranteed a dev is down while
> running this flush_scheduled_work()?

Indeed, perhaps it should do the flush_scheduled_work()
after the unregister_netdev().

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  5:40             ` David Miller
  2008-06-11  7:08               ` Jarek Poplawski
@ 2008-06-11  9:36               ` Jarek Poplawski
  2008-06-12  0:34                 ` David Miller
  2008-06-11 10:40               ` Jarek Poplawski
                                 ` (3 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-11  9:36 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On 11-06-2008 07:40, David Miller wrote:
...
> ehea_main: Does flush_scheduled_work() in ->stop method, converted to
> 	   cancel_work_sync().

Looks like this work could be rearmed afterwards from
ehea_qp_aff_irq_handler() (and maybe a few other places).

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  9:36               ` Jarek Poplawski
@ 2008-06-12  0:34                 ` David Miller
  2008-06-12  6:29                   ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-06-12  0:34 UTC (permalink / raw)
  To: jarkao2; +Cc: johannes, davej, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 11 Jun 2008 09:36:45 +0000

> On 11-06-2008 07:40, David Miller wrote:
> ...
> > ehea_main: Does flush_scheduled_work() in ->stop method, converted to
> > 	   cancel_work_sync().
> 
> Looks like this work could be rearmed afterwards from
> ehea_qp_aff_irq_handler() (and maybe a few other places).

Good catch, but this bug exists both before and after
my changes :-)

It is certainly something which should be reported to
the EHEA developers.  Could you please do this?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-12  0:34                 ` David Miller
@ 2008-06-12  6:29                   ` Jarek Poplawski
  0 siblings, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-12  6:29 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev, Jan-Bernd Themann, Thomas Klein

On Wed, Jun 11, 2008 at 05:34:17PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Wed, 11 Jun 2008 09:36:45 +0000
> 
> > On 11-06-2008 07:40, David Miller wrote:
> > ...
> > > ehea_main: Does flush_scheduled_work() in ->stop method, converted to
> > > 	   cancel_work_sync().
> > 
> > Looks like this work could be rearmed afterwards from
> > ehea_qp_aff_irq_handler() (and maybe a few other places).
> 
> Good catch, but this bug exists both before and after
> my changes :-)
> 
> It is certainly something which should be reported to
> the EHEA developers.  Could you please do this?

I hope this RE & CC will suffice.

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  5:40             ` David Miller
  2008-06-11  7:08               ` Jarek Poplawski
  2008-06-11  9:36               ` Jarek Poplawski
@ 2008-06-11 10:40               ` Jarek Poplawski
  2008-06-12  0:31                 ` David Miller
  2008-06-11 13:14               ` Jarek Poplawski
                                 ` (2 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-11 10:40 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On 11-06-2008 07:40, David Miller wrote:
...
> diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
> index 0dcfc03..d776bcf 100644
> --- a/drivers/net/usb/kaweth.c
> +++ b/drivers/net/usb/kaweth.c
> @@ -706,7 +706,8 @@ static void kaweth_kill_urbs(struct kaweth_device *kaweth)
>  	usb_kill_urb(kaweth->rx_urb);
>  	usb_kill_urb(kaweth->tx_urb);
>  
> -	flush_scheduled_work();
> +	cancel_delayed_work(&kaweth->lowmem_work);
> +	cancel_work_sync(&kaweth->lowmem_work.work);
> 

Why not cancel_delayed_work_sync()?

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11 10:40               ` Jarek Poplawski
@ 2008-06-12  0:31                 ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2008-06-12  0:31 UTC (permalink / raw)
  To: jarkao2; +Cc: johannes, davej, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 11 Jun 2008 10:40:27 +0000

> On 11-06-2008 07:40, David Miller wrote:
> ...
> > diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
> > index 0dcfc03..d776bcf 100644
> > --- a/drivers/net/usb/kaweth.c
> > +++ b/drivers/net/usb/kaweth.c
> > @@ -706,7 +706,8 @@ static void kaweth_kill_urbs(struct kaweth_device *kaweth)
> >  	usb_kill_urb(kaweth->rx_urb);
> >  	usb_kill_urb(kaweth->tx_urb);
> >  
> > -	flush_scheduled_work();
> > +	cancel_delayed_work(&kaweth->lowmem_work);
> > +	cancel_work_sync(&kaweth->lowmem_work.work);
> > 
> 
> Why not cancel_delayed_work_sync()?

I simply didn't know it existed :-)  I've updated my patch
to use cancel_delayed_work_sync() as appropriate, thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  5:40             ` David Miller
                                 ` (2 preceding siblings ...)
  2008-06-11 10:40               ` Jarek Poplawski
@ 2008-06-11 13:14               ` Jarek Poplawski
  2008-06-12  5:46               ` David Miller
  2008-06-12  6:13               ` Jarek Poplawski
  5 siblings, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-11 13:14 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On 11-06-2008 07:40, David Miller wrote:
...
> cassini: Does flush_scheduled_work() from ->change_mtu() method, which
> 	 also holds RTNL semaphore.  Seems tricky to fix as it's trying
> 	 to schedule work and then immediately wait for it to
> 	 complete synchronously before returning from the ->change_mtu
> 	 method.  Maybe calling cas_reset_task() directly would work,
> 	 but this could conflict with reset tasks scheduled in other
> 	 contexts.  NOT FIXED

Could you give an example of such a conflict? (I wonder how it's
avoided with current solution.)

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  5:40             ` David Miller
                                 ` (3 preceding siblings ...)
  2008-06-11 13:14               ` Jarek Poplawski
@ 2008-06-12  5:46               ` David Miller
  2008-06-12  7:20                 ` Johannes Berg
  2008-06-12  6:13               ` Jarek Poplawski
  5 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-06-12  5:46 UTC (permalink / raw)
  To: johannes; +Cc: davej, netdev

From: David Miller <davem@davemloft.net>
Date: Tue, 10 Jun 2008 22:40:23 -0700 (PDT)

> I'm attempting to bring solving this problem back from the dead.

Here is my current patch with a proper changelog.

Any objections to my pushing this to Linus?

net: Eliminate flush_scheduled_work() calls while RTNL is held.

If the RTNL is held when we invoke flush_scheduled_work() we could
deadlock.  One such case is linkwatch, it is a workqueue which tries
to grab the RTNL semaphore.

The most common case are net driver ->stop() methods.  The
simplest conversion is to instead use cancel_{delayed_}work_sync()
explicitly on the various workqueues the driver uses.

This is an OK transformation because these workqueues are doing things
like resetting the chip, restarting link negotiation, and so forth.
And if we're bringing down the device, we're about to turn the chip
off and reset it anways.  So if we cancel a pending work event, that's
fine here.

Some drivers were working around this deadlock by using a msleep()
polling loop of some sort, and those cases are converted to instead
use cancel_{delayed_}work_sync() as well.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4b46e68..48ed410 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5907,12 +5907,7 @@ bnx2_close(struct net_device *dev)
 	struct bnx2 *bp = netdev_priv(dev);
 	u32 reset_code;
 
-	/* Calling flush_scheduled_work() may deadlock because
-	 * linkwatch_event() may be on the workqueue and it will try to get
-	 * the rtnl_lock which we are holding.
-	 */
-	while (bp->in_reset_task)
-		msleep(1);
+	cancel_work_sync(&bp->reset_task);
 
 	bnx2_disable_int_sync(bp);
 	bnx2_napi_disable(bp);
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index faae01d..075fd54 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2605,7 +2605,8 @@ static int ehea_stop(struct net_device *dev)
 	if (netif_msg_ifdown(port))
 		ehea_info("disabling port %s", dev->name);
 
-	flush_scheduled_work();
+	cancel_work_sync(&port->reset_task);
+
 	mutex_lock(&port->port_lock);
 	netif_stop_queue(dev);
 	port_napi_disable(port);
diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index dde9c7e..00bc7fb 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -959,7 +959,7 @@ static int epp_close(struct net_device *dev)
 	unsigned char tmp[1];
 
 	bc->work_running = 0;
-	flush_scheduled_work();
+	cancel_delayed_work_sync(&bc->run_work);
 	bc->stat = EPP_DCDBIT;
 	tmp[0] = 0;
 	pp->ops->epp_write_addr(pp, tmp, 1, 0);
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index 4e28002..5c71098 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -1531,16 +1531,8 @@ static int smc911x_close(struct net_device *dev)
 	if (lp->phy_type != 0) {
 		/* We need to ensure that no calls to
 		 * smc911x_phy_configure are pending.
-
-		 * flush_scheduled_work() cannot be called because we
-		 * are running with the netlink semaphore held (from
-		 * devinet_ioctl()) and the pending work queue
-		 * contains linkwatch_event() (scheduled by
-		 * netif_carrier_off() above). linkwatch_event() also
-		 * wants the netlink semaphore.
 		 */
-		while (lp->work_pending)
-			schedule();
+		cancel_work_sync(&lp->phy_configure);
 		smc911x_phy_powerdown(dev, lp->mii.phy_id);
 	}
 
diff --git a/drivers/net/smc91x.c b/drivers/net/smc91x.c
index a188e33..5dbe4d3 100644
--- a/drivers/net/smc91x.c
+++ b/drivers/net/smc91x.c
@@ -1016,15 +1016,8 @@ static void smc_phy_powerdown(struct net_device *dev)
 
 	/* We need to ensure that no calls to smc_phy_configure are
 	   pending.
-
-	   flush_scheduled_work() cannot be called because we are
-	   running with the netlink semaphore held (from
-	   devinet_ioctl()) and the pending work queue contains
-	   linkwatch_event() (scheduled by netif_carrier_off()
-	   above). linkwatch_event() also wants the netlink semaphore.
 	*/
-	while(lp->work_pending)
-		yield();
+	cancel_work_sync(&lp->phy_configure);
 
 	bmcr = smc_phy_read(dev, phy, MII_BMCR);
 	smc_phy_write(dev, phy, MII_BMCR, bmcr | BMCR_PDOWN);
diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 55670b5..af8d2c4 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -731,7 +731,7 @@ static void tulip_down (struct net_device *dev)
 	void __iomem *ioaddr = tp->base_addr;
 	unsigned long flags;
 
-	flush_scheduled_work();
+	cancel_work_sync(&tp->media_work);
 
 #ifdef CONFIG_TULIP_NAPI
 	napi_disable(&tp->napi);
diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index 0dcfc03..7c66b05 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -706,7 +706,7 @@ static void kaweth_kill_urbs(struct kaweth_device *kaweth)
 	usb_kill_urb(kaweth->rx_urb);
 	usb_kill_urb(kaweth->tx_urb);
 
-	flush_scheduled_work();
+	cancel_delayed_work_sync(&kaweth->lowmem_work);
 
 	/* a scheduled work may have resubmitted,
 	   we hit them again */
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index 20d387f..57b800f 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -682,7 +682,11 @@ static int prism2_close(struct net_device *dev)
 		netif_device_detach(dev);
 	}
 
-	flush_scheduled_work();
+	cancel_work_sync(&local->reset_queue);
+	cancel_work_sync(&local->set_multicast_list_queue);
+	cancel_work_sync(&local->set_tim_queue);
+	cancel_work_sync(&local->info_queue);
+	cancel_work_sync(&local->comms_qual_update);
 
 	module_put(local->hw_module);
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-12  5:46               ` David Miller
@ 2008-06-12  7:20                 ` Johannes Berg
  2008-06-12  8:23                   ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Berg @ 2008-06-12  7:20 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

Technically,

> If the RTNL is held when we invoke flush_scheduled_work() we could
> deadlock.  One such case is linkwatch, it is a workqueue which tries
                                            that ^^^^^^^^^

> The most common case are net driver ->stop() methods.  The
> simplest conversion is to instead use cancel_{delayed_}work_sync()
> explicitly on the various workqueues the driver uses.
                   and that ^^^^^^^^^^

> This is an OK transformation because these workqueues are doing things
                                    and that ^^^^^^^^^^

should read "work struct" because it all uses the global workqueue.

I haven't looked at the drivers in more detail (sorry) but the patch
itself looks fine to me (but I wouldn't have caught something like Jarek
did with the now-unused variables.)

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-12  7:20                 ` Johannes Berg
@ 2008-06-12  8:23                   ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2008-06-12  8:23 UTC (permalink / raw)
  To: johannes; +Cc: davej, netdev

From: Johannes Berg <johannes@sipsolutions.net>
Date: Thu, 12 Jun 2008 09:20:34 +0200

> > If the RTNL is held when we invoke flush_scheduled_work() we could
> > deadlock.  One such case is linkwatch, it is a workqueue which tries
>                                             that ^^^^^^^^^
> 
> > The most common case are net driver ->stop() methods.  The
> > simplest conversion is to instead use cancel_{delayed_}work_sync()
> > explicitly on the various workqueues the driver uses.
>                    and that ^^^^^^^^^^
> 
> > This is an OK transformation because these workqueues are doing things
>                                     and that ^^^^^^^^^^
> 
> should read "work struct" because it all uses the global workqueue.
> 
> I haven't looked at the drivers in more detail (sorry) but the patch
> itself looks fine to me (but I wouldn't have caught something like Jarek
> did with the now-unused variables.)

I've made those corrections to the commit message, thanks for
the feedback.

Since Jarek appears to be happy with the patch at this point I'll
commit this and sned it to Linus in my next push of bug fixes.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-11  5:40             ` David Miller
                                 ` (4 preceding siblings ...)
  2008-06-12  5:46               ` David Miller
@ 2008-06-12  6:13               ` Jarek Poplawski
  2008-06-12  7:01                 ` David Miller
  5 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-12  6:13 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On 11-06-2008 07:40, David Miller wrote:
...
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 4b46e68..48ed410 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -5907,12 +5907,7 @@ bnx2_close(struct net_device *dev)
>  	struct bnx2 *bp = netdev_priv(dev);
>  	u32 reset_code;
>  
> -	/* Calling flush_scheduled_work() may deadlock because
> -	 * linkwatch_event() may be on the workqueue and it will try to get
> -	 * the rtnl_lock which we are holding.
> -	 */
> -	while (bp->in_reset_task)
> -		msleep(1);
> +	cancel_work_sync(&bp->reset_task);

It seems after this change bp->in_reset_task could be removed totally
from the code.

...
> diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
> index 4e28002..5c71098 100644
> --- a/drivers/net/smc911x.c
> +++ b/drivers/net/smc911x.c
> @@ -1531,16 +1531,8 @@ static int smc911x_close(struct net_device *dev)
>  	if (lp->phy_type != 0) {
>  		/* We need to ensure that no calls to
>  		 * smc911x_phy_configure are pending.
> -
> -		 * flush_scheduled_work() cannot be called because we
> -		 * are running with the netlink semaphore held (from
> -		 * devinet_ioctl()) and the pending work queue
> -		 * contains linkwatch_event() (scheduled by
> -		 * netif_carrier_off() above). linkwatch_event() also
> -		 * wants the netlink semaphore.
>  		 */
> -		while (lp->work_pending)
> -			schedule();
> +		cancel_work_sync(&lp->phy_configure);

... likewise lp->work_pending,


>  		smc911x_phy_powerdown(dev, lp->mii.phy_id);
>  	}
>  
> diff --git a/drivers/net/smc91x.c b/drivers/net/smc91x.c
> index a188e33..5dbe4d3 100644
> --- a/drivers/net/smc91x.c
> +++ b/drivers/net/smc91x.c
> @@ -1016,15 +1016,8 @@ static void smc_phy_powerdown(struct net_device *dev)
>  
>  	/* We need to ensure that no calls to smc_phy_configure are
>  	   pending.
> -
> -	   flush_scheduled_work() cannot be called because we are
> -	   running with the netlink semaphore held (from
> -	   devinet_ioctl()) and the pending work queue contains
> -	   linkwatch_event() (scheduled by netif_carrier_off()
> -	   above). linkwatch_event() also wants the netlink semaphore.
>  	*/
> -	while(lp->work_pending)
> -		yield();
> +	cancel_work_sync(&lp->phy_configure);

... likewise lp->work_pending.

...
> diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
> index 20d387f..57b800f 100644
> --- a/drivers/net/wireless/hostap/hostap_main.c
> +++ b/drivers/net/wireless/hostap/hostap_main.c
> @@ -682,7 +682,11 @@ static int prism2_close(struct net_device *dev)
>  		netif_device_detach(dev);
>  	}
>  
> -	flush_scheduled_work();
> +	cancel_work_sync(&local->reset_queue);
> +	cancel_work_sync(&local->set_multicast_list_queue);
> +	cancel_work_sync(&local->set_tim_queue);
> +	cancel_work_sync(&local->info_queue);

local->info_queue probably needs some #ifndef like in hostap_info_init()
or could be skipped here at all (waiting for remove).

> +	cancel_work_sync(&local->comms_qual_update);
>  
>  	module_put(local->hw_module);

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-12  6:13               ` Jarek Poplawski
@ 2008-06-12  7:01                 ` David Miller
  2008-06-12  7:47                   ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-06-12  7:01 UTC (permalink / raw)
  To: jarkao2; +Cc: johannes, davej, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 12 Jun 2008 06:13:34 +0000

> It seems after this change bp->in_reset_task could be removed totally
> from the code.
 ...
> ... likewise lp->work_pending,
 ...
> ... likewise lp->work_pending.
 ...
> local->info_queue probably needs some #ifndef like in hostap_info_init()
> or could be skipped here at all (waiting for remove).

Thanks for the feedback, here's a new patch with your
suggestions added.

net: Eliminate flush_scheduled_work() calls while RTNL is held.

If the RTNL is held when we invoke flush_scheduled_work() we could
deadlock.  One such case is linkwatch, it is a workqueue which tries
to grab the RTNL semaphore.

The most common case are net driver ->stop() methods.  The
simplest conversion is to instead use cancel_{delayed_}work_sync()
explicitly on the various workqueues the driver uses.

This is an OK transformation because these workqueues are doing things
like resetting the chip, restarting link negotiation, and so forth.
And if we're bringing down the device, we're about to turn the chip
off and reset it anways.  So if we cancel a pending work event, that's
fine here.

Some drivers were working around this deadlock by using a msleep()
polling loop of some sort, and those cases are converted to instead
use cancel_{delayed_}work_sync() as well.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4b46e68..367b6d4 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5724,14 +5724,12 @@ bnx2_reset_task(struct work_struct *work)
 	if (!netif_running(bp->dev))
 		return;
 
-	bp->in_reset_task = 1;
 	bnx2_netif_stop(bp);
 
 	bnx2_init_nic(bp);
 
 	atomic_set(&bp->intr_sem, 1);
 	bnx2_netif_start(bp);
-	bp->in_reset_task = 0;
 }
 
 static void
@@ -5907,12 +5905,7 @@ bnx2_close(struct net_device *dev)
 	struct bnx2 *bp = netdev_priv(dev);
 	u32 reset_code;
 
-	/* Calling flush_scheduled_work() may deadlock because
-	 * linkwatch_event() may be on the workqueue and it will try to get
-	 * the rtnl_lock which we are holding.
-	 */
-	while (bp->in_reset_task)
-		msleep(1);
+	cancel_work_sync(&bp->reset_task);
 
 	bnx2_disable_int_sync(bp);
 	bnx2_napi_disable(bp);
diff --git a/drivers/net/bnx2.h b/drivers/net/bnx2.h
index 1eaf5bb..2377cc1 100644
--- a/drivers/net/bnx2.h
+++ b/drivers/net/bnx2.h
@@ -6656,7 +6656,6 @@ struct bnx2 {
 	int			current_interval;
 	struct			timer_list timer;
 	struct work_struct	reset_task;
-	int			in_reset_task;
 
 	/* Used to synchronize phy accesses. */
 	spinlock_t		phy_lock;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index faae01d..075fd54 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2605,7 +2605,8 @@ static int ehea_stop(struct net_device *dev)
 	if (netif_msg_ifdown(port))
 		ehea_info("disabling port %s", dev->name);
 
-	flush_scheduled_work();
+	cancel_work_sync(&port->reset_task);
+
 	mutex_lock(&port->port_lock);
 	netif_stop_queue(dev);
 	port_napi_disable(port);
diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index dde9c7e..00bc7fb 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -959,7 +959,7 @@ static int epp_close(struct net_device *dev)
 	unsigned char tmp[1];
 
 	bc->work_running = 0;
-	flush_scheduled_work();
+	cancel_delayed_work_sync(&bc->run_work);
 	bc->stat = EPP_DCDBIT;
 	tmp[0] = 0;
 	pp->ops->epp_write_addr(pp, tmp, 1, 0);
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index 4e28002..e2ee91a 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -136,7 +136,6 @@ struct smc911x_local {
 
 	/* work queue */
 	struct work_struct phy_configure;
-	int work_pending;
 
 	int tx_throttle;
 	spinlock_t lock;
@@ -960,11 +959,11 @@ static void smc911x_phy_configure(struct work_struct *work)
 	 * We should not be called if phy_type is zero.
 	 */
 	if (lp->phy_type == 0)
-		 goto smc911x_phy_configure_exit_nolock;
+		return;
 
 	if (smc911x_phy_reset(dev, phyaddr)) {
 		printk("%s: PHY reset timed out\n", dev->name);
-		goto smc911x_phy_configure_exit_nolock;
+		return;
 	}
 	spin_lock_irqsave(&lp->lock, flags);
 
@@ -1033,8 +1032,6 @@ static void smc911x_phy_configure(struct work_struct *work)
 
 smc911x_phy_configure_exit:
 	spin_unlock_irqrestore(&lp->lock, flags);
-smc911x_phy_configure_exit_nolock:
-	lp->work_pending = 0;
 }
 
 /*
@@ -1356,11 +1353,8 @@ static void smc911x_timeout(struct net_device *dev)
 	 * smc911x_phy_configure() calls msleep() which calls schedule_timeout()
 	 * which calls schedule().	 Hence we use a work queue.
 	 */
-	if (lp->phy_type != 0) {
-		if (schedule_work(&lp->phy_configure)) {
-			lp->work_pending = 1;
-		}
-	}
+	if (lp->phy_type != 0)
+		schedule_work(&lp->phy_configure);
 
 	/* We can accept TX packets again */
 	dev->trans_start = jiffies;
@@ -1531,16 +1525,8 @@ static int smc911x_close(struct net_device *dev)
 	if (lp->phy_type != 0) {
 		/* We need to ensure that no calls to
 		 * smc911x_phy_configure are pending.
-
-		 * flush_scheduled_work() cannot be called because we
-		 * are running with the netlink semaphore held (from
-		 * devinet_ioctl()) and the pending work queue
-		 * contains linkwatch_event() (scheduled by
-		 * netif_carrier_off() above). linkwatch_event() also
-		 * wants the netlink semaphore.
 		 */
-		while (lp->work_pending)
-			schedule();
+		cancel_work_sync(&lp->phy_configure);
 		smc911x_phy_powerdown(dev, lp->mii.phy_id);
 	}
 
diff --git a/drivers/net/smc91x.c b/drivers/net/smc91x.c
index a188e33..f2051b2 100644
--- a/drivers/net/smc91x.c
+++ b/drivers/net/smc91x.c
@@ -1016,15 +1016,8 @@ static void smc_phy_powerdown(struct net_device *dev)
 
 	/* We need to ensure that no calls to smc_phy_configure are
 	   pending.
-
-	   flush_scheduled_work() cannot be called because we are
-	   running with the netlink semaphore held (from
-	   devinet_ioctl()) and the pending work queue contains
-	   linkwatch_event() (scheduled by netif_carrier_off()
-	   above). linkwatch_event() also wants the netlink semaphore.
 	*/
-	while(lp->work_pending)
-		yield();
+	cancel_work_sync(&lp->phy_configure);
 
 	bmcr = smc_phy_read(dev, phy, MII_BMCR);
 	smc_phy_write(dev, phy, MII_BMCR, bmcr | BMCR_PDOWN);
@@ -1161,7 +1154,6 @@ static void smc_phy_configure(struct work_struct *work)
 smc_phy_configure_exit:
 	SMC_SELECT_BANK(lp, 2);
 	spin_unlock_irq(&lp->lock);
-	lp->work_pending = 0;
 }
 
 /*
@@ -1389,11 +1381,8 @@ static void smc_timeout(struct net_device *dev)
 	 * smc_phy_configure() calls msleep() which calls schedule_timeout()
 	 * which calls schedule().  Hence we use a work queue.
 	 */
-	if (lp->phy_type != 0) {
-		if (schedule_work(&lp->phy_configure)) {
-			lp->work_pending = 1;
-		}
-	}
+	if (lp->phy_type != 0)
+		schedule_work(&lp->phy_configure);
 
 	/* We can accept TX packets again */
 	dev->trans_start = jiffies;
diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 55670b5..af8d2c4 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -731,7 +731,7 @@ static void tulip_down (struct net_device *dev)
 	void __iomem *ioaddr = tp->base_addr;
 	unsigned long flags;
 
-	flush_scheduled_work();
+	cancel_work_sync(&tp->media_work);
 
 #ifdef CONFIG_TULIP_NAPI
 	napi_disable(&tp->napi);
diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index 0dcfc03..7c66b05 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -706,7 +706,7 @@ static void kaweth_kill_urbs(struct kaweth_device *kaweth)
 	usb_kill_urb(kaweth->rx_urb);
 	usb_kill_urb(kaweth->tx_urb);
 
-	flush_scheduled_work();
+	cancel_delayed_work_sync(&kaweth->lowmem_work);
 
 	/* a scheduled work may have resubmitted,
 	   we hit them again */
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index 20d387f..f7aec93 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -682,7 +682,13 @@ static int prism2_close(struct net_device *dev)
 		netif_device_detach(dev);
 	}
 
-	flush_scheduled_work();
+	cancel_work_sync(&local->reset_queue);
+	cancel_work_sync(&local->set_multicast_list_queue);
+	cancel_work_sync(&local->set_tim_queue);
+#ifndef PRISM2_NO_STATION_MODES
+	cancel_work_sync(&local->info_queue);
+#endif
+	cancel_work_sync(&local->comms_qual_update);
 
 	module_put(local->hw_module);
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: 2.6.25rc7 lockdep trace
  2008-06-12  7:01                 ` David Miller
@ 2008-06-12  7:47                   ` Jarek Poplawski
  0 siblings, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2008-06-12  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, davej, netdev

On Thu, Jun 12, 2008 at 12:01:53AM -0700, David Miller wrote:
...
> net: Eliminate flush_scheduled_work() calls while RTNL is held.
> 
...

Very nice!

BTW, I guess this cassini fix with direct calling the work function
(and maybe moving flush_ after unregister_netdev() in 8139too, like
in a few other places) could be re-considered here as well.

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2008-06-12  8:23 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-28  0:00 2.6.25rc7 lockdep trace Dave Jones
2008-03-28  1:55 ` Dave Jones
2008-03-29  0:34 ` David Miller
2008-03-29  0:54   ` Johannes Berg
2008-03-29  1:01     ` Johannes Berg
2008-03-29  1:09       ` David Miller
2008-04-02  8:51         ` Oleg Nesterov
2008-03-29  1:06     ` David Miller
2008-03-29 10:02       ` Johannes Berg
2008-03-29 12:52         ` Jarek Poplawski
2008-03-29 12:50           ` Johannes Berg
2008-04-03 20:48         ` David Miller
2008-04-04 14:48           ` Johannes Berg
2008-06-11  5:40             ` David Miller
2008-06-11  7:08               ` Jarek Poplawski
2008-06-11  7:10                 ` David Miller
2008-06-11  9:36               ` Jarek Poplawski
2008-06-12  0:34                 ` David Miller
2008-06-12  6:29                   ` Jarek Poplawski
2008-06-11 10:40               ` Jarek Poplawski
2008-06-12  0:31                 ` David Miller
2008-06-11 13:14               ` Jarek Poplawski
2008-06-12  5:46               ` David Miller
2008-06-12  7:20                 ` Johannes Berg
2008-06-12  8:23                   ` David Miller
2008-06-12  6:13               ` Jarek Poplawski
2008-06-12  7:01                 ` David Miller
2008-06-12  7:47                   ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).