* lockdep trace from rc2.
From: Dave Jones @ 2008-02-25  2:22 UTC
  To: netdev

https://bugzilla.redhat.com/show_bug.cgi?id=431038 has some more info,
but the trace is below...
I'll get an rc3 kernel built and ask the user to retest, but in case this
isn't a known problem, I'm forwarding this here.

	Dave
 
Feb 24 17:53:21 cirithungol kernel: =======================================================
Feb 24 17:53:21 cirithungol kernel: [ INFO: possible circular locking dependency detected ]
Feb 24 17:53:21 cirithungol kernel: 2.6.25-0.54.rc2.fc9 #1
Feb 24 17:53:21 cirithungol kernel: -------------------------------------------------------
Feb 24 17:53:21 cirithungol kernel: ip/10650 is trying to acquire lock:
Feb 24 17:53:21 cirithungol kernel:  (events){--..}, at: [<c0436f9a>] flush_workqueue+0x0/0x85
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: but task is already holding lock:
Feb 24 17:53:21 cirithungol kernel:  (rtnl_mutex){--..}, at: [<c05cea31>] rtnetlink_rcv+0x12/0x26
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: which lock already depends on the new lock.
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: the existing dependency chain (in reverse order) is:
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: -> #2 (rtnl_mutex){--..}:
Feb 24 17:53:21 cirithungol kernel:        [<c04458f7>] __lock_acquire+0xa7c/0xbf4
Feb 24 17:53:21 cirithungol kernel:        [<c05cea1d>] rtnl_lock+0xf/0x11
Feb 24 17:53:21 cirithungol kernel:        [<c04415dc>] tick_program_event+0x31/0x55
Feb 24 17:53:21 cirithungol kernel:        [<c0445ad9>] lock_acquire+0x6a/0x90
Feb 24 17:53:21 cirithungol kernel:        [<c05cea1d>] rtnl_lock+0xf/0x11
Feb 24 17:53:21 cirithungol kernel:        [<c0638d21>] mutex_lock_nested+0xdb/0x271
Feb 24 17:53:21 cirithungol kernel:        [<c05cea1d>] rtnl_lock+0xf/0x11
Feb 24 17:53:21 cirithungol kernel:last message repeated 2 times
Feb 24 17:53:21 cirithungol kernel:        [<c05cf755>] linkwatch_event+0x8/0x22
Feb 24 17:53:21 cirithungol kernel:        [<c043675c>] run_workqueue+0xd3/0x1a1
Feb 24 17:53:21 cirithungol kernel:        [<c043671a>] run_workqueue+0x91/0x1a1
Feb 24 17:53:21 cirithungol kernel:        [<c05cf74d>] linkwatch_event+0x0/0x22
Feb 24 17:53:21 cirithungol kernel:        [<c04368e0>] worker_thread+0xb6/0xc2
Feb 24 17:53:21 cirithungol kernel:        [<c0439733>] autoremove_wake_function+0x0/0x33
Feb 24 17:53:21 cirithungol kernel:        [<c043682a>] worker_thread+0x0/0xc2
Feb 24 17:53:21 cirithungol kernel:        [<c04394e2>] kthread+0x3b/0x61
Feb 24 17:53:21 cirithungol kernel:        [<c04394a7>] kthread+0x0/0x61
Feb 24 17:53:21 cirithungol kernel:        [<c0406a1b>] kernel_thread_helper+0x7/0x10
Feb 24 17:53:21 cirithungol kernel:        [<ffffffff>] 0xffffffff
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: -> #1 ((linkwatch_work).work){--..}:
Feb 24 17:53:21 cirithungol kernel:        [<c04458f7>] __lock_acquire+0xa7c/0xbf4
Feb 24 17:53:21 cirithungol kernel:        [<c043671a>] run_workqueue+0x91/0x1a1
Feb 24 17:53:21 cirithungol kernel:        [<c0445ad9>] lock_acquire+0x6a/0x90
Feb 24 17:53:21 cirithungol kernel:        [<c043671a>] run_workqueue+0x91/0x1a1
Feb 24 17:53:21 cirithungol kernel:        [<c0436756>] run_workqueue+0xcd/0x1a1
Feb 24 17:53:21 cirithungol kernel:        [<c043671a>] run_workqueue+0x91/0x1a1
Feb 24 17:53:21 cirithungol kernel:        [<c05cf74d>] linkwatch_event+0x0/0x22
Feb 24 17:53:21 cirithungol kernel:        [<c04368e0>] worker_thread+0xb6/0xc2
Feb 24 17:53:21 cirithungol kernel:        [<c0439733>] autoremove_wake_function+0x0/0x33
Feb 24 17:53:21 cirithungol kernel:        [<c043682a>] worker_thread+0x0/0xc2
Feb 24 17:53:21 cirithungol kernel:        [<c04394e2>] kthread+0x3b/0x61
Feb 24 17:53:21 cirithungol kernel:        [<c04394a7>] kthread+0x0/0x61
Feb 24 17:53:21 cirithungol kernel:        [<c0406a1b>] kernel_thread_helper+0x7/0x10
Feb 24 17:53:21 cirithungol kernel:        [<ffffffff>] 0xffffffff
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: -> #0 (events){--..}:
Feb 24 17:53:21 cirithungol kernel:        [<c0444ad8>] print_circular_bug_entry+0x39/0x43
Feb 24 17:53:21 cirithungol kernel:        [<c0445816>] __lock_acquire+0x99b/0xbf4
Feb 24 17:53:21 cirithungol kernel:        [<c040a354>] native_sched_clock+0xb5/0xd1
Feb 24 17:53:21 cirithungol kernel:        [<c0445ad9>] lock_acquire+0x6a/0x90
Feb 24 17:53:21 cirithungol kernel:        [<c0436f9a>] flush_workqueue+0x0/0x85
Feb 24 17:53:21 cirithungol kernel:        [<c0436fde>] flush_workqueue+0x44/0x85
Feb 24 17:53:21 cirithungol kernel:        [<c0436f9a>] flush_workqueue+0x0/0x85
Feb 24 17:53:21 cirithungol kernel:        [<c043702c>] flush_scheduled_work+0xd/0xf
Feb 24 17:53:21 cirithungol kernel:        [<f8f4380a>] tulip_down+0x20/0x1a3 [tulip]
Feb 24 17:53:21 cirithungol kernel:        [<c044495b>] trace_hardirqs_on+0xe9/0x10a
Feb 24 17:53:21 cirithungol kernel:        [<c05d5103>] dev_deactivate+0xb1/0xde
Feb 24 17:53:21 cirithungol kernel:        [<f8f442b5>] tulip_close+0x24/0xd6 [tulip]
Feb 24 17:53:21 cirithungol kernel:        [<c05c7265>] dev_close+0x52/0x6f
Feb 24 17:53:21 cirithungol kernel:        [<c05c6fbc>] dev_change_flags+0x9f/0x152
Feb 24 17:53:21 cirithungol kernel:        [<c05cdb5e>] do_setlink+0x258/0x34a
Feb 24 17:53:21 cirithungol kernel:        [<c05cee5b>] rtnl_newlink+0x257/0x3ad
Feb 24 17:53:21 cirithungol kernel:        [<c05cec70>] rtnl_newlink+0x6c/0x3ad
Feb 24 17:53:21 cirithungol kernel:        [<c05cecb1>] rtnl_newlink+0xad/0x3ad
Feb 24 17:53:21 cirithungol kernel:        [<c04d880c>] selinux_netlink_recv+0x4d/0x57
Feb 24 17:53:21 cirithungol kernel:        [<c05cec04>] rtnl_newlink+0x0/0x3ad
Feb 24 17:53:21 cirithungol kernel:        [<c05cebea>] rtnetlink_rcv_msg+0x1a5/0x1bf
Feb 24 17:53:21 cirithungol kernel:        [<c05cea45>] rtnetlink_rcv_msg+0x0/0x1bf
Feb 24 17:53:21 cirithungol kernel:        [<c05dbcca>] netlink_rcv_skb+0x30/0x86
Feb 24 17:53:21 cirithungol kernel:        [<c05cea3d>] rtnetlink_rcv+0x1e/0x26
Feb 24 17:53:21 cirithungol kernel:        [<c05db7ee>] netlink_unicast+0x1b7/0x215
Feb 24 17:53:21 cirithungol kernel:        [<c05dbaa4>] netlink_sendmsg+0x258/0x265
Feb 24 17:53:21 cirithungol kernel:        [<c05bbd4b>] sock_sendmsg+0xde/0xf9
Feb 24 17:53:21 cirithungol kernel:        [<c0439733>] autoremove_wake_function+0x0/0x33
Feb 24 17:53:21 cirithungol kernel:        [<c040a354>] native_sched_clock+0xb5/0xd1
Feb 24 17:53:21 cirithungol kernel:        [<c04fec74>] copy_from_user+0x39/0x121
Feb 24 17:53:21 cirithungol kernel:        [<c05c2755>] verify_iovec+0x40/0x6f
Feb 24 17:53:21 cirithungol kernel:        [<c05bbea5>] sys_sendmsg+0x13f/0x192
Feb 24 17:53:21 cirithungol kernel:        [<c040a086>] sched_clock+0x8/0xb
Feb 24 17:53:21 cirithungol kernel:        [<c04431f1>] lock_release_holdtime+0x1a/0x115
Feb 24 17:53:21 cirithungol kernel:        [<c04447d3>] mark_held_locks+0x4e/0x66
Feb 24 17:53:21 cirithungol kernel:        [<c0484cd7>] __slab_alloc+0xc7/0x506
Feb 24 17:53:21 cirithungol kernel:        [<c05bd901>] release_sock+0xac/0xb4
Feb 24 17:53:21 cirithungol kernel:        [<c04770d8>] __vma_link+0x6e/0x73
Feb 24 17:53:21 cirithungol kernel:        [<c047712c>] vma_link+0x4f/0xc4
Feb 24 17:53:21 cirithungol kernel:        [<c040a354>] native_sched_clock+0xb5/0xd1
Feb 24 17:53:21 cirithungol kernel:        [<c05bcdd9>] sys_socketcall+0x16b/0x186
Feb 24 17:53:21 cirithungol kernel:        [<c0405d3e>] syscall_call+0x7/0xb
Feb 24 17:53:21 cirithungol kernel:        [<ffffffff>] 0xffffffff
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: other info that might help us debug this:
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: 1 lock held by ip/10650:
Feb 24 17:53:21 cirithungol kernel:  #0:  (rtnl_mutex){--..}, at: [<c05cea31>] rtnetlink_rcv+0x12/0x26
Feb 24 17:53:21 cirithungol kernel: 
Feb 24 17:53:21 cirithungol kernel: stack backtrace:
Feb 24 17:53:21 cirithungol kernel: Pid: 10650, comm: ip Not tainted 2.6.25-0.54.rc2.fc9 #1
Feb 24 17:53:21 cirithungol kernel:  [<c0444c65>] print_circular_bug_tail+0x5b/0x66
Feb 24 17:53:21 cirithungol kernel:  [<c0444ad8>] ? print_circular_bug_entry+0x39/0x43
Feb 24 17:53:21 cirithungol kernel:  [<c0445816>] __lock_acquire+0x99b/0xbf4
Feb 24 17:53:21 cirithungol kernel:  [<c040a354>] ? native_sched_clock+0xb5/0xd1
Feb 24 17:53:21 cirithungol kernel:  [<c0445ad9>] lock_acquire+0x6a/0x90
Feb 24 17:53:21 cirithungol kernel:  [<c0436f9a>] ? flush_workqueue+0x0/0x85
Feb 24 17:53:21 cirithungol kernel:  [<c0436fde>] flush_workqueue+0x44/0x85
Feb 24 17:53:21 cirithungol kernel:  [<c0436f9a>] ? flush_workqueue+0x0/0x85
Feb 24 17:53:21 cirithungol kernel:  [<c043702c>] flush_scheduled_work+0xd/0xf
Feb 24 17:53:21 cirithungol kernel:  [<f8f4380a>] tulip_down+0x20/0x1a3 [tulip]
Feb 24 17:53:21 cirithungol kernel:  [<c044495b>] ? trace_hardirqs_on+0xe9/0x10a
Feb 24 17:53:21 cirithungol kernel:  [<c05d5103>] ? dev_deactivate+0xb1/0xde
Feb 24 17:53:21 cirithungol kernel:  [<f8f442b5>] tulip_close+0x24/0xd6 [tulip]
Feb 24 17:53:21 cirithungol kernel:  [<c05c7265>] dev_close+0x52/0x6f
Feb 24 17:53:21 cirithungol kernel:  [<c05c6fbc>] dev_change_flags+0x9f/0x152
Feb 24 17:53:21 cirithungol kernel:  [<c05cdb5e>] do_setlink+0x258/0x34a
Feb 24 17:53:21 cirithungol kernel:  [<c05cee5b>] rtnl_newlink+0x257/0x3ad
Feb 24 17:53:21 cirithungol kernel:  [<c05cec70>] ? rtnl_newlink+0x6c/0x3ad
Feb 24 17:53:21 cirithungol kernel:  [<c05cecb1>] ? rtnl_newlink+0xad/0x3ad
Feb 24 17:53:21 cirithungol kernel:  [<c04d880c>] ? selinux_netlink_recv+0x4d/0x57
Feb 24 17:53:21 cirithungol kernel:  [<c05cec04>] ? rtnl_newlink+0x0/0x3ad
Feb 24 17:53:21 cirithungol kernel:  [<c05cebea>] rtnetlink_rcv_msg+0x1a5/0x1bf
Feb 24 17:53:21 cirithungol kernel:  [<c05cea45>] ? rtnetlink_rcv_msg+0x0/0x1bf
Feb 24 17:53:21 cirithungol kernel:  [<c05dbcca>] netlink_rcv_skb+0x30/0x86
Feb 24 17:53:21 cirithungol kernel:  [<c05cea3d>] rtnetlink_rcv+0x1e/0x26
Feb 24 17:53:21 cirithungol kernel:  [<c05db7ee>] netlink_unicast+0x1b7/0x215
Feb 24 17:53:21 cirithungol kernel:  [<c05dbaa4>] netlink_sendmsg+0x258/0x265
Feb 24 17:53:21 cirithungol kernel:  [<c05bbd4b>] sock_sendmsg+0xde/0xf9
Feb 24 17:53:21 cirithungol kernel:  [<c0439733>] ? autoremove_wake_function+0x0/0x33
Feb 24 17:53:21 cirithungol kernel:  [<c040a354>] ? native_sched_clock+0xb5/0xd1
Feb 24 17:53:21 cirithungol kernel:  [<c04fec74>] ? copy_from_user+0x39/0x121
Feb 24 17:53:21 cirithungol kernel:  [<c05c2755>] ? verify_iovec+0x40/0x6f
Feb 24 17:53:21 cirithungol kernel:  [<c05bbea5>] sys_sendmsg+0x13f/0x192
Feb 24 17:53:21 cirithungol kernel:  [<c040a086>] ? sched_clock+0x8/0xb
Feb 24 17:53:21 cirithungol kernel:  [<c04431f1>] ? lock_release_holdtime+0x1a/0x115
Feb 24 17:53:21 cirithungol kernel:  [<c04447d3>] ? mark_held_locks+0x4e/0x66
Feb 24 17:53:21 cirithungol kernel:  [<c0484cd7>] ? __slab_alloc+0xc7/0x506
Feb 24 17:53:21 cirithungol kernel:  [<c05bd901>] ? release_sock+0xac/0xb4
Feb 24 17:53:21 cirithungol kernel:  [<c04770d8>] ? __vma_link+0x6e/0x73
Feb 24 17:53:21 cirithungol kernel:  [<c047712c>] ? vma_link+0x4f/0xc4
Feb 24 17:53:21 cirithungol kernel:  [<c040a354>] ? native_sched_clock+0xb5/0xd1
Feb 24 17:53:21 cirithungol kernel:  [<c05bcdd9>] sys_socketcall+0x16b/0x186
Feb 24 17:53:21 cirithungol kernel:  [<c0405d3e>] syscall_call+0x7/0xb
Feb 24 17:53:21 cirithungol kernel:  =======================

-- 
http://www.codemonkey.org.uk


* Re: lockdep trace from rc2.
From: Johannes Berg @ 2008-02-25 10:46 UTC
  To: Dave Jones; +Cc: netdev



On Sun, 2008-02-24 at 21:22 -0500, Dave Jones wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=431038 has some more info,
> but the trace is below...
> I'll get an rc3 kernel built and ask the user to retest, but in case this
> isn't a known problem, I'm forwarding this here.

I can't fix it but I can explain it.

> Feb 24 17:53:21 cirithungol kernel: ip/10650 is trying to acquire lock:
> Feb 24 17:53:21 cirithungol kernel:  (events){--..}, at: [<c0436f9a>] flush_workqueue+0x0/0x85
> Feb 24 17:53:21 cirithungol kernel: 
> Feb 24 17:53:21 cirithungol kernel: but task is already holding lock:
> Feb 24 17:53:21 cirithungol kernel:  (rtnl_mutex){--..}, at: [<c05cea31>] rtnetlink_rcv+0x12/0x26
> Feb 24 17:53:21 cirithungol kernel: 
> Feb 24 17:53:21 cirithungol kernel: which lock already depends on the new lock.

What's happening here is that linkwatch_work runs on the generic
schedule_work() workqueue (the shared "events" workqueue named in the
trace above).

> Feb 24 17:53:21 cirithungol kernel: -> #1 ((linkwatch_work).work){--..}:

The function that is called is linkwatch_event(), which acquires the
RTNL as you can see here:

> Feb 24 17:53:21 cirithungol kernel: -> #2 (rtnl_mutex){--..}:
> Feb 24 17:53:21 cirithungol kernel:        [<c04458f7>] __lock_acquire+0xa7c/0xbf4
> Feb 24 17:53:21 cirithungol kernel:        [<c05cea1d>] rtnl_lock+0xf/0x11
> Feb 24 17:53:21 cirithungol kernel:        [<c04415dc>] tick_program_event+0x31/0x55
> Feb 24 17:53:21 cirithungol kernel:        [<c0445ad9>] lock_acquire+0x6a/0x90
> Feb 24 17:53:21 cirithungol kernel:        [<c05cea1d>] rtnl_lock+0xf/0x11
> Feb 24 17:53:21 cirithungol kernel:        [<c0638d21>] mutex_lock_nested+0xdb/0x271
> Feb 24 17:53:21 cirithungol kernel:        [<c05cea1d>] rtnl_lock+0xf/0x11
> Feb 24 17:53:21 cirithungol kernel:last message repeated 2 times
> Feb 24 17:53:21 cirithungol kernel:        [<c05cf755>] linkwatch_event+0x8/0x22

The problem with that is that tulip_down() calls flush_scheduled_work()
while holding the RTNL:

> Feb 24 17:53:21 cirithungol kernel:        [<c0436f9a>] flush_workqueue+0x0/0x85
> Feb 24 17:53:21 cirithungol kernel:        [<c043702c>] flush_scheduled_work+0xd/0xf
> Feb 24 17:53:21 cirithungol kernel:        [<f8f4380a>] tulip_down+0x20/0x1a3 [tulip]
[...]
> Feb 24 17:53:21 cirithungol kernel:        [<c05cea3d>] rtnetlink_rcv+0x1e/0x26

(rtnetlink_rcv will acquire the RTNL)


The deadlock that can now happen is this: linkwatch_work is scheduled
on the workqueue but not yet running. During tulip_down(),
flush_scheduled_work() is called, which waits for everything that is
scheduled to complete. Among those things can be linkwatch_event(),
which starts running and tries to acquire the RTNL. Since the RTNL is
already held it blocks, while we in turn are holding the RTNL and
waiting for linkwatch_event() to finish: neither side can make progress.
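
To make the cycle concrete, here is a minimal sketch of the two call
paths lockdep saw (function names taken from the trace above, bodies
elided; the indentation just shows who calls whom):

	/* Path A: keventd worker running a scheduled item.  Lockdep
	 * models "running on the events workqueue" as holding a
	 * pseudo-lock, so this records events -> rtnl_mutex. */
	run_workqueue()                /* takes the (events) pseudo-lock */
	    -> linkwatch_event()
	        -> rtnl_lock();        /* blocks if the RTNL is held */

	/* Path B: the ip(8) process in this report.  This records
	 * rtnl_mutex -> events, the reverse order. */
	rtnetlink_rcv()                /* takes the RTNL */
	    -> dev_close()
	        -> tulip_close()
	            -> tulip_down()
	                -> flush_scheduled_work();  /* waits on (events) */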

The fix here would most likely be to not use flush_scheduled_work() but
rather cancel_work_sync().

This should be a correct change afaict, unless tulip has more work
structs than the media work.

@@ tulip_down
-	flush_scheduled_work();
+	cancel_work_sync(&tp->media_work);
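
For reference, a sketch of how that would look in context; tulip_down()'s
real body is elided and the tp/netdev_priv() boilerplate is the usual
driver pattern, not copied from tulip.c:

	static void tulip_down(struct net_device *dev)
	{
		struct tulip_private *tp = netdev_priv(dev);

		/* was: flush_scheduled_work(); -- that waits for *all* work
		 * on the shared queue, including linkwatch_event(), which
		 * needs the RTNL we are holding here */
		cancel_work_sync(&tp->media_work); /* waits only for our item */

		/* ... rest of the shutdown unchanged ... */
	}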

johannes



* Re: lockdep trace from rc2.
From: David Miller @ 2008-02-27  2:13 UTC
  To: davej; +Cc: netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Sun, 24 Feb 2008 21:22:37 -0500

> https://bugzilla.redhat.com/show_bug.cgi?id=431038 has some more info,
> but the trace is below...
> I'll get an rc3 kernel built and ask the user to retest, but in case this
> isn't a known problem, I'm forwarding this here.

So basically, tulip's ->stop() calls tulip_down() which does a
flush_scheduled_work().

Lockdep is saying that the RTNL mutex (which is held when we
call the driver's ->stop() method) conflicts with that work
queue lock that flush_scheduled_work() takes.

Maybe this has something to do with how linkwatch works, but
I can't say that I've seen this before, nor do I quite understand
what exactly lockdep dislikes here.


* Re: lockdep trace from rc2.
From: Stephen Hemminger @ 2008-02-27  3:51 UTC
  To: David Miller; +Cc: davej, netdev

On Tue, 26 Feb 2008 18:13:54 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Dave Jones <davej@codemonkey.org.uk>
> Date: Sun, 24 Feb 2008 21:22:37 -0500
> 
> > https://bugzilla.redhat.com/show_bug.cgi?id=431038 has some more info,
> > but the trace is below...
> > I'll get an rc3 kernel built and ask the user to retest, but in case this
> > isn't a known problem, I'm forwarding this here.
> 
> So basically, tulip's ->stop() calls tulip_down() which does a
> flush_scheduled_work().
> 
> Lockdep is saying that the RTNL mutex (which is held when we
> call the driver's ->stop() method) conflicts with that work
> queue lock that flush_scheduled_work() takes.
> 
> Maybe this has something to do with how linkwatch works, but
> I can't say that I've seen this before, nor do I quite understand
> what exactly lockdep dislikes here.

This has come up before in other drivers: you can't call
flush_scheduled_work() under the RTNL.
See: http://article.gmane.org/gmane.linux.network/86941


* Re: lockdep trace from rc2.
From: Stephen Hemminger @ 2008-02-27  3:54 UTC
  To: Stephen Hemminger; +Cc: David Miller, davej, netdev

On Tue, 26 Feb 2008 19:51:57 -0800
Stephen Hemminger <shemminger@vyatta.com> wrote:

> On Tue, 26 Feb 2008 18:13:54 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Dave Jones <davej@codemonkey.org.uk>
> > Date: Sun, 24 Feb 2008 21:22:37 -0500
> > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=431038 has some more info,
> > > but the trace is below...
> > > I'll get an rc3 kernel built and ask the user to retest, but in case this
> > > isn't a known problem, I'm forwarding this here.
> > 
> > So basically, tulip's ->stop() calls tulip_down() which does a
> > flush_scheduled_work().
> > 
> > Lockdep is saying that the RTNL mutex (which is held when we
> > call the driver's ->stop() method) conflicts with that work
> > queue lock that flush_scheduled_work() takes.
> > 
> > Maybe this has something to do with how linkwatch works, but
> > I can't say that I've seen this before, nor do I quite understand
> > what exactly lockdep dislikes here.
> 
> This has come up before in other drivers: you can't call
> flush_scheduled_work() under the RTNL.

Correct link: http://kerneltrap.org/mailarchive/linux-kernel/2007/4/11/76312


* Re: lockdep trace from rc2.
From: David Miller @ 2008-02-27  6:40 UTC
  To: shemminger; +Cc: davej, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 26 Feb 2008 19:54:00 -0800

> > This has come up before in other drivers: you can't call
> > flush_scheduled_work() under the RTNL.
> 
> Correct link: http://kerneltrap.org/mailarchive/linux-kernel/2007/4/11/76312

Ok.

When I first looked at Dave's lockdep trace I grepped around and there
are a bunch of drivers that call flush_scheduled_work like this.

It seems that the common case is a driver that shares a lot of code
between the suspend and ->stop paths.  Tulip is just such a case.

I can't tell if it's legal to just move the flush_scheduled_work()
call out of tulip_down() and into its suspend function.  If it
needs this for suspend it probably needs it for ->stop() too.

Maybe it could do something similar to the 8139too fix at the above
URL, but this is quite cumbersome if you ask me.

Anyways, could someone please work on a fix for this stuff?


* Re: lockdep trace from rc2.
From: Johannes Berg @ 2008-02-27 11:40 UTC
  To: David Miller; +Cc: shemminger, davej, netdev



> > Correct link: http://kerneltrap.org/mailarchive/linux-kernel/2007/4/11/76312

Also I just posted an explanation in this thread:

http://marc.info/?l=linux-netdev&m=120393645226652&w=2

> When I first looked at Dave's lockdep trace I grepped around and there
> are a bunch of drivers that call flush_scheduled_work like this.
> 
> It seems that the common case is a driver that shares a lot of code
> between the suspend and ->stop paths.  Tulip is just such a case.

Bugger.

> Maybe it could do something similar to the 8139too fix at the above
> URL, but this is quite cumbersome if you ask me.
> 
> Anyways, could someone please work on a fix for this stuff?

I think most of the instances should use cancel_work_sync() instead of
flush_scheduled_work(), unless the work absolutely must still run before
suspend/stop/whatever. See the bottom of the second link above; the
conversion needs to be done by someone familiar with each driver.
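
As an illustration of the conversion (a generic sketch, not any real
driver; my_work, my_work_fn and my_stop are made-up names):

	#include <linux/netdevice.h>
	#include <linux/workqueue.h>

	static void my_work_fn(struct work_struct *work)
	{
		/* driver housekeeping, e.g. periodic media polling */
	}
	static DECLARE_WORK(my_work, my_work_fn);

	/* ->stop(), called with the RTNL held */
	static int my_stop(struct net_device *dev)
	{
		/*
		 * flush_scheduled_work() here would wait for *every* item
		 * on the shared queue, among them linkwatch_event(), which
		 * takes the RTNL we hold: a potential deadlock.  Cancelling
		 * only our own item never waits on unrelated work.
		 */
		cancel_work_sync(&my_work);
		return 0;
	}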

johannes



* Re: lockdep trace from rc2.
From: Johannes Berg @ 2008-02-27 11:44 UTC
  To: Dave Jones; +Cc: netdev, David S. Miller


> This should be a correct change afaict, unless tulip has more work
> structs than the media work.
> 
> @@ tulip_down
> -	flush_scheduled_work();
> +	cancel_work_sync(&tp->media_work);

Forgot to mention that the latter (cancel_work_sync) just makes sure the
work won't run afterwards, while the former (flush_scheduled_work) also
makes sure it has run if it was scheduled. That might make a difference
if the code assumes the work will have run, although for tulip that
doesn't look to be the case.
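
A sketch of the difference (illustrative only; w is a made-up work
item that was queued with schedule_work()):

	schedule_work(&w);       /* w is now pending on the shared queue */

	flush_scheduled_work();  /* returns only after w, and everything
	                          * else on the queue, has executed */

	cancel_work_sync(&w);    /* returns once w is guaranteed neither
	                          * pending nor running; if w had not yet
	                          * started, it never runs at all */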

johannes


