public inbox for linux-kernel@vger.kernel.org
* workqueue lockdep bug.
From: Dave Jones @ 2006-08-14 18:33 UTC
  To: Andrew Morton; +Cc: Linux Kernel

Andrew,
 I merged the workqueue changes from -mm into the Fedora-devel kernel to
kill off those billion cpufreq lockdep warnings.  The bug has now mutated
into this:

(Trimmed version of log from  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202223)

		Dave

 > Breaking affinity for irq 185
 > Breaking affinity for irq 193
 > Breaking affinity for irq 209
 > CPU 1 is now offline
 > lockdep: not fixing up alternatives.
 > 
 > =======================================================
 > [ INFO: possible circular locking dependency detected ]
 > 2.6.17-1.2548.fc6 #1
 > -------------------------------------------------------
 > pm-hibernate/4335 is trying to acquire lock:
 >  ((cpu_chain).rwsem){..--}, at: [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
 > 
 > but task is already holding lock:
 >  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
 > 
 > which lock already depends on the new lock.
 > 
 > 
 > the existing dependency chain (in reverse order) is:
 > 
 > -> #1 (workqueue_mutex){--..}:
 >        [<c043c08e>] lock_acquire+0x4b/0x6d
 >        [<c06126b1>] __mutex_lock_slowpath+0xbc/0x20a
 >        [<c0612820>] mutex_lock+0x21/0x24
 >        [<c0433c25>] workqueue_cpu_callback+0xfd/0x1ee
 >        [<c0614ef5>] notifier_call_chain+0x20/0x31
 >        [<c0430fb0>] blocking_notifier_call_chain+0x1d/0x2d
 >        [<c043f4c5>] _cpu_down+0x47/0x1c4
 >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
 >        [<c0445b32>] prepare_processes+0xe/0x41
 >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
 >        [<c0444e12>] enter_state+0x54/0x1b7
 >        [<c0444ffb>] state_store+0x86/0x9c
 >        [<c04a9f88>] subsys_attr_store+0x20/0x25
 >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
 >        [<c04732cb>] vfs_write+0xab/0x157
 >        [<c0473910>] sys_write+0x3b/0x60
 >        [<c0403faf>] syscall_call+0x7/0xb
 > 
 > -> #0 ((cpu_chain).rwsem){..--}:
 >        [<c043c08e>] lock_acquire+0x4b/0x6d
 >        [<c04390a0>] down_read+0x2d/0x40
 >        [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
 >        [<c043f5aa>] _cpu_down+0x12c/0x1c4
 >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
 >        [<c0445b32>] prepare_processes+0xe/0x41
 >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
 >        [<c0444e12>] enter_state+0x54/0x1b7
 >        [<c0444ffb>] state_store+0x86/0x9c
 >        [<c04a9f88>] subsys_attr_store+0x20/0x25
 >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
 >        [<c04732cb>] vfs_write+0xab/0x157
 >        [<c0473910>] sys_write+0x3b/0x60
 >        [<c0403faf>] syscall_call+0x7/0xb
 > 
 > other info that might help us debug this:
 > 
 > 2 locks held by pm-hibernate/4335:
 >  #0:  (cpu_add_remove_lock){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
 >  #1:  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
 > 
 > stack backtrace:
 >  [<c04051ee>] show_trace_log_lvl+0x58/0x159
 >  [<c04057ea>] show_trace+0xd/0x10
 >  [<c0405903>] dump_stack+0x19/0x1b
 >  [<c043b176>] print_circular_bug_tail+0x59/0x64
 >  [<c043b98e>] __lock_acquire+0x80d/0x99c
 >  [<c043c08e>] lock_acquire+0x4b/0x6d
 >  [<c04390a0>] down_read+0x2d/0x40
 >  [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
 >  [<c043f5aa>] _cpu_down+0x12c/0x1c4
 >  [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
 >  [<c0445b32>] prepare_processes+0xe/0x41
 >  [<c0445d87>] pm_suspend_disk+0x9/0xf3
 >  [<c0444e12>] enter_state+0x54/0x1b7
 >  [<c0444ffb>] state_store+0x86/0x9c
 >  [<c04a9f88>] subsys_attr_store+0x20/0x25
 >  [<c04aa08c>] sysfs_write_file+0xab/0xd1
 >  [<c04732cb>] vfs_write+0xab/0x157
 >  [<c0473910>] sys_write+0x3b/0x60
 >  [<c0403faf>] syscall_call+0x7/0xb
 > DWARF2 unwinder stuck at syscall_call+0x7/0xb
 > Leftover inexact backtrace:
 >  [<c04057ea>] show_trace+0xd/0x10
 >  [<c0405903>] dump_stack+0x19/0x1b
 >  [<c043b176>] print_circular_bug_tail+0x59/0x64
 >  [<c043b98e>] __lock_acquire+0x80d/0x99c
 >  [<c043c08e>] lock_acquire+0x4b/0x6d
 >  [<c04390a0>] down_read+0x2d/0x40
 >  [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
 >  [<c043f5aa>] _cpu_down+0x12c/0x1c4
 >  [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
 >  [<c0445b32>] prepare_processes+0xe/0x41
 >  [<c0445d87>] pm_suspend_disk+0x9/0xf3
 >  [<c0444e12>] enter_state+0x54/0x1b7
 >  [<c0444ffb>] state_store+0x86/0x9c
 >  [<c04a9f88>] subsys_attr_store+0x20/0x25
 >  [<c04aa08c>] sysfs_write_file+0xab/0xd1
 >  [<c04732cb>] vfs_write+0xab/0x157
 >  [<c0473910>] sys_write+0x3b/0x60
 >  [<c0403faf>] syscall_call+0x7/0xb
 > CPU1 is down
 > Stopping tasks:

-- 
http://www.codemonkey.org.uk


* Re: workqueue lockdep bug.
From: Andrew Morton @ 2006-08-15  6:36 UTC
  To: Dave Jones; +Cc: Linux Kernel, Arjan van de Ven, Ingo Molnar

On Mon, 14 Aug 2006 14:33:19 -0400
Dave Jones <davej@redhat.com> wrote:

> Andrew,
>  I merged the workqueue changes from -mm into the Fedora-devel kernel to
> kill off those billion cpufreq lockdep warnings.  The bug has now mutated
> into this:
> 
> (Trimmed version of log from  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202223)
> 

I don't get it.

> 
>  > Breaking affinity for irq 185
>  > Breaking affinity for irq 193
>  > Breaking affinity for irq 209
>  > CPU 1 is now offline
>  > lockdep: not fixing up alternatives.
>  > 
>  > =======================================================
>  > [ INFO: possible circular locking dependency detected ]
>  > 2.6.17-1.2548.fc6 #1
>  > -------------------------------------------------------
>  > pm-hibernate/4335 is trying to acquire lock:
>  >  ((cpu_chain).rwsem){..--}, at: [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>  > 
>  > but task is already holding lock:
>  >  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>  > 
>  > which lock already depends on the new lock.
>  > 
>  > 
>  > the existing dependency chain (in reverse order) is:
>  > 
>  > -> #1 (workqueue_mutex){--..}:
>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>  >        [<c06126b1>] __mutex_lock_slowpath+0xbc/0x20a
>  >        [<c0612820>] mutex_lock+0x21/0x24
>  >        [<c0433c25>] workqueue_cpu_callback+0xfd/0x1ee
>  >        [<c0614ef5>] notifier_call_chain+0x20/0x31
>  >        [<c0430fb0>] blocking_notifier_call_chain+0x1d/0x2d
>  >        [<c043f4c5>] _cpu_down+0x47/0x1c4
>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>  >        [<c0445b32>] prepare_processes+0xe/0x41
>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>  >        [<c0444e12>] enter_state+0x54/0x1b7
>  >        [<c0444ffb>] state_store+0x86/0x9c
>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>  >        [<c04732cb>] vfs_write+0xab/0x157
>  >        [<c0473910>] sys_write+0x3b/0x60
>  >        [<c0403faf>] syscall_call+0x7/0xb

cpu_add_remove_lock -> cpu_chain.rwsem -> workqueue_mutex

>  > -> #0 ((cpu_chain).rwsem){..--}:
>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>  >        [<c04390a0>] down_read+0x2d/0x40
>  >        [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>  >        [<c043f5aa>] _cpu_down+0x12c/0x1c4
>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>  >        [<c0445b32>] prepare_processes+0xe/0x41
>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>  >        [<c0444e12>] enter_state+0x54/0x1b7
>  >        [<c0444ffb>] state_store+0x86/0x9c
>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>  >        [<c04732cb>] vfs_write+0xab/0x157
>  >        [<c0473910>] sys_write+0x3b/0x60
>  >        [<c0403faf>] syscall_call+0x7/0xb

cpu_add_remove_lock -> cpu_chain.rwsem

>  > other info that might help us debug this:
>  > 
>  > 2 locks held by pm-hibernate/4335:
>  >  #0:  (cpu_add_remove_lock){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>  >  #1:  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>  > 
>  > stack backtrace:
>  >  [<c04051ee>] show_trace_log_lvl+0x58/0x159
>  >  [<c04057ea>] show_trace+0xd/0x10
>  >  [<c0405903>] dump_stack+0x19/0x1b
>  >  [<c043b176>] print_circular_bug_tail+0x59/0x64
>  >  [<c043b98e>] __lock_acquire+0x80d/0x99c
>  >  [<c043c08e>] lock_acquire+0x4b/0x6d
>  >  [<c04390a0>] down_read+0x2d/0x40
>  >  [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>  >  [<c043f5aa>] _cpu_down+0x12c/0x1c4
>  >  [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>  >  [<c0445b32>] prepare_processes+0xe/0x41
>  >  [<c0445d87>] pm_suspend_disk+0x9/0xf3
>  >  [<c0444e12>] enter_state+0x54/0x1b7
>  >  [<c0444ffb>] state_store+0x86/0x9c
>  >  [<c04a9f88>] subsys_attr_store+0x20/0x25
>  >  [<c04aa08c>] sysfs_write_file+0xab/0xd1
>  >  [<c04732cb>] vfs_write+0xab/0x157
>  >  [<c0473910>] sys_write+0x3b/0x60
>  >  [<c0403faf>] syscall_call+0x7/0xb

cpu_add_remove_lock -> cpu_chain.rwsem

I don't see anywhere where this process took workqueue_mutex.
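
For reference, a minimal sketch (illustration only, not the kernel
source; call sites paraphrased from the backtraces above) of the two
acquisition orders the report encodes:

    /* chain #1: recorded earlier in _cpu_down() */
    down_read(&cpu_chain.rwsem);      /* blocking_notifier_call_chain() */
    mutex_lock(&workqueue_mutex);     /* workqueue_cpu_callback()       */

    /* chain #0: the acquisition being attempted now */
    mutex_lock(&workqueue_mutex);     /* already held, per "2 locks held" */
    down_read(&cpu_chain.rwsem);      /* completes the cycle              */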

>  > DWARF2 unwinder stuck at syscall_call+0x7/0xb

bah.




* Re: workqueue lockdep bug.
From: Jiri Slaby @ 2006-08-17  1:07 UTC
  To: Andrew Morton; +Cc: Linux Kernel, Arjan van de Ven, Ingo Molnar

Andrew Morton wrote:
> On Mon, 14 Aug 2006 14:33:19 -0400
> Dave Jones <davej@redhat.com> wrote:
> 
>> Andrew,
>>  I merged the workqueue changes from -mm into the Fedora-devel kernel to
>> kill off those billion cpufreq lockdep warnings.  The bug has now mutated
>> into this:
>>
>> (Trimmed version of log from  
>> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202223)
>>
> 
> I don't get it.

Let me extend the output a little bit:

clock = mutex_lock(cpu_add_remove_lock)
wqlock = mutex_lock(workqueue_mutex)
slock = mutex_lock(cpu_chain.rwsem)
and similarly cunlock, wqunlock, sunlock for the corresponding
mutex_unlock calls.

The number after the colon is the source line number where the
mutex_XXX call lies.  Prints happen _after_ the mutex_lock and
_before_ the mutex_unlock calls.
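
Roughly, the instrumentation looks like this (a reconstruction for
illustration, not the exact debug patch):

    mutex_lock(&workqueue_mutex);
    printk("wqlock: %d\n", __LINE__);     /* print after taking the lock */

    /* ... critical section ... */

    printk("wqunlock: %d\n", __LINE__);   /* print before dropping it */
    mutex_unlock(&workqueue_mutex);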

So here it comes:

[   30.947289] clock: 268
[   30.947340] Disabling non-boot CPUs ...
[   30.947392] slock: 334
[   30.964622] wqlock: 689
[   30.964659] sunlock: 336

Isn't this pattern strange for the validator (lock1-lock2-unlock1
here, plus lock1-unlock2-unlock1 below)?
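
Mapping the prints onto that pattern (my reading of the log, with the
second slock taken from further below):

    down_read(&cpu_chain.rwsem);     /* slock: 334                  */
    mutex_lock(&workqueue_mutex);    /* wqlock: 689                 */
    up_read(&cpu_chain.rwsem);       /* sunlock: 336, wq mutex kept */
    /* ... CPU brought down ... */
    down_read(&cpu_chain.rwsem);     /* slock: 334 again, now taken
                                      * inside workqueue_mutex, i.e.
                                      * the inverse of the first two
                                      * lines                       */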

[   30.966762] Breaking affinity for irq 0
[   30.968116] CPU 1 is now offline
[   30.968155] lockdep: not fixing up alternatives.
[   30.968200]
[   30.968201] =======================================================
[   30.968269] [ INFO: possible circular locking dependency detected ]
[   30.968307] 2.6.18-rc4-mm1-bug #11
[   30.968342] -------------------------------------------------------


>>  > Breaking affinity for irq 185
>>  > Breaking affinity for irq 193
>>  > Breaking affinity for irq 209
>>  > CPU 1 is now offline
>>  > lockdep: not fixing up alternatives.
>>  >
>>  > =======================================================
>>  > [ INFO: possible circular locking dependency detected ]
>>  > 2.6.17-1.2548.fc6 #1
>>  > -------------------------------------------------------
>>  > pm-hibernate/4335 is trying to acquire lock:
>>  >  ((cpu_chain).rwsem){..--}, at: [<c0430fa4>] 
>> blocking_notifier_call_chain+0x11/0x2d
>>  >
>>  > but task is already holding lock:
>>  >  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>>  >
>>  > which lock already depends on the new lock.
>>  >
>>  >
>>  > the existing dependency chain (in reverse order) is:
>>  >
>>  > -> #1 (workqueue_mutex){--..}:
>>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>>  >        [<c06126b1>] __mutex_lock_slowpath+0xbc/0x20a
>>  >        [<c0612820>] mutex_lock+0x21/0x24
>>  >        [<c0433c25>] workqueue_cpu_callback+0xfd/0x1ee
>>  >        [<c0614ef5>] notifier_call_chain+0x20/0x31
>>  >        [<c0430fb0>] blocking_notifier_call_chain+0x1d/0x2d
>>  >        [<c043f4c5>] _cpu_down+0x47/0x1c4
>>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>  >        [<c0445b32>] prepare_processes+0xe/0x41
>>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>  >        [<c0444e12>] enter_state+0x54/0x1b7
>>  >        [<c0444ffb>] state_store+0x86/0x9c
>>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>  >        [<c04732cb>] vfs_write+0xab/0x157
>>  >        [<c0473910>] sys_write+0x3b/0x60
>>  >        [<c0403faf>] syscall_call+0x7/0xb
> 
> cpu_add_remove_lock -> cpu_chain.rwsem -> workqueue_mutex
> 
>>  > -> #0 ((cpu_chain).rwsem){..--}:
>>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>>  >        [<c04390a0>] down_read+0x2d/0x40
>>  >        [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>>  >        [<c043f5aa>] _cpu_down+0x12c/0x1c4
>>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>  >        [<c0445b32>] prepare_processes+0xe/0x41
>>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>  >        [<c0444e12>] enter_state+0x54/0x1b7
>>  >        [<c0444ffb>] state_store+0x86/0x9c
>>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>  >        [<c04732cb>] vfs_write+0xab/0x157
>>  >        [<c0473910>] sys_write+0x3b/0x60
>>  >        [<c0403faf>] syscall_call+0x7/0xb
> 
> cpu_add_remove_lock -> cpu_chain.rwsem
> 
>>  > other info that might help us debug this:
>>  >
>>  > 2 locks held by pm-hibernate/4335:
>>  >  #0:  (cpu_add_remove_lock){--..}, at: [<c0612820>] 
>> mutex_lock+0x21/0x24
>>  >  #1:  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>>  >
>>  > stack backtrace:
>>  >  [<c04051ee>] show_trace_log_lvl+0x58/0x159
>>  >  [<c04057ea>] show_trace+0xd/0x10
>>  >  [<c0405903>] dump_stack+0x19/0x1b
>>  >  [<c043b176>] print_circular_bug_tail+0x59/0x64
>>  >  [<c043b98e>] __lock_acquire+0x80d/0x99c
>>  >  [<c043c08e>] lock_acquire+0x4b/0x6d
>>  >  [<c04390a0>] down_read+0x2d/0x40
>>  >  [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>>  >  [<c043f5aa>] _cpu_down+0x12c/0x1c4
>>  >  [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>  >  [<c0445b32>] prepare_processes+0xe/0x41
>>  >  [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>  >  [<c0444e12>] enter_state+0x54/0x1b7
>>  >  [<c0444ffb>] state_store+0x86/0x9c
>>  >  [<c04a9f88>] subsys_attr_store+0x20/0x25
>>  >  [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>  >  [<c04732cb>] vfs_write+0xab/0x157
>>  >  [<c0473910>] sys_write+0x3b/0x60
>>  >  [<c0403faf>] syscall_call+0x7/0xb

[   30.981176]  [<c0170514>] sys_write+0x47/0x6e
[   30.981249]  [<c01031fb>] syscall_call+0x7/0xb
[   30.981322]  =======================
[   30.981378] slock: 334

The one that failed.

[   30.981882] wqunlock: 702
[   30.981939] sunlock: 336
[   30.981996] CPU1 is down
[   30.982036] cunlock: 309
[   30.982075] Stopping tasks: ============
[   31.149008] ==================|

> cpu_add_remove_lock -> cpu_chain.rwsem
> 
> I don't see anywhere where this process took workqueue_mutex.

Hope this helps?

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E




* Re: workqueue lockdep bug.
From: Jiri Slaby @ 2006-08-17  1:17 UTC
  To: Jiri Slaby
  Cc: Andrew Morton, Linux Kernel, Arjan van de Ven, Ingo Molnar,
	Dave Jones

Oops, davej disappeared. Cc him back.

Jiri Slaby wrote:
> Andrew Morton wrote:
>> On Mon, 14 Aug 2006 14:33:19 -0400
>> Dave Jones <davej@redhat.com> wrote:
>>
>>> Andrew,
>>>  I merged the workqueue changes from -mm into the Fedora-devel kernel to
>>> kill off those billion cpufreq lockdep warnings.  The bug has now 
>>> mutated
>>> into this:
>>>
>>> (Trimmed version of log from  
>>> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202223)
>>>
>>
>> I don't get it.
> 
> Let me extend the output a little bit:
> 
> clock = mutex_lock(cpu_add_remove_lock)
> wqlock = mutex_lock(workqueue_mutex)
> slock = mutex_lock(cpu_chain.rwsem)
> and similarly cunlock, wqunlock, sunlock for the corresponding
> mutex_unlock calls.
> 
> The number after the colon is the source line number where the
> mutex_XXX call lies.  Prints happen _after_ the mutex_lock and
> _before_ the mutex_unlock calls.
> 
> So here it comes:
> 
> [   30.947289] clock: 268
> [   30.947340] Disabling non-boot CPUs ...
> [   30.947392] slock: 334
> [   30.964622] wqlock: 689
> [   30.964659] sunlock: 336
> 
> Isn't this pattern strange for the validator (lock1-lock2-unlock1
> here, plus lock1-unlock2-unlock1 below)?
> 
> [   30.966762] Breaking affinity for irq 0
> [   30.968116] CPU 1 is now offline
> [   30.968155] lockdep: not fixing up alternatives.
> [   30.968200]
> [   30.968201] =======================================================
> [   30.968269] [ INFO: possible circular locking dependency detected ]
> [   30.968307] 2.6.18-rc4-mm1-bug #11
> [   30.968342] -------------------------------------------------------
> 
> 
>>>  > Breaking affinity for irq 185
>>>  > Breaking affinity for irq 193
>>>  > Breaking affinity for irq 209
>>>  > CPU 1 is now offline
>>>  > lockdep: not fixing up alternatives.
>>>  >
>>>  > =======================================================
>>>  > [ INFO: possible circular locking dependency detected ]
>>>  > 2.6.17-1.2548.fc6 #1
>>>  > -------------------------------------------------------
>>>  > pm-hibernate/4335 is trying to acquire lock:
>>>  >  ((cpu_chain).rwsem){..--}, at: [<c0430fa4>] 
>>> blocking_notifier_call_chain+0x11/0x2d
>>>  >
>>>  > but task is already holding lock:
>>>  >  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>>>  >
>>>  > which lock already depends on the new lock.
>>>  >
>>>  >
>>>  > the existing dependency chain (in reverse order) is:
>>>  >
>>>  > -> #1 (workqueue_mutex){--..}:
>>>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>>>  >        [<c06126b1>] __mutex_lock_slowpath+0xbc/0x20a
>>>  >        [<c0612820>] mutex_lock+0x21/0x24
>>>  >        [<c0433c25>] workqueue_cpu_callback+0xfd/0x1ee
>>>  >        [<c0614ef5>] notifier_call_chain+0x20/0x31
>>>  >        [<c0430fb0>] blocking_notifier_call_chain+0x1d/0x2d
>>>  >        [<c043f4c5>] _cpu_down+0x47/0x1c4
>>>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>>  >        [<c0445b32>] prepare_processes+0xe/0x41
>>>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>>  >        [<c0444e12>] enter_state+0x54/0x1b7
>>>  >        [<c0444ffb>] state_store+0x86/0x9c
>>>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>>>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>>  >        [<c04732cb>] vfs_write+0xab/0x157
>>>  >        [<c0473910>] sys_write+0x3b/0x60
>>>  >        [<c0403faf>] syscall_call+0x7/0xb
>>
>> cpu_add_remove_lock -> cpu_chain.rwsem -> workqueue_mutex
>>
>>>  > -> #0 ((cpu_chain).rwsem){..--}:
>>>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>>>  >        [<c04390a0>] down_read+0x2d/0x40
>>>  >        [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>>>  >        [<c043f5aa>] _cpu_down+0x12c/0x1c4
>>>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>>  >        [<c0445b32>] prepare_processes+0xe/0x41
>>>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>>  >        [<c0444e12>] enter_state+0x54/0x1b7
>>>  >        [<c0444ffb>] state_store+0x86/0x9c
>>>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>>>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>>  >        [<c04732cb>] vfs_write+0xab/0x157
>>>  >        [<c0473910>] sys_write+0x3b/0x60
>>>  >        [<c0403faf>] syscall_call+0x7/0xb
>>
>> cpu_add_remove_lock -> cpu_chain.rwsem
>>
>>>  > other info that might help us debug this:
>>>  >
>>>  > 2 locks held by pm-hibernate/4335:
>>>  >  #0:  (cpu_add_remove_lock){--..}, at: [<c0612820>] 
>>> mutex_lock+0x21/0x24
>>>  >  #1:  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>>>  >
>>>  > stack backtrace:
>>>  >  [<c04051ee>] show_trace_log_lvl+0x58/0x159
>>>  >  [<c04057ea>] show_trace+0xd/0x10
>>>  >  [<c0405903>] dump_stack+0x19/0x1b
>>>  >  [<c043b176>] print_circular_bug_tail+0x59/0x64
>>>  >  [<c043b98e>] __lock_acquire+0x80d/0x99c
>>>  >  [<c043c08e>] lock_acquire+0x4b/0x6d
>>>  >  [<c04390a0>] down_read+0x2d/0x40
>>>  >  [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>>>  >  [<c043f5aa>] _cpu_down+0x12c/0x1c4
>>>  >  [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>>  >  [<c0445b32>] prepare_processes+0xe/0x41
>>>  >  [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>>  >  [<c0444e12>] enter_state+0x54/0x1b7
>>>  >  [<c0444ffb>] state_store+0x86/0x9c
>>>  >  [<c04a9f88>] subsys_attr_store+0x20/0x25
>>>  >  [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>>  >  [<c04732cb>] vfs_write+0xab/0x157
>>>  >  [<c0473910>] sys_write+0x3b/0x60
>>>  >  [<c0403faf>] syscall_call+0x7/0xb
> 
> [   30.981176]  [<c0170514>] sys_write+0x47/0x6e
> [   30.981249]  [<c01031fb>] syscall_call+0x7/0xb
> [   30.981322]  =======================
> [   30.981378] slock: 334
> 
> The one that failed.
> 
> [   30.981882] wqunlock: 702
> [   30.981939] sunlock: 336
> [   30.981996] CPU1 is down
> [   30.982036] cunlock: 309
> [   30.982075] Stopping tasks: ============
> [   31.149008] ==================|
> 
>> cpu_add_remove_lock -> cpu_chain.rwsem
>>
>> I don't see anywhere where this process took workqueue_mutex.
> 
> Hope this helps?
> 
> regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


* Re: workqueue lockdep bug.
From: Jiri Slaby @ 2006-09-08  9:19 UTC
  To: Jiri Slaby
  Cc: Andrew Morton, Linux Kernel, Arjan van de Ven, Ingo Molnar,
	Dave Jones

Jiri Slaby wrote:
> Andrew Morton wrote:
>> On Mon, 14 Aug 2006 14:33:19 -0400
>> Dave Jones <davej@redhat.com> wrote:
>>
>>> Andrew,
>>>  I merged the workqueue changes from -mm into the Fedora-devel kernel to
>>> kill off those billion cpufreq lockdep warnings.  The bug has now 
>>> mutated
>>> into this:
>>>
>>> (Trimmed version of log from  
>>> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202223)
>>>
>>
>> I don't get it.
> 
> Let me extend the output a little bit:
> 
> clock = mutex_lock(cpu_add_remove_lock)
> wqlock = mutex_lock(workqueue_mutex)
> slock = mutex_lock(cpu_chain.rwsem)
> and similarly cunlock, wqunlock, sunlock for the corresponding
> mutex_unlock calls.
> 
> The number after the colon is the source line number where the
> mutex_XXX call lies.  Prints happen _after_ the mutex_lock and
> _before_ the mutex_unlock calls.
> 
> So here it comes:
> 
> [   30.947289] clock: 268
> [   30.947340] Disabling non-boot CPUs ...
> [   30.947392] slock: 334
> [   30.964622] wqlock: 689
> [   30.964659] sunlock: 336
> 
> Isn't this pattern strange for the validator (lock1-lock2-unlock1
> here, plus lock1-unlock2-unlock1 below)?
> 
> [   30.966762] Breaking affinity for irq 0
> [   30.968116] CPU 1 is now offline
> [   30.968155] lockdep: not fixing up alternatives.
> [   30.968200]
> [   30.968201] =======================================================
> [   30.968269] [ INFO: possible circular locking dependency detected ]
> [   30.968307] 2.6.18-rc4-mm1-bug #11
> [   30.968342] -------------------------------------------------------
> 
> 
>>>  > Breaking affinity for irq 185
>>>  > Breaking affinity for irq 193
>>>  > Breaking affinity for irq 209
>>>  > CPU 1 is now offline
>>>  > lockdep: not fixing up alternatives.
>>>  >
>>>  > =======================================================
>>>  > [ INFO: possible circular locking dependency detected ]
>>>  > 2.6.17-1.2548.fc6 #1
>>>  > -------------------------------------------------------
>>>  > pm-hibernate/4335 is trying to acquire lock:
>>>  >  ((cpu_chain).rwsem){..--}, at: [<c0430fa4>] 
>>> blocking_notifier_call_chain+0x11/0x2d
>>>  >
>>>  > but task is already holding lock:
>>>  >  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>>>  >
>>>  > which lock already depends on the new lock.
>>>  >
>>>  >
>>>  > the existing dependency chain (in reverse order) is:
>>>  >
>>>  > -> #1 (workqueue_mutex){--..}:
>>>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>>>  >        [<c06126b1>] __mutex_lock_slowpath+0xbc/0x20a
>>>  >        [<c0612820>] mutex_lock+0x21/0x24
>>>  >        [<c0433c25>] workqueue_cpu_callback+0xfd/0x1ee
>>>  >        [<c0614ef5>] notifier_call_chain+0x20/0x31
>>>  >        [<c0430fb0>] blocking_notifier_call_chain+0x1d/0x2d
>>>  >        [<c043f4c5>] _cpu_down+0x47/0x1c4
>>>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>>  >        [<c0445b32>] prepare_processes+0xe/0x41
>>>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>>  >        [<c0444e12>] enter_state+0x54/0x1b7
>>>  >        [<c0444ffb>] state_store+0x86/0x9c
>>>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>>>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>>  >        [<c04732cb>] vfs_write+0xab/0x157
>>>  >        [<c0473910>] sys_write+0x3b/0x60
>>>  >        [<c0403faf>] syscall_call+0x7/0xb
>>
>> cpu_add_remove_lock -> cpu_chain.rwsem -> workqueue_mutex
>>
>>>  > -> #0 ((cpu_chain).rwsem){..--}:
>>>  >        [<c043c08e>] lock_acquire+0x4b/0x6d
>>>  >        [<c04390a0>] down_read+0x2d/0x40
>>>  >        [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>>>  >        [<c043f5aa>] _cpu_down+0x12c/0x1c4
>>>  >        [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>>  >        [<c0445b32>] prepare_processes+0xe/0x41
>>>  >        [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>>  >        [<c0444e12>] enter_state+0x54/0x1b7
>>>  >        [<c0444ffb>] state_store+0x86/0x9c
>>>  >        [<c04a9f88>] subsys_attr_store+0x20/0x25
>>>  >        [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>>  >        [<c04732cb>] vfs_write+0xab/0x157
>>>  >        [<c0473910>] sys_write+0x3b/0x60
>>>  >        [<c0403faf>] syscall_call+0x7/0xb
>>
>> cpu_add_remove_lock -> cpu_chain.rwsem
>>
>>>  > other info that might help us debug this:
>>>  >
>>>  > 2 locks held by pm-hibernate/4335:
>>>  >  #0:  (cpu_add_remove_lock){--..}, at: [<c0612820>] 
>>> mutex_lock+0x21/0x24
>>>  >  #1:  (workqueue_mutex){--..}, at: [<c0612820>] mutex_lock+0x21/0x24
>>>  >
>>>  > stack backtrace:
>>>  >  [<c04051ee>] show_trace_log_lvl+0x58/0x159
>>>  >  [<c04057ea>] show_trace+0xd/0x10
>>>  >  [<c0405903>] dump_stack+0x19/0x1b
>>>  >  [<c043b176>] print_circular_bug_tail+0x59/0x64
>>>  >  [<c043b98e>] __lock_acquire+0x80d/0x99c
>>>  >  [<c043c08e>] lock_acquire+0x4b/0x6d
>>>  >  [<c04390a0>] down_read+0x2d/0x40
>>>  >  [<c0430fa4>] blocking_notifier_call_chain+0x11/0x2d
>>>  >  [<c043f5aa>] _cpu_down+0x12c/0x1c4
>>>  >  [<c043f805>] disable_nonboot_cpus+0x9b/0x11a
>>>  >  [<c0445b32>] prepare_processes+0xe/0x41
>>>  >  [<c0445d87>] pm_suspend_disk+0x9/0xf3
>>>  >  [<c0444e12>] enter_state+0x54/0x1b7
>>>  >  [<c0444ffb>] state_store+0x86/0x9c
>>>  >  [<c04a9f88>] subsys_attr_store+0x20/0x25
>>>  >  [<c04aa08c>] sysfs_write_file+0xab/0xd1
>>>  >  [<c04732cb>] vfs_write+0xab/0x157
>>>  >  [<c0473910>] sys_write+0x3b/0x60
>>>  >  [<c0403faf>] syscall_call+0x7/0xb
> 
> [   30.981176]  [<c0170514>] sys_write+0x47/0x6e
> [   30.981249]  [<c01031fb>] syscall_call+0x7/0xb
> [   30.981322]  =======================
> [   30.981378] slock: 334
> 
> The one that failed.
> 
> [   30.981882] wqunlock: 702
> [   30.981939] sunlock: 336
> [   30.981996] CPU1 is down
> [   30.982036] cunlock: 309
> [   30.982075] Stopping tasks: ============
> [   31.149008] ==================|
> 
>> cpu_add_remove_lock -> cpu_chain.rwsem
>>
>> I don't see anywhere where this process took workqueue_mutex.
> 
> Hope this helps?

I can't see any difference in 2.6.18-rc6-mm1.  Has this been addressed somewhere?

thanks,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

