* INFO: possible circular locking dependency at cleanup_workqueue_thread
@ 2009-05-12 7:59 Zdenek Kabelac
2009-05-17 7:18 ` Ingo Molnar
2009-05-24 18:58 ` Peter Zijlstra
0 siblings, 2 replies; 43+ messages in thread
From: Zdenek Kabelac @ 2009-05-12 7:59 UTC (permalink / raw)
To: Linux Kernel Mailing List
Hi
With kernel commit a4d7749be5de4a7261bcbe3c7d96c748792ec455
I got this INFO trace during suspend:
CPU 1 is now offline
lockdep: fixing up alternatives.
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.30-rc5-00097-gd665355 #59
-------------------------------------------------------
pm-suspend/12129 is trying to acquire lock:
(events){+.+.+.}, at: [<ffffffff80259496>] cleanup_workqueue_thread+0x26/0xd0
but task is already holding lock:
(cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
cpu_maps_update_begin+0x17/0x20
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #5 (cpu_add_remove_lock){+.+.+.}:
[<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
[<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
[<ffffffff80246e57>] cpu_maps_update_begin+0x17/0x20
[<ffffffff80259c33>] __create_workqueue_key+0xc3/0x250
[<ffffffff80287b20>] stop_machine_create+0x40/0xb0
[<ffffffff8027a784>] sys_delete_module+0x84/0x270
[<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #4 (setup_lock){+.+.+.}:
[<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
[<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
[<ffffffff80287af7>] stop_machine_create+0x17/0xb0
[<ffffffff80246f06>] disable_nonboot_cpus+0x26/0x130
[<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
[<ffffffff8027dff7>] enter_state+0x107/0x170
[<ffffffff8027e0f9>] state_store+0x99/0x100
[<ffffffff803abe17>] kobj_attr_store+0x17/0x20
[<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
[<ffffffff802e1c88>] vfs_write+0xb8/0x180
[<ffffffff802e2771>] sys_write+0x51/0x90
[<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #3 (dpm_list_mtx){+.+.+.}:
[<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
[<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
[<ffffffff804532ff>] device_pm_add+0x1f/0xe0
[<ffffffff8044b9bf>] device_add+0x45f/0x570
[<ffffffffa007c578>] wiphy_register+0x158/0x280 [cfg80211]
[<ffffffffa017567c>] ieee80211_register_hw+0xbc/0x410 [mac80211]
[<ffffffffa01f7c5c>] iwl3945_pci_probe+0xa1c/0x1080 [iwl3945]
[<ffffffff803c4307>] local_pci_probe+0x17/0x20
[<ffffffff803c5458>] pci_device_probe+0x88/0xb0
[<ffffffff8044e1e9>] driver_probe_device+0x89/0x180
[<ffffffff8044e37b>] __driver_attach+0x9b/0xa0
[<ffffffff8044d67c>] bus_for_each_dev+0x6c/0xa0
[<ffffffff8044e03e>] driver_attach+0x1e/0x20
[<ffffffff8044d955>] bus_add_driver+0xd5/0x290
[<ffffffff8044e668>] driver_register+0x78/0x140
[<ffffffff803c56f6>] __pci_register_driver+0x66/0xe0
[<ffffffffa00bd05c>] 0xffffffffa00bd05c
[<ffffffff8020904f>] do_one_initcall+0x3f/0x1c0
[<ffffffff8027d071>] sys_init_module+0xb1/0x200
[<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #2 (cfg80211_mutex){+.+.+.}:
[<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
[<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
[<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
[<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
[<ffffffff8025dc3a>] kthread+0x5a/0xa0
[<ffffffff8020d23a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
-> #1 (reg_work){+.+.+.}:
[<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff80258f12>] worker_thread+0x1e2/0x3a0
[<ffffffff8025dc3a>] kthread+0x5a/0xa0
[<ffffffff8020d23a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
-> #0 (events){+.+.+.}:
[<ffffffff80271b36>] __lock_acquire+0xd36/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff802594bd>] cleanup_workqueue_thread+0x4d/0xd0
[<ffffffff8053fe79>] workqueue_cpu_callback+0xc9/0x10f
[<ffffffff805539a8>] notifier_call_chain+0x68/0xa0
[<ffffffff802630d6>] raw_notifier_call_chain+0x16/0x20
[<ffffffff8053d9ec>] _cpu_down+0x1cc/0x2d0
[<ffffffff80246f90>] disable_nonboot_cpus+0xb0/0x130
[<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
[<ffffffff8027dff7>] enter_state+0x107/0x170
[<ffffffff8027e0f9>] state_store+0x99/0x100
[<ffffffff803abe17>] kobj_attr_store+0x17/0x20
[<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
[<ffffffff802e1c88>] vfs_write+0xb8/0x180
[<ffffffff802e2771>] sys_write+0x51/0x90
[<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
other info that might help us debug this:
4 locks held by pm-suspend/12129:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8033f154>]
sysfs_write_file+0x44/0x160
#1: (pm_mutex){+.+.+.}, at: [<ffffffff8027df44>] enter_state+0x54/0x170
#2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff80452747>] device_pm_lock+0x17/0x20
#3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
cpu_maps_update_begin+0x17/0x20
stack backtrace:
Pid: 12129, comm: pm-suspend Not tainted 2.6.30-rc5-00097-gd665355 #59
Call Trace:
[<ffffffff8026fa3d>] print_circular_bug_tail+0x9d/0xe0
[<ffffffff80271b36>] __lock_acquire+0xd36/0x10a0
[<ffffffff80270000>] ? mark_lock+0x3e0/0x400
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff80259496>] ? cleanup_workqueue_thread+0x26/0xd0
[<ffffffff802594bd>] cleanup_workqueue_thread+0x4d/0xd0
[<ffffffff80259496>] ? cleanup_workqueue_thread+0x26/0xd0
[<ffffffff802703ed>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8053fe79>] workqueue_cpu_callback+0xc9/0x10f
[<ffffffff8054acf1>] ? cpu_callback+0x12/0x280
[<ffffffff805539a8>] notifier_call_chain+0x68/0xa0
[<ffffffff802630d6>] raw_notifier_call_chain+0x16/0x20
[<ffffffff8053d9ec>] _cpu_down+0x1cc/0x2d0
[<ffffffff80246f90>] disable_nonboot_cpus+0xb0/0x130
[<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
[<ffffffff8027dff7>] enter_state+0x107/0x170
[<ffffffff8027e0f9>] state_store+0x99/0x100
[<ffffffff803abe17>] kobj_attr_store+0x17/0x20
[<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
[<ffffffff802e1c88>] vfs_write+0xb8/0x180
[<ffffffff8029088c>] ? audit_syscall_entry+0x21c/0x240
[<ffffffff802e2771>] sys_write+0x51/0x90
[<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
CPU1 is down
Extended CMOS year: 2000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Back to C!
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-12 7:59 INFO: possible circular locking dependency at cleanup_workqueue_thread Zdenek Kabelac
@ 2009-05-17 7:18 ` Ingo Molnar
2009-05-17 10:42 ` Ming Lei
` (2 more replies)
2009-05-24 18:58 ` Peter Zijlstra
1 sibling, 3 replies; 43+ messages in thread
From: Ingo Molnar @ 2009-05-17 7:18 UTC (permalink / raw)
To: Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra, Oleg Nesterov
Cc: Linux Kernel Mailing List
Cc:s added. This dependency:
> -> #2 (cfg80211_mutex){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> [<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
> [<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
> [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> [<ffffffff8020d23a>] child_rip+0xa/0x20
is what sets the dependencies upside down.
Ingo
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-17 7:18 ` Ingo Molnar
@ 2009-05-17 10:42 ` Ming Lei
2009-05-17 11:18 ` Johannes Berg
2009-05-20 12:18 ` Peter Zijlstra
2 siblings, 0 replies; 43+ messages in thread
From: Ming Lei @ 2009-05-17 10:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra, Oleg Nesterov,
Linux Kernel Mailing List
On Sun, 17 May 2009 09:18:34 +0200
Ingo Molnar <mingo@elte.hu> wrote:
>
> Cc:s added. This dependency:
>
> > -> #2 (cfg80211_mutex){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > [<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
> > [<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
> > [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> > [<ffffffff8020d23a>] child_rip+0xa/0x20
>
> is what sets the dependencies upside down.
>
> Ingo
>
If I remove cfg80211 and the related wireless modules, the warning becomes:
[ 562.689264] [ INFO: possible circular locking dependency detected ]
[ 562.689267] 2.6.30-rc5-next-20090515-00004-g346c29d-dirty #106
[ 562.689270] -------------------------------------------------------
[ 562.689272] pm-suspend/4672 is trying to acquire lock:
[ 562.689275] (events){+.+.+.}, at: [<ffffffff80258523>] cleanup_workqueue_thread+0x23/0xf1
[ 562.689284]
[ 562.689285] but task is already holding lock:
[ 562.689287] (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80247685>] disable_nonboot_cpus+0x38/0x125
[ 562.689295]
[ 562.689296] which lock already depends on the new lock.
[ 562.689298]
[ 562.689299]
[ 562.689300] the existing dependency chain (in reverse order) is:
[ 562.689302]
[ 562.689303] -> #5 (cpu_add_remove_lock){+.+.+.}:
[ 562.689308] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
[ 562.689313] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689318] [<ffffffff804ab2f0>] mutex_lock_nested+0x6f/0x36b
[ 562.689324] [<ffffffff802475df>] cpu_maps_update_begin+0x17/0x19
[ 562.689328] [<ffffffff8025883d>] __create_workqueue_key+0xef/0x1bd
[ 562.689333] [<ffffffff80282024>] stop_machine_create+0x3f/0xa0
[ 562.689338] [<ffffffff802820a3>] stop_machine+0x1e/0x4f
[ 562.689343] [<ffffffff8028ef29>] ftrace_run_update_code+0x2b/0x7c
[ 562.689347] [<ffffffff8028f21d>] ftrace_startup+0x42/0x44
[ 562.689351] [<ffffffff8028f2a2>] register_ftrace_function+0x83/0x95
[ 562.689356] [<ffffffff8029b09b>] function_trace_init+0x91/0xa1
[ 562.689360] [<ffffffff80298017>] tracer_init+0x1d/0x22
[ 562.689366] [<ffffffff80298c49>] trace_selftest_startup_function+0x4e/0xf7
[ 562.689371] [<ffffffff8029840a>] register_tracer+0x151/0x270
[ 562.689375] [<ffffffff806e7132>] init_function_trace+0x3c/0x3e
[ 562.689381] [<ffffffff80209080>] do_one_initcall+0x75/0x18a
[ 562.689386] [<ffffffff806d0678>] kernel_init+0x138/0x18e
[ 562.689391] [<ffffffff8020c33a>] child_rip+0xa/0x20
[ 562.689396] [<ffffffffffffffff>] 0xffffffffffffffff
[ 562.689406]
[ 562.689407] -> #4 (setup_lock){+.+...}:
[ 562.689411] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
[ 562.689416] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689420] [<ffffffff804ab2f0>] mutex_lock_nested+0x6f/0x36b
[ 562.689424] [<ffffffff80281ffc>] stop_machine_create+0x17/0xa0
[ 562.689429] [<ffffffff80247668>] disable_nonboot_cpus+0x1b/0x125
[ 562.689434] [<ffffffff8027b235>] suspend_devices_and_enter+0xdb/0x1c0
[ 562.689439] [<ffffffff8027b4b0>] enter_state+0x168/0x1ce
[ 562.689443] [<ffffffff8027b5d2>] state_store+0xbc/0xdd
[ 562.689447] [<ffffffff803696ef>] kobj_attr_store+0x17/0x19
[ 562.689452] [<ffffffff8032d5f9>] sysfs_write_file+0xe9/0x11e
[ 562.689457] [<ffffffff802db580>] vfs_write+0xb3/0x13c
[ 562.689462] [<ffffffff802db6d7>] sys_write+0x4c/0x73
[ 562.689466] [<ffffffff8020b202>] system_call_fastpath+0x16/0x1b
[ 562.689471] [<ffffffffffffffff>] 0xffffffffffffffff
[ 562.689475]
[ 562.689476] -> #3 (dpm_list_mtx){+.+.+.}:
[ 562.689480] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
[ 562.689484] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689489] [<ffffffff804ab2f0>] mutex_lock_nested+0x6f/0x36b
[ 562.689493] [<ffffffff803f6395>] device_pm_add+0x4b/0xf2
[ 562.689499] [<ffffffff803ef382>] device_add+0x498/0x62a
[ 562.689503] [<ffffffff8043fd32>] netdev_register_kobject+0x7b/0x80
[ 562.689509] [<ffffffff80434761>] register_netdevice+0x2d0/0x469
[ 562.689514] [<ffffffff80434939>] register_netdev+0x3f/0x4d
[ 562.689519] [<ffffffff806f634f>] loopback_net_init+0x40/0x7d
[ 562.689524] [<ffffffff8042ee79>] register_pernet_device+0x32/0x5f
[ 562.689528] [<ffffffff806fb41a>] net_dev_init+0x143/0x1a1
[ 562.689533] [<ffffffff80209080>] do_one_initcall+0x75/0x18a
[ 562.689538] [<ffffffff806d0678>] kernel_init+0x138/0x18e
[ 562.689542] [<ffffffff8020c33a>] child_rip+0xa/0x20
[ 562.689546] [<ffffffffffffffff>] 0xffffffffffffffff
[ 562.689550]
[ 562.689551] -> #2 (rtnl_mutex){+.+.+.}:
[ 562.689555] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
[ 562.689559] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689564] [<ffffffff804ab2f0>] mutex_lock_nested+0x6f/0x36b
[ 562.689568] [<ffffffff8043cf2d>] rtnl_lock+0x17/0x19
[ 562.689573] [<ffffffff8043df37>] linkwatch_event+0xe/0x2c
[ 562.689577] [<ffffffff8025810b>] worker_thread+0x23a/0x359
[ 562.689581] [<ffffffff8025c258>] kthread+0x5b/0x88
[ 562.689586] [<ffffffff8020c33a>] child_rip+0xa/0x20
[ 562.689590] [<ffffffffffffffff>] 0xffffffffffffffff
[ 562.689594]
[ 562.689595] -> #1 ((linkwatch_work).work){+.+...}:
[ 562.689599] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
[ 562.689603] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689607] [<ffffffff80258102>] worker_thread+0x231/0x359
[ 562.689611] [<ffffffff8025c258>] kthread+0x5b/0x88
[ 562.689616] [<ffffffff8020c33a>] child_rip+0xa/0x20
[ 562.689619] [<ffffffffffffffff>] 0xffffffffffffffff
[ 562.689635]
[ 562.689636] -> #0 (events){+.+.+.}:
[ 562.689639] [<ffffffff8026e81d>] __lock_acquire+0x10de/0x171c
[ 562.689644] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689648] [<ffffffff8025854a>] cleanup_workqueue_thread+0x4a/0xf1
[ 562.689652] [<ffffffff8049b999>] workqueue_cpu_callback+0xc7/0x10a
[ 562.689658] [<ffffffff802606ff>] notifier_call_chain+0x33/0x5b
[ 562.689663] [<ffffffff802607ab>] raw_notifier_call_chain+0x14/0x16
[ 562.689667] [<ffffffff80499bca>] _cpu_down+0x280/0x29d
[ 562.689672] [<ffffffff802476ca>] disable_nonboot_cpus+0x7d/0x125
[ 562.689677] [<ffffffff8027b235>] suspend_devices_and_enter+0xdb/0x1c0
[ 562.689681] [<ffffffff8027b4b0>] enter_state+0x168/0x1ce
[ 562.689685] [<ffffffff8027b5d2>] state_store+0xbc/0xdd
[ 562.689689] [<ffffffff803696ef>] kobj_attr_store+0x17/0x19
[ 562.689694] [<ffffffff8032d5f9>] sysfs_write_file+0xe9/0x11e
[ 562.689698] [<ffffffff802db580>] vfs_write+0xb3/0x13c
[ 562.689702] [<ffffffff802db6d7>] sys_write+0x4c/0x73
[ 562.689706] [<ffffffff8020b202>] system_call_fastpath+0x16/0x1b
[ 562.689711] [<ffffffffffffffff>] 0xffffffffffffffff
[ 562.689715]
[ 562.689716] other info that might help us debug this:
[ 562.689717]
[ 562.689720] 4 locks held by pm-suspend/4672:
[ 562.689722] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8032d54d>] sysfs_write_file+0x3d/0x11e
[ 562.689729] #1: (pm_mutex){+.+.+.}, at: [<ffffffff8027b50d>] enter_state+0x1c5/0x1ce
[ 562.689736] #2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff803f55f9>] device_pm_lock+0x17/0x19
[ 562.689742] #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80247685>] disable_nonboot_cpus+0x38/0x125
[ 562.689749]
[ 562.689750] stack backtrace:
[ 562.689753] Pid: 4672, comm: pm-suspend Not tainted 2.6.30-rc5-next-20090515-00004-g346c29d-dirty #106
[ 562.689757] Call Trace:
[ 562.689762] [<ffffffff8026d297>] print_circular_bug_tail+0xc1/0xcc
[ 562.689767] [<ffffffff8026e81d>] __lock_acquire+0x10de/0x171c
[ 562.689772] [<ffffffff8026add2>] ? lock_release_holdtime+0x7d/0x100
[ 562.689776] [<ffffffff8026acf6>] ? get_lock_stats+0x14/0x4c
[ 562.689781] [<ffffffff804aaca2>] ? __mutex_unlock_slowpath+0x121/0x14d
[ 562.689786] [<ffffffff8026ca24>] ? trace_hardirqs_on_caller+0x12d/0x158
[ 562.689791] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689795] [<ffffffff80258523>] ? cleanup_workqueue_thread+0x23/0xf1
[ 562.689800] [<ffffffff8025854a>] cleanup_workqueue_thread+0x4a/0xf1
[ 562.689804] [<ffffffff80258523>] ? cleanup_workqueue_thread+0x23/0xf1
[ 562.689809] [<ffffffff8049b999>] workqueue_cpu_callback+0xc7/0x10a
[ 562.689813] [<ffffffff802606ff>] notifier_call_chain+0x33/0x5b
[ 562.689818] [<ffffffff802607ab>] raw_notifier_call_chain+0x14/0x16
[ 562.689822] [<ffffffff80499bca>] _cpu_down+0x280/0x29d
[ 562.689827] [<ffffffff802476ca>] disable_nonboot_cpus+0x7d/0x125
[ 562.689832] [<ffffffff8027b235>] suspend_devices_and_enter+0xdb/0x1c0
[ 562.689836] [<ffffffff8027b4b0>] enter_state+0x168/0x1ce
[ 562.689841] [<ffffffff8027b5d2>] state_store+0xbc/0xdd
[ 562.689845] [<ffffffff803696ef>] kobj_attr_store+0x17/0x19
[ 562.689849] [<ffffffff8032d5f9>] sysfs_write_file+0xe9/0x11e
[ 562.689854] [<ffffffff802db580>] vfs_write+0xb3/0x13c
[ 562.689859] [<ffffffff802db6d7>] sys_write+0x4c/0x73
[ 562.689863] [<ffffffff8020b202>] system_call_fastpath+0x16/0x1b
--
Lei Ming
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-17 7:18 ` Ingo Molnar
2009-05-17 10:42 ` Ming Lei
@ 2009-05-17 11:18 ` Johannes Berg
2009-05-17 13:10 ` Ingo Molnar
2009-05-18 19:47 ` Oleg Nesterov
2009-05-20 12:18 ` Peter Zijlstra
2 siblings, 2 replies; 43+ messages in thread
From: Johannes Berg @ 2009-05-17 11:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra, Oleg Nesterov,
Linux Kernel Mailing List
On Sun, 2009-05-17 at 09:18 +0200, Ingo Molnar wrote:
> Cc:s added. This dependency:
Not sure why you're not adding the cfg80211 maintainer if you think
cfg80211 causes the problem...
> > -> #2 (cfg80211_mutex){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > [<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
> > [<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
> > [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> > [<ffffffff8020d23a>] child_rip+0xa/0x20
>
> is what sets the dependencies upside down.
I'm also not sure how you arrived at that conclusion, I would be
interested to hear how you did. In any case, it's most definitely not
cfg80211 causing it.
Cf. this, almost identical, lockdep report for example:
http://paste.pocoo.org/show/116240/
The logical conclusion here would be to say that the rtnl is responsible
here...
As you can see from the report, the only thing done with cfg80211_mutex
is registering a device struct while it is held -- claiming that cfg80211
(or rtnl in the other report, which behaves the same) is responsible here
because of that is totally ludicrous -- it would mean the locking rules
have suddenly changed so that you can no longer register devices under a
lock that you also need from a work struct executed via schedule_work().
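Spelled out as code, the pattern in question is nothing more exotic than
this (a minimal hypothetical driver sketch, not actual cfg80211 code;
my_lock stands in for cfg80211_mutex or the rtnl):

static DEFINE_MUTEX(my_lock);

static void my_work_fn(struct work_struct *work)
{
        mutex_lock(&my_lock);   /* recorded: "events" workqueue -> my_work -> my_lock */
        /* ... */
        mutex_unlock(&my_lock);
}
static DECLARE_WORK(my_work, my_work_fn);

static int my_probe(struct device *dev)
{
        mutex_lock(&my_lock);
        device_add(dev);        /* takes dpm_list_mtx: my_lock -> dpm_list_mtx */
        mutex_unlock(&my_lock);

        schedule_work(&my_work);        /* my_work runs on the "events" workqueue */
        return 0;
}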
I'm not entirely sure yet, but I would think the problem might be a
false positive in the workqueue code -- remember this report only
triggers because cleanup_workqueue_thread() acquires the fake lock for
the workqueue. Maybe it shouldn't do that from the CPU_POST_DEAD
notifier? Oleg, can you help me out here?
johannes
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-17 11:18 ` Johannes Berg
@ 2009-05-17 13:10 ` Ingo Molnar
2009-05-18 19:47 ` Oleg Nesterov
1 sibling, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2009-05-17 13:10 UTC (permalink / raw)
To: Johannes Berg
Cc: Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra, Oleg Nesterov,
Linux Kernel Mailing List
* Johannes Berg <johannes@sipsolutions.net> wrote:
> On Sun, 2009-05-17 at 09:18 +0200, Ingo Molnar wrote:
> > Cc:s added. This dependency:
>
> Not sure why you're not adding the cfg80211 maintainer if you
> think cfg80211 causes the problem...
Oversight and lack of time. Why are you asking it in such an edgy
way instead of just adding the Cc:s? Why do you want to set a
negative tone in the thread? Do you think it results in a faster
resolution?
> > > -> #2 (cfg80211_mutex){+.+.+.}:
> > > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > > [<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
> > > [<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
> > > [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> > > [<ffffffff8020d23a>] child_rip+0xa/0x20
> >
> > is what sets the dependencies upside down.
>
> I'm also not sure how you arrived at that conclusion, I would be
> interested to hear how you did. [...]
( no strong reason, i looked for 10 seconds and this is what popped
up. You looked a bit deeper and found something different. )
> [...] In any case, it's most definitely not cfg80211 causing it.
>
> Cf. this, almost identical, lockdep report for example:
> http://paste.pocoo.org/show/116240/
> The logical conclusion here would be to say that the rtnl is responsible
> here...
>
> As you can see from the report, the only thing cfg80211_mutex does
> is register a device struct while holding it -- claiming cfg80211
> (or rtnl in the other report which behaves the same)
> responsibility here because of that is totally ludicrous -- that
> would mean you've suddenly changed all the locking rules so that
> you can no longer register devices under a lock that you also need
> from a work struct executed due to schedule_work().
>
> I'm not entirely sure yet, but I would think the problem might be
> a false positive in the workqueue code -- remember this report
> only triggers because cleanup_workqueue_thread() acquires the fake
> lock for the workqueue. Maybe it shouldn't do that from the
> CPU_POST_DEAD notifier? Oleg, can you help me out here?
>
> johannes
We can also remove the workqueue lockdep annotations if the false
positive rate is too high.
Ingo
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-17 11:18 ` Johannes Berg
2009-05-17 13:10 ` Ingo Molnar
@ 2009-05-18 19:47 ` Oleg Nesterov
2009-05-18 20:00 ` Peter Zijlstra
2009-05-19 8:51 ` Johannes Berg
1 sibling, 2 replies; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-18 19:47 UTC (permalink / raw)
To: Johannes Berg
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On 05/17, Johannes Berg wrote:
>
> I'm not entirely sure yet, but I would think the problem might be a
> false positive in the workqueue code -- remember this report only
> triggers because cleanup_workqueue_thread() acquires the fake lock for
> the workqueue.
I spent a lot of time on this, but I can't explain this report either :(
Even if it is a false positive, I don't understand why lockdep complains.
> Maybe it shouldn't do that from the CPU_POST_DEAD
> notifier?
Well, in any case we should understand why we have the problem, before
changing the code. And CPU_POST_DEAD is not special, why should we treat
it specially and skip lock_map_acquire(wq->lockdep_map) ?
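For reference, the annotation being discussed is roughly this (paraphrased
from the 2.6.30-era kernel/workqueue.c, not a verbatim quote). The flush
path pretends to acquire the workqueue's lockdep map, and the worker
thread pretends to hold it around every work->func(); that is how "events"
shows up as a lock in these reports:

        /* flush_workqueue() / cleanup_workqueue_thread(): the flusher
         * "takes" the workqueue's map, so the locks it already holds are
         * recorded as taken before the workqueue */
        lock_map_acquire(&wq->lockdep_map);
        lock_map_release(&wq->lockdep_map);

        /* run_workqueue(): the worker "holds" both the workqueue's map and
         * the work's own map around the callback, so every lock taken
         * inside work->func() is recorded as nesting inside them */
        lock_map_acquire(&cwq->wq->lockdep_map);
        lock_map_acquire(&work->lockdep_map);
        f(work);
        lock_map_release(&work->lockdep_map);
        lock_map_release(&cwq->wq->lockdep_map);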
But, I am starting to suspect we have some problems with lockdep too.
OK, I can't explain what I mean... But consider this code:
DEFINE_SPINLOCK(Z);
DEFINE_SPINLOCK(L1);
DEFINE_SPINLOCK(L2);

#define L(l) spin_lock(&l)
#define U(l) spin_unlock(&l)

void t1(void)
{
        L(L1);
        L(L2);

        U(L2);
        U(L1);
}

void t2(void)
{
        L(L2);
        L(Z);
        L(L1);
        U(L1);
        U(Z);
        U(L2);
}

void tst(void)
{
        t1();
        t2();
}
We have the trivial AB-BA deadlock with L1 and L2, but lockdep says:
[ INFO: possible circular locking dependency detected ]
2.6.30-rc6-00043-g22ef37e-dirty #3
-------------------------------------------------------
perl/676 is trying to acquire lock:
(L1){+.+...}, at: [<ffffffff802522b8>] t2+0x28/0x50
but task is already holding lock:
(Z){+.+...}, at: [<ffffffff802522ac>] t2+0x1c/0x50
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (Z){+.+...}:
-> #1 (L2){+.+...}:
-> #0 (L1){+.+...}:
other info that might help us debug this:
2 locks held by perl/676:
#0: (L2){+.+...}, at: [<ffffffff802522a0>] t2+0x10/0x50
#1: (Z){+.+...}, at: [<ffffffff802522ac>] t2+0x1c/0x50
This output looks obviously wrong, Z does not depend on L1 or any
other lock.
Oleg.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-18 19:47 ` Oleg Nesterov
@ 2009-05-18 20:00 ` Peter Zijlstra
2009-05-18 20:16 ` Oleg Nesterov
2009-05-19 8:51 ` Johannes Berg
1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-18 20:00 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List
On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> On 05/17, Johannes Berg wrote:
> >
> > I'm not entirely sure yet, but I would think the problem might be a
> > false positive in the workqueue code -- remember this report only
> > triggers because cleanup_workqueue_thread() acquires the fake lock for
> > the workqueue.
>
> I spent a lot of time, but I can't explain this report too :( Even
> if it is false positive, I don't understand why lockdep complains.
>
> > Maybe it shouldn't do that from the CPU_POST_DEAD
> > notifier?
>
> Well, in any case we should understand why we have the problem, before
> changing the code. And CPU_POST_DEAD is not special, why should we treat
> it specially and skip lock_map_acquire(wq->lockdep_map) ?
>
>
> But, I am starting to suspect we have some problems with lockdep too.
> OK, I can't explain what I mean... But consider this code:
>
> DEFINE_SPINLOCK(Z);
> DEFINE_SPINLOCK(L1);
> DEFINE_SPINLOCK(L2);
>
> #define L(l) spin_lock(&l)
> #define U(l) spin_unlock(&l)
>
> void t1(void)
> {
>         L(L1);
>         L(L2);
>
>         U(L2);
>         U(L1);
> }

(1) L1 -> L2

> void t2(void)
> {
>         L(L2);
>         L(Z);

(2) L2 -> Z

>         L(L1);

(3) Z -> L1

>         U(L1);
>         U(Z);
>         U(L2);
> }
>
> void tst(void)
> {
>         t1();
>         t2();
> }
>
> We have the trivial AB-BA deadlock with L1 and L2, but lockdep says:
>
> [ INFO: possible circular locking dependency detected ]
> 2.6.30-rc6-00043-g22ef37e-dirty #3
> -------------------------------------------------------
> perl/676 is trying to acquire lock:
> (L1){+.+...}, at: [<ffffffff802522b8>] t2+0x28/0x50
>
> but task is already holding lock:
> (Z){+.+...}, at: [<ffffffff802522ac>] t2+0x1c/0x50
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (Z){+.+...}:
>
> -> #1 (L2){+.+...}:
>
> -> #0 (L1){+.+...}:
>
> other info that might help us debug this:
>
> 2 locks held by perl/676:
> #0: (L2){+.+...}, at: [<ffffffff802522a0>] t2+0x10/0x50
> #1: (Z){+.+...}, at: [<ffffffff802522ac>] t2+0x1c/0x50
>
> This output looks obviously wrong, Z does not depend on L1 or any
> other lock.
It does, L1 -> L2 -> Z as per 1 and 2
which 3 obviously reverses.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-18 20:00 ` Peter Zijlstra
@ 2009-05-18 20:16 ` Oleg Nesterov
2009-05-18 20:40 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-18 20:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List
On 05/18, Peter Zijlstra wrote:
>
> On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> >
> > But, I am starting to suspect we have some problems with lockdep too.
> > OK, I can't explain what I mean... But consider this code:
> >
> > DEFINE_SPINLOCK(Z);
> > DEFINE_SPINLOCK(L1);
> > DEFINE_SPINLOCK(L2);
> >
> > #define L(l) spin_lock(&l)
> > #define U(l) spin_unlock(&l)
> >
> > void t1(void)
> > {
> > L(L1);
> > L(L2);
> >
> > U(L2);
> > U(L1);
> > }
>
> (1) L1 -> L2
>
> > void t2(void)
> > {
> > L(L2);
> > L(Z);
>
> (2) L2 -> Z
>
> > L(L1);
>
> (3) Z -> L1
>
> > U(L1);
> > U(Z);
> > U(L2);
> > }
> >
> > void tst(void)
> > {
> > t1();
> > t2();
> > }
> >
> > We have the trivial AB-BA deadlock with L1 and L2, but lockdep says:
> >
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.30-rc6-00043-g22ef37e-dirty #3
> > -------------------------------------------------------
> > perl/676 is trying to acquire lock:
> > (L1){+.+...}, at: [<ffffffff802522b8>] t2+0x28/0x50
> >
> > but task is already holding lock:
> > (Z){+.+...}, at: [<ffffffff802522ac>] t2+0x1c/0x50
> >
> >
> > This output looks obviously wrong, Z does not depend on L1 or any
> > other lock.
>
> It does, L1 -> L2 -> Z as per 1 and 2
> which 3 obviously reverses.
Yes, yes, I see. And, as I said, I can't explain what I mean.
I mean... The output above looks as if we take L1 and Z in the wrong order.
But Z has nothing to do with this deadlock; it can't depend on any lock
from the correctness pov. Except yes, we have it in the L1->L2->Z->L1 cycle.
Oleg.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-18 20:16 ` Oleg Nesterov
@ 2009-05-18 20:40 ` Peter Zijlstra
2009-05-18 22:14 ` Oleg Nesterov
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-18 20:40 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List, Gautham R Shenoy
On Mon, 2009-05-18 at 22:16 +0200, Oleg Nesterov wrote:
> On 05/18, Peter Zijlstra wrote:
> >
> > On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > >
> > > But, I am starting to suspect we have some problems with lockdep too.
> > > OK, I can't explain what I mean... But consider this code:
> > >
> > > DEFINE_SPINLOCK(Z);
> > > DEFINE_SPINLOCK(L1);
> > > DEFINE_SPINLOCK(L2);
> > >
> > > #define L(l) spin_lock(&l)
> > > #define U(l) spin_unlock(&l)
> > >
> > > void t1(void)
> > > {
> > > L(L1);
> > > L(L2);
> > >
> > > U(L2);
> > > U(L1);
> > > }
> >
> > (1) L1 -> L2
> >
> > > void t2(void)
> > > {
> > > L(L2);
> > > L(Z);
> >
> > (2) L2 -> Z
> >
> > > L(L1);
> >
> > (3) Z -> L1
> >
> > > U(L1);
> > > U(Z);
> > > U(L2);
> > > }
> > >
> > > void tst(void)
> > > {
> > > t1();
> > > t2();
> > > }
> > >
> > > We have the trivial AB-BA deadlock with L1 and L2, but lockdep says:
> > >
> > > [ INFO: possible circular locking dependency detected ]
> > > 2.6.30-rc6-00043-g22ef37e-dirty #3
> > > -------------------------------------------------------
> > > perl/676 is trying to acquire lock:
> > > (L1){+.+...}, at: [<ffffffff802522b8>] t2+0x28/0x50
> > >
> > > but task is already holding lock:
> > > (Z){+.+...}, at: [<ffffffff802522ac>] t2+0x1c/0x50
> > >
> > >
> > > This output looks obviously wrong, Z does not depend on L1 or any
> > > other lock.
> >
> > It does, L1 -> L2 -> Z as per 1 and 2
> > which 3 obviously reverses.
>
> Yes, yes, I see. And, as I said, I can't explain what I mean.
>
> I mean... The output above looks as if we take L1 and Z in wrong order.
> But Z has nothing to do with this deadlock, it can't depend on any lock
> from the correctness pov. Except yes, we have it in L1->L2->Z->L1 cycle.
AB-BC-CA deadlock
Thread 1                Thread 2                Thread 3

L(L1)
                        L(L2)
                                                L(Z)
L(L2)
                        L(Z)
                                                L(L1)
And you're saying, we can't have that deadlock because we don't have the
3 separate functions?
That is, there is no concurrency on Z because it's always taken under L2?
For those situations we have the spin_lock_nest_lock(X, y) annotation,
where we say, there cannot be any concurrency on x element of X, because
all such locks are always taken under y.
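For reference, used on Oleg's example the annotation would look something
like this sketch (plain spinlock calls instead of the L()/U() macros; the
second argument names the lock that is always held around the first):

        spin_lock(&L2);
        /* tells lockdep: all acquisitions of Z are serialized by L2, so Z
         * itself cannot be part of an independent inversion */
        spin_lock_nest_lock(&Z, &L2);
        /* ... */
        spin_unlock(&Z);
        spin_unlock(&L2);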
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-18 20:40 ` Peter Zijlstra
@ 2009-05-18 22:14 ` Oleg Nesterov
2009-05-19 9:13 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-18 22:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List, Gautham R Shenoy
On 05/18, Peter Zijlstra wrote:
>
> On Mon, 2009-05-18 at 22:16 +0200, Oleg Nesterov wrote:
> > On 05/18, Peter Zijlstra wrote:
> > >
> > > On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > > >
> > > > This output looks obviously wrong, Z does not depend on L1 or any
> > > > other lock.
> > >
> > > It does, L1 -> L2 -> Z as per 1 and 2
> > > which 3 obviously reverses.
> >
> > Yes, yes, I see. And, as I said, I can't explain what I mean.
> >
> > I mean... The output above looks as if we take L1 and Z in wrong order.
> > But Z has nothing to do with this deadlock, it can't depend on any lock
> > from the correctness pov. Except yes, we have it in L1->L2->Z->L1 cycle.
>
> AB-BC-CA deadlock
>
> Thread 1 Thread 2 Thread 3
>
> L(L1)
> L(L2)
> L(Z)
> L(L2)
> L(Z)
> L(L1)
Sure. Now Z really depends on L1. But if you change Thread 3 to take yet
another unique lock X under Z, then lockdep will complain that X depends
on L1, not Z.
To clarify, I do not say the output is buggy. I only meant it could be
better. But I don't understand how to improve it.
If we return to the original bug report, perhaps cpu_add_remove_lock
has nothing to do with this problem... we could get a similar output
if device_pm_lock() were called from a work_struct.
> And you're saying, we can't have that deadlock because we don't have the
> 3 separate functions?
No,
> That is, there is no concurrency on Z because its always taken under L2?
Yes, nobody else can hold Z when we take L2.
But this wasn't my point.
> For those situations we have the spin_lock_nest_lock(X, y) annotation,
> where we say, there cannot be any concurrency on x element of X, because
> all such locks are always taken under y.
We can just kill L(Z) instead of annotating, this changes nothing from
the correctness pov, we have the same deadlock. But the output becomes
very clear: L1 depends on L2.
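Concretely, the modified t2() would just be (same L()/U() macros as before):

void t2(void)
{
        L(L2);
        L(L1);          /* the plain AB-BA inversion against t1(): L1 -> L2 vs L2 -> L1 */
        U(L1);
        U(L2);
}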
OK, please forget. Not sure why I started this thread. Just because I
was surprised a bit when I figured out that lockdep's output does not
match my naive expectations.
Oleg.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-18 19:47 ` Oleg Nesterov
2009-05-18 20:00 ` Peter Zijlstra
@ 2009-05-19 8:51 ` Johannes Berg
2009-05-19 12:00 ` Oleg Nesterov
1 sibling, 1 reply; 43+ messages in thread
From: Johannes Berg @ 2009-05-19 8:51 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > Maybe it shouldn't do that from the CPU_POST_DEAD
> > notifier?
>
> Well, in any case we should understand why we have the problem, before
> changing the code. And CPU_POST_DEAD is not special, why should we treat
> it specially and skip lock_map_acquire(wq->lockdep_map) ?
I'm not familiar enough with the code -- but what are we really trying
to do in CPU_POST_DEAD? It seems to me that at that time things must
already be off the CPU, so ...? On the other hand that calls
flush_cpu_workqueue() so it seems it would actually wait for the work to
be executed on some other CPU, within the CPU_POST_DEAD notification?
johannes
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-18 22:14 ` Oleg Nesterov
@ 2009-05-19 9:13 ` Peter Zijlstra
2009-05-19 10:49 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-19 9:13 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List, Gautham R Shenoy
On Tue, 2009-05-19 at 00:14 +0200, Oleg Nesterov wrote:
> On 05/18, Peter Zijlstra wrote:
> >
> > On Mon, 2009-05-18 at 22:16 +0200, Oleg Nesterov wrote:
> > > On 05/18, Peter Zijlstra wrote:
> > > >
> > > > On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > > > >
> > > > > This output looks obviously wrong, Z does not depend on L1 or any
> > > > > other lock.
> > > >
> > > > It does, L1 -> L2 -> Z as per 1 and 2
> > > > which 3 obviously reverses.
> > >
> > > Yes, yes, I see. And, as I said, I can't explain what I mean.
> > >
> > > I mean... The output above looks as if we take L1 and Z in wrong order.
> > > But Z has nothing to do with this deadlock, it can't depend on any lock
> > > from the correctness pov. Except yes, we have it in L1->L2->Z->L1 cycle.
> >
> > AB-BC-CA deadlock
> >
> > Thread 1                Thread 2                Thread 3
> >
> > L(L1)
> >                         L(L2)
> >                                                 L(Z)
> > L(L2)
> >                         L(Z)
> >                                                 L(L1)
>
> Sure. Now Z really depends on L1. But if you change Thread 3 to take yet
> another unique lock X under Z, then lockdep will complain that X depends
> on L1, not Z.
>
> To clarify, I do not say the output is bugggy. I only meant it could be
> better. But I don't understand how to improve it.
>
> If we return to the original bug report, perhaps cpu_add_remove_lock
> has nothing to do with this problem... we could have the similar output
> if device_pm_lock() is called from work_struct.
>
> > And you're saying, we can't have that deadlock because we don't have the
> > 3 separate functions?
>
> No,
>
> > That is, there is no concurrency on Z because its always taken under L2?
>
> Yes, nobody else can hold Z when we take L2.
>
> But this wasn't my point.
>
> > For those situations we have the spin_lock_nest_lock(X, y) annotation,
> > where we say, there cannot be any concurrency on x element of X, because
> > all such locks are always taken under y.
>
> We can just kill L(Z) instead of annotating, this changes nothing from
> the correctness pov, we have the same deadlock. But the output becomes
> very clear: L1 depends on L2.
>
>
> OK, please forget. Not sure why I started this thread. Just because I
> was surprised a bit when I figured out that lockdep's output does not
> match my naive expectations.
Well, since you're quite versed in the field, I'm guessing other people
might find it even more confusing -- so it might well do to explore this
situation a bit further, if only to see if we can make lockdep output
'easier'.
There is a solution to this, which Gautham suggested a while back: we could
make lockdep scan a lock's (Z) dependencies and, if in every chain a
particular other lock (L2) was taken, ignore this lock's (Z) dependency
for the circular analysis at hand.
That would mean we would not find the Z->L1 dependency to violate the
existing one, because we would ignore L2->Z (since L2 is held whenever Z
is taken), and we would instead fail on the next line of your initial
program: L2->L1.
Implementing this might be slightly less trivial than explaining it -- it
would, however, rid us of the need for the spin_lock_nest_lock()
annotation.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 9:13 ` Peter Zijlstra
@ 2009-05-19 10:49 ` Peter Zijlstra
2009-05-19 14:53 ` Oleg Nesterov
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-19 10:49 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List, Gautham R Shenoy
On Tue, 2009-05-19 at 11:13 +0200, Peter Zijlstra wrote:
> On Tue, 2009-05-19 at 00:14 +0200, Oleg Nesterov wrote:
> > On 05/18, Peter Zijlstra wrote:
> > >
> > > On Mon, 2009-05-18 at 22:16 +0200, Oleg Nesterov wrote:
> > > > On 05/18, Peter Zijlstra wrote:
> > > > >
> > > > > On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > > > > >
> > > > > > This output looks obviously wrong, Z does not depend on L1 or any
> > > > > > other lock.
> > > > >
> > > > > It does, L1 -> L2 -> Z as per 1 and 2
> > > > > which 3 obviously reverses.
> > > >
> > > > Yes, yes, I see. And, as I said, I can't explain what I mean.
> > > >
> > > > I mean... The output above looks as if we take L1 and Z in wrong order.
> > > > But Z has nothing to do with this deadlock, it can't depend on any lock
> > > > from the correctness pov. Except yes, we have it in L1->L2->Z->L1 cycle.
> > >
> > > AB-BC-CA deadlock
> > >
> > > Thread 1                Thread 2                Thread 3
> > >
> > > L(L1)
> > >                         L(L2)
> > >                                                 L(Z)
> > > L(L2)
> > >                         L(Z)
> > >                                                 L(L1)
> >
> > Sure. Now Z really depends on L1. But if you change Thread 3 to take yet
> > another unique lock X under Z, then lockdep will complain that X depends
> > on L1, not Z.
> >
> > To clarify, I do not say the output is bugggy. I only meant it could be
> > better. But I don't understand how to improve it.
> >
> > If we return to the original bug report, perhaps cpu_add_remove_lock
> > has nothing to do with this problem... we could have the similar output
> > if device_pm_lock() is called from work_struct.
> >
> > > And you're saying, we can't have that deadlock because we don't have the
> > > 3 separate functions?
> >
> > No,
> >
> > > That is, there is no concurrency on Z because its always taken under L2?
> >
> > Yes, nobody else can hold Z when we take L2.
> >
> > But this wasn't my point.
> >
> > > For those situations we have the spin_lock_nest_lock(X, y) annotation,
> > > where we say, there cannot be any concurrency on x element of X, because
> > > all such locks are always taken under y.
> >
> > We can just kill L(Z) instead of annotating, this changes nothing from
> > the correctness pov, we have the same deadlock. But the output becomes
> > very clear: L1 depends on L2.
> >
> >
> > OK, please forget. Not sure why I started this thread. Just because I
> > was surprised a bit when I figured out that lockdep's output does not
> > match my naive expectations.
>
> Well, since you're quite versed in the field, I'm guessing other people
> might find it even more confusing -- so it might well do to explore this
> situation a bit further, if only to see if we can make lockdep output
> 'easier'.
>
> There is a solution to this, Gautham suggested it a while back, we could
> make lockdep scan a lock (Z) his dependencies and if in every chain a
> particular other lock (L2) was taken, ignore this lock (Z) his
> dependency for the circular analysis at hand.
>
> That would mean we would not find the Z->L1 dep to violate the existing
> one, because we would ignore L2->Z (because in every Z we hold L2), and
> we would indeed fail on the next: L2->L1 on the next line in your
> initial program.
>
> Implementing this however might be slightly less trivial than this
> explanation -- it would however rid us of the spin_lock_nest_lock()
> annotation's need.
Ingo pointed out that this would weaken the deadlock detection, in that
lockdep would have to observe Z being taken outside of L2 before it could
report the problem, and that might be a very rare, but real, code path.
Another possible way might be to find the smallest cycle instead of just
any (the first) cycle.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 8:51 ` Johannes Berg
@ 2009-05-19 12:00 ` Oleg Nesterov
2009-05-19 15:33 ` Johannes Berg
0 siblings, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-19 12:00 UTC (permalink / raw)
To: Johannes Berg
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On 05/19, Johannes Berg wrote:
>
> On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
>
> > > Maybe it shouldn't do that from the CPU_POST_DEAD
> > > notifier?
> >
> > Well, in any case we should understand why we have the problem, before
> > changing the code. And CPU_POST_DEAD is not special, why should we treat
> > it specially and skip lock_map_acquire(wq->lockdep_map) ?
>
> I'm not familiar enough with the code -- but what are we really trying
> to do in CPU_POST_DEAD? It seems to me that at that time things must
> already be off the CPU, so ...?
Yes, this cpu is dead, we should do cleanup_workqueue_thread() to kill
cwq->thread.
> On the other hand that calls
> flush_cpu_workqueue() so it seems it would actually wait for the work to
> be executed on some other CPU, within the CPU_POST_DEAD notification?
Yes. We can't just kill cwq->thread, because there can be pending
work_structs, so we have to flush.
Why can't we move these works to another CPU? We can, but it doesn't
really help, because in any case we should at least wait for
cwq->current_work to complete.
Why do we use CPU_POST_DEAD, and not (say) CPU_DEAD to flush/kill ?
Because work->func() can sleep in get_online_cpus(), we can't flush
until we drop cpu_hotplug.lock.
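The problematic case is simply a work item along these lines (a
hypothetical example, not any particular in-tree user):

static void my_work_fn(struct work_struct *work)
{
        /*
         * Blocks while a CPU hotplug operation is in progress. If the dead
         * CPU's workqueue were flushed from CPU_DEAD, i.e. while
         * cpu_hotplug.lock is still held, this would never return and the
         * flush would deadlock; CPU_POST_DEAD runs after that lock is
         * dropped.
         */
        get_online_cpus();
        /* ... */
        put_online_cpus();
}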
Oleg.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 10:49 ` Peter Zijlstra
@ 2009-05-19 14:53 ` Oleg Nesterov
0 siblings, 0 replies; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-19 14:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List, Gautham R Shenoy
On 05/19, Peter Zijlstra wrote:
>
> > There is a solution to this, Gautham suggested it a while back, we could
> > make lockdep scan a lock (Z) his dependencies and if in every chain a
> > particular other lock (L2) was taken, ignore this lock (Z) his
> > dependency for the circular analysis at hand.
> >
> > That would mean we would not find the Z->L1 dep to violate the existing
> > one, because we would ignore L2->Z (because in every Z we hold L2), and
> > we would indeed fail on the next: L2->L1 on the next line in your
> > initial program.
> >
> > Implementing this however might be slightly less trivial than this
> > explanation -- it would however rid us of the spin_lock_nest_lock()
> > annotation's need.
>
> Ingo pointed out that that would weaken the possible deadlock detection
> in that it would have to observe a Z outside of L2 before reporting the
> problem, which might be a very rare, but existing, code path.
>
> Another possible way might be to find the smallest cycle instead of just
> any (the first) cycle.
Yes, thanks Peter for your explanations. Not that I fully understand them,
but at least I do understand this is not trivial.
Oleg.
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 12:00 ` Oleg Nesterov
@ 2009-05-19 15:33 ` Johannes Berg
2009-05-19 16:09 ` Oleg Nesterov
2009-05-20 3:36 ` Ming Lei
0 siblings, 2 replies; 43+ messages in thread
From: Johannes Berg @ 2009-05-19 15:33 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On Tue, 2009-05-19 at 14:00 +0200, Oleg Nesterov wrote:
> > I'm not familiar enough with the code -- but what are we really trying
> > to do in CPU_POST_DEAD? It seems to me that at that time things must
> > already be off the CPU, so ...?
>
> Yes, this cpu is dead, we should do cleanup_workqueue_thread() to kill
> cwq->thread.
>
> > On the other hand that calls
> > flush_cpu_workqueue() so it seems it would actually wait for the work to
> > be executed on some other CPU, within the CPU_POST_DEAD notification?
>
> Yes. Because we can't just kill cwq->thread, we can have the pending
> work_structs so we have to flush.
>
> Why can't we move these works to another CPU? We can, but this doesn't
> really help. Because in any case we should at least wait for
> cwq->current_work to complete.
>
> Why do we use CPU_POST_DEAD, and not (say) CPU_DEAD to flush/kill ?
> Because work->func() can sleep in get_online_cpus(), we can't flush
> until we drop cpu_hotplug.lock.
Right. But exactly this happens in the hibernate case -- the hibernate
code calls kernel/cpu.c:disable_nonboot_cpus() which calls _cpu_down()
which calls raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD... Sadly,
it does so while holding the cpu_add_remove_lock, which happens to
have the dependencies outlined in the original email...
The same happens in cpu_down() (without leading _) which you can trigger
from sysfs by manually removing the CPU, so it's not hibernate specific.
Anyway, you can have a deadlock like this:
CPU 3                   CPU 2                        CPU 1
                                                     suspend/hibernate
                        something:
                        rtnl_lock()                  device_pm_lock()
                                                      -> mutex_lock(&dpm_list_mtx)

                        mutex_lock(&dpm_list_mtx)

linkwatch_work
 -> rtnl_lock()
                                                     disable_nonboot_cpus()
                                                      -> flush CPU 3 workqueue
johannes
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 15:33 ` Johannes Berg
@ 2009-05-19 16:09 ` Oleg Nesterov
2009-05-19 16:27 ` Johannes Berg
2009-05-20 3:36 ` Ming Lei
1 sibling, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-19 16:09 UTC (permalink / raw)
To: Johannes Berg
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On 05/19, Johannes Berg wrote:
>
> On Tue, 2009-05-19 at 14:00 +0200, Oleg Nesterov wrote:
>
> > > I'm not familiar enough with the code -- but what are we really trying
> > > to do in CPU_POST_DEAD? It seems to me that at that time things must
> > > already be off the CPU, so ...?
> >
> > Yes, this cpu is dead, we should do cleanup_workqueue_thread() to kill
> > cwq->thread.
> >
> > > On the other hand that calls
> > > flush_cpu_workqueue() so it seems it would actually wait for the work to
> > > be executed on some other CPU, within the CPU_POST_DEAD notification?
> >
> > Yes. Because we can't just kill cwq->thread, we can have the pending
> > work_structs so we have to flush.
> >
> > Why can't we move these works to another CPU? We can, but this doesn't
> > really help. Because in any case we should at least wait for
> > cwq->current_work to complete.
> >
> > Why do we use CPU_POST_DEAD, and not (say) CPU_DEAD to flush/kill ?
> > Because work->func() can sleep in get_online_cpus(), we can't flush
> > until we drop cpu_hotplug.lock.
>
> Right. But exactly this happens in the hibernate case --
not sure I understand your "exactly this" ;)
But your explanation of the deadlock below looks great!
> the hibernate
> code calls kernel/cpu.c:disable_nonboot_cpus() which calls _cpu_down()
> which calls raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD... Sadly,
> it does so while holding the cpu_add_remove_lock, which is happens to
> have the dependencies outlined in the original email...
>
> The same happens in cpu_down() (without leading _) which you can trigger
> from sysfs by manually removing the CPU, so it's not hibernate specific.
except I don't understand how cpu_add_remove_lock makes the difference...
And thus I can't understand why cpu_down() (called lockless) has the
same problems. Please see below.
> Anyway, you can have a deadlock like this:
>
> CPU 3 CPU 2 CPU 1
> suspend/hibernate
> something:
> rtnl_lock() device_pm_lock()
> -> mutex_lock(&dpm_list_mtx)
>
> mutex_lock(&dpm_list_mtx)
>
> linkwatch_work
> -> rtnl_lock()
> disable_nonboot_cpus()
let's suppose disable_nonboot_cpus() does not take cpu_add_remove_lock,
> -> flush CPU 3 workqueue
in this case the deadlock is still here?
We can't flush because we hold the lock (dpm_list_mtx) which depends
on another lock taken by work->func(), the "classical" bug with flush.
No?
Oleg.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 16:09 ` Oleg Nesterov
@ 2009-05-19 16:27 ` Johannes Berg
2009-05-19 18:51 ` Oleg Nesterov
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Berg @ 2009-05-19 16:27 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On Tue, 2009-05-19 at 18:09 +0200, Oleg Nesterov wrote:
> > Right. But exactly this happens in the hibernate case --
>
> not sure I understand your "exactly this" ;)
>
> But your explanation of the deadlock below looks great!
Yeah... I got side-tracked, I had a scenario in mind that actually
needed cpu_add_remove_lock().
> except I don't understand how cpu_add_remove_lock makes the difference...
> And thus I can't understand why cpu_down() (called lockless) have the
> same problems. Please see below.
>
> > Anyway, you can have a deadlock like this:
> >
> > CPU 3 CPU 2 CPU 1
> > suspend/hibernate
> > something:
> > rtnl_lock() device_pm_lock()
> > -> mutex_lock(&dpm_list_mtx)
> >
> > mutex_lock(&dpm_list_mtx)
> >
> > linkwatch_work
> > -> rtnl_lock()
> > disable_nonboot_cpus()
>
> let's suppose disable_nonboot_cpus() does not take cpu_add_remove_lock,
>
> > -> flush CPU 3 workqueue
>
> in this case the deadlock is still here?
>
> We can't flush because we hold the lock (dpm_list_mtx) which depends
> on another lock taken by work->func(), the "classical" bug with flush.
>
> No?
Yeah, it looks like cpu_add_remove_lock doesn't make a difference...
It's just lockdep reporting a longer chain that also leads to a
deadlock. OTOH just replace dpm_list_mtx with cpu_add_remove_lock and
you have the same scenario... happens too, I guess, somehow.
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 16:27 ` Johannes Berg
@ 2009-05-19 18:51 ` Oleg Nesterov
2009-05-22 10:46 ` Johannes Berg
0 siblings, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-19 18:51 UTC (permalink / raw)
To: Johannes Berg
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On 05/19, Johannes Berg wrote:
>
> On Tue, 2009-05-19 at 18:09 +0200, Oleg Nesterov wrote:
>
> > > Right. But exactly this happens in the hibernate case --
> >
> > not sure I understand your "exactly this" ;)
> >
> > But your explanation of the deadlock below looks great!
>
> Yeah... I got side-tracked, I had a scenario in mind that actually
> needed cpu_add_remove_lock().
>
> > except I don't understand how cpu_add_remove_lock makes the difference...
> > And thus I can't understand why cpu_down() (called lockless) have the
> > same problems. Please see below.
> >
> > > Anyway, you can have a deadlock like this:
> > >
> > > CPU 3 CPU 2 CPU 1
> > > suspend/hibernate
> > > something:
> > > rtnl_lock() device_pm_lock()
> > > -> mutex_lock(&dpm_list_mtx)
> > >
> > > mutex_lock(&dpm_list_mtx)
> > >
> > > linkwatch_work
> > > -> rtnl_lock()
> > > disable_nonboot_cpus()
> >
> > let's suppose disable_nonboot_cpus() does not take cpu_add_remove_lock,
> >
> > > -> flush CPU 3 workqueue
> >
> > in this case the deadlock is still here?
> >
> > We can't flush because we hold the lock (dpm_list_mtx) which depends
> > on another lock taken by work->func(), the "classical" bug with flush.
> >
> > No?
>
> Yeah, it looks like cpu_add_remove_lock doesn't make a difference...
> It's just lockdep reporting a longer chain that also leads to a
> deadlock.
So, we should not call cpu_down/disable_nonboot_cpus under device_pm_lock().
At first glance this was changed by
PM: Change hibernation code ordering
4aecd6718939eb3c4145b248369b65f7483a8a02
PM: Change suspend code ordering
900af0d973856d6feb6fc088c2d0d3fde57707d3
commits. Rafael, could you take a look?
> OTOH just replace dpm_list_mtx with cpu_add_remove_lock and
> you have the same scenario...
Yes, but
> happens too, I guess, somehow.
Oh, I hope not ;) nobody should use cpu_maps_update_begin() except
cpu_down/up paths. And workqueue.c, which uses it exactly because
we want to call _cpu_down()->flush_cpu_workqueue() without any other
locks held. But if the caller of cpu_down() holds some lock, then
we have the usual problems with the flush under lock.
Oleg.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 15:33 ` Johannes Berg
2009-05-19 16:09 ` Oleg Nesterov
@ 2009-05-20 3:36 ` Ming Lei
2009-05-20 6:47 ` Johannes Berg
1 sibling, 1 reply; 43+ messages in thread
From: Ming Lei @ 2009-05-20 3:36 UTC (permalink / raw)
To: Johannes Berg
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
2009/5/19 Johannes Berg <johannes@sipsolutions.net>:
> On Tue, 2009-05-19 at 14:00 +0200, Oleg Nesterov wrote:
>
>> > I'm not familiar enough with the code -- but what are we really trying
>> > to do in CPU_POST_DEAD? It seems to me that at that time things must
>> > already be off the CPU, so ...?
>>
>> Yes, this cpu is dead, we should do cleanup_workqueue_thread() to kill
>> cwq->thread.
>>
>> > On the other hand that calls
>> > flush_cpu_workqueue() so it seems it would actually wait for the work to
>> > be executed on some other CPU, within the CPU_POST_DEAD notification?
>>
>> Yes. Because we can't just kill cwq->thread, we can have the pending
>> work_structs so we have to flush.
>>
>> Why can't we move these works to another CPU? We can, but this doesn't
>> really help. Because in any case we should at least wait for
>> cwq->current_work to complete.
>>
>> Why do we use CPU_POST_DEAD, and not (say) CPU_DEAD to flush/kill ?
>> Because work->func() can sleep in get_online_cpus(), we can't flush
>> until we drop cpu_hotplug.lock.
>
> Right. But exactly this happens in the hibernate case -- the hibernate
> code calls kernel/cpu.c:disable_nonboot_cpus() which calls _cpu_down()
> which calls raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD... Sadly,
> it does so while holding the cpu_add_remove_lock, which is happens to
> have the dependencies outlined in the original email...
>
> The same happens in cpu_down() (without leading _) which you can trigger
> from sysfs by manually removing the CPU, so it's not hibernate specific.
>
> Anyway, you can have a deadlock like this:
>
> CPU 3 CPU 2 CPU 1
> suspend/hibernate
> something:
> rtnl_lock() device_pm_lock()
> -> mutex_lock(&dpm_list_mtx)
>
> mutex_lock(&dpm_list_mtx)
Would you give an explanation of why mutex_lock(&dpm_list_mtx) runs on CPU 2
and depends on rtnl_lock?
Thanks!
>
> linkwatch_work
> -> rtnl_lock()
> disable_nonboot_cpus()
> -> flush CPU 3 workqueue
>
> johannes
>
>
--
Lei Ming
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 3:36 ` Ming Lei
@ 2009-05-20 6:47 ` Johannes Berg
2009-05-20 7:09 ` Ming Lei
2009-05-22 8:03 ` Ming Lei
0 siblings, 2 replies; 43+ messages in thread
From: Johannes Berg @ 2009-05-20 6:47 UTC (permalink / raw)
To: Ming Lei
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
On Wed, 2009-05-20 at 11:36 +0800, Ming Lei wrote:
> > Anyway, you can have a deadlock like this:
> >
> > CPU 3 CPU 2 CPU 1
> > suspend/hibernate
> > something:
> > rtnl_lock() device_pm_lock()
> > -> mutex_lock(&dpm_list_mtx)
> >
> > mutex_lock(&dpm_list_mtx)
>
> Would you give a explaination why mutex_lock(&dpm_list_mtx) runs in CPU2
> and depends on rtnl_lock?
Why not? Something is registering a hotplugged netdev.
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 6:47 ` Johannes Berg
@ 2009-05-20 7:09 ` Ming Lei
2009-05-20 7:12 ` Johannes Berg
2009-05-22 8:03 ` Ming Lei
1 sibling, 1 reply; 43+ messages in thread
From: Ming Lei @ 2009-05-20 7:09 UTC (permalink / raw)
To: Johannes Berg
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
2009/5/20 Johannes Berg <johannes@sipsolutions.net>:
> On Wed, 2009-05-20 at 11:36 +0800, Ming Lei wrote:
>
>> > Anyway, you can have a deadlock like this:
>> >
>> > CPU 3 CPU 2 CPU 1
>> > suspend/hibernate
>> > something:
>> > rtnl_lock() device_pm_lock()
>> > -> mutex_lock(&dpm_list_mtx)
>> >
>> > mutex_lock(&dpm_list_mtx)
>>
>> Would you give a explaination why mutex_lock(&dpm_list_mtx) runs in CPU2
>> and depends on rtnl_lock?
>
> Why not? Something is registering a hotplugged netdev.
I see. I just feel a bit curious about how lockdep may build the dependency
of dpm_list_mtx on rtnl_lock; it is certainly related to
lockdep internals.
--
Lei Ming
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 7:09 ` Ming Lei
@ 2009-05-20 7:12 ` Johannes Berg
2009-05-20 8:21 ` Ming Lei
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Berg @ 2009-05-20 7:12 UTC (permalink / raw)
To: Ming Lei
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
On Wed, 2009-05-20 at 15:09 +0800, Ming Lei wrote:
> 2009/5/20 Johannes Berg <johannes@sipsolutions.net>:
> > On Wed, 2009-05-20 at 11:36 +0800, Ming Lei wrote:
> >
> >> > Anyway, you can have a deadlock like this:
> >> >
> >> > CPU 3 CPU 2 CPU 1
> >> > suspend/hibernate
> >> > something:
> >> > rtnl_lock() device_pm_lock()
> >> > -> mutex_lock(&dpm_list_mtx)
> >> >
> >> > mutex_lock(&dpm_list_mtx)
> >>
> >> Would you give a explaination why mutex_lock(&dpm_list_mtx) runs in CPU2
> >> and depends on rtnl_lock?
> >
> > Why not? Something is registering a hotplugged netdev.
>
> I see. I just feel a bit curious how lockdep may build the dependency
> of dpm_list_mtx on rtnl_lock, and it is certainly related with
> lockdep internal.
No, it's just the way drivers/base/power/ works -- it acquires the lock
when you register a new struct device.
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 7:12 ` Johannes Berg
@ 2009-05-20 8:21 ` Ming Lei
2009-05-20 8:45 ` Johannes Berg
0 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2009-05-20 8:21 UTC (permalink / raw)
To: Johannes Berg
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
2009/5/20 Johannes Berg <johannes@sipsolutions.net>:
> On Wed, 2009-05-20 at 15:09 +0800, Ming Lei wrote:
>> 2009/5/20 Johannes Berg <johannes@sipsolutions.net>:
>> > On Wed, 2009-05-20 at 11:36 +0800, Ming Lei wrote:
>> >
>> >> > Anyway, you can have a deadlock like this:
>> >> >
>> >> > CPU 3 CPU 2 CPU 1
>> >> > suspend/hibernate
>> >> > something:
>> >> > rtnl_lock() device_pm_lock()
>> >> > -> mutex_lock(&dpm_list_mtx)
>> >> >
>> >> > mutex_lock(&dpm_list_mtx)
>> >>
>> >> Would you give a explaination why mutex_lock(&dpm_list_mtx) runs in CPU2
>> >> and depends on rtnl_lock?
>> >
>> > Why not? Something is registering a hotplugged netdev.
>>
>> I see. I just feel a bit curious how lockdep may build the dependency
>> of dpm_list_mtx on rtnl_lock, and it is certainly related with
>> lockdep internal.
>
> No, it's just the way drivers/base/power/ works -- it acquires the lock
> when you register a new struct device.
For me, the real puzzle is how lockdep introduced #3
(dpm_list_mtx){+.+.+.}
-> #3 (dpm_list_mtx){+.+.+.}:
[<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
[<ffffffff80271f38>] lock_acquire+0x98/0x140
[<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
[<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
[<ffffffff804532ff>] device_pm_add+0x1f/0xe0
[<ffffffff8044b9bf>] device_add+0x45f/0x570
[<ffffffffa007c578>] wiphy_register+0x158/0x280 [cfg80211]
[<ffffffffa017567c>] ieee80211_register_hw+0xbc/0x410 [mac80211]
[<ffffffffa01f7c5c>] iwl3945_pci_probe+0xa1c/0x1080 [iwl3945]
[<ffffffff803c4307>] local_pci_probe+0x17/0x20
[<ffffffff803c5458>] pci_device_probe+0x88/0xb0
[<ffffffff8044e1e9>] driver_probe_device+0x89/0x180
[<ffffffff8044e37b>] __driver_attach+0x9b/0xa0
[<ffffffff8044d67c>] bus_for_each_dev+0x6c/0xa0
[<ffffffff8044e03e>] driver_attach+0x1e/0x20
[<ffffffff8044d955>] bus_add_driver+0xd5/0x290
[<ffffffff8044e668>] driver_register+0x78/0x140
[<ffffffff803c56f6>] __pci_register_driver+0x66/0xe0
[<ffffffffa00bd05c>] 0xffffffffa00bd05c
[<ffffffff8020904f>] do_one_initcall+0x3f/0x1c0
[<ffffffff8027d071>] sys_init_module+0xb1/0x200
[<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
into the lockdep graph. In which process context, and what was the
previously held lock?
After all, there is a path (#0, #1, #2, ..., #5) in the directed graph,
and #3 is added by add_lock_to_list().
Thanks.
--
Lei Ming
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 8:21 ` Ming Lei
@ 2009-05-20 8:45 ` Johannes Berg
0 siblings, 0 replies; 43+ messages in thread
From: Johannes Berg @ 2009-05-20 8:45 UTC (permalink / raw)
To: Ming Lei
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
On Wed, 2009-05-20 at 16:21 +0800, Ming Lei wrote:
> For me, the real puzzle is that how lockdep introduce #3
> (dpm_list_mtx){+.+.+.}
>
> -> #3 (dpm_list_mtx){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> [<ffffffff804532ff>] device_pm_add+0x1f/0xe0
> [<ffffffff8044b9bf>] device_add+0x45f/0x570
> [<ffffffffa007c578>] wiphy_register+0x158/0x280 [cfg80211]
> [<ffffffffa017567c>] ieee80211_register_hw+0xbc/0x410 [mac80211]
> [<ffffffffa01f7c5c>] iwl3945_pci_probe+0xa1c/0x1080 [iwl3945]
> [<ffffffff803c4307>] local_pci_probe+0x17/0x20
> [<ffffffff803c5458>] pci_device_probe+0x88/0xb0
> [<ffffffff8044e1e9>] driver_probe_device+0x89/0x180
> [<ffffffff8044e37b>] __driver_attach+0x9b/0xa0
> [<ffffffff8044d67c>] bus_for_each_dev+0x6c/0xa0
> [<ffffffff8044e03e>] driver_attach+0x1e/0x20
> [<ffffffff8044d955>] bus_add_driver+0xd5/0x290
> [<ffffffff8044e668>] driver_register+0x78/0x140
> [<ffffffff803c56f6>] __pci_register_driver+0x66/0xe0
> [<ffffffffa00bd05c>] 0xffffffffa00bd05c
> [<ffffffff8020904f>] do_one_initcall+0x3f/0x1c0
> [<ffffffff8027d071>] sys_init_module+0xb1/0x200
> [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> into the lockdep graph? in which process context? and what is the
> previous held lock?
> After all, there is a path ( #0,#1,#2,...,#5 ) in the directed graph
> and #3 is added by
> add_lock_to_list().
Well, as Peter explained in the other part of the thread, lockdep will
always show the first cycle it found, not the shortest. The scenario
that I outlined is actually closer to this report:
http://paste.pocoo.org/show/116240/ but not quite that either, I further
shortened it.
Anyway, I have no idea how to fix it.
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-17 7:18 ` Ingo Molnar
2009-05-17 10:42 ` Ming Lei
2009-05-17 11:18 ` Johannes Berg
@ 2009-05-20 12:18 ` Peter Zijlstra
2009-05-20 13:18 ` Oleg Nesterov
2 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-20 12:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zdenek Kabelac, Rafael J. Wysocki, Oleg Nesterov,
Linux Kernel Mailing List
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.30-rc5-00097-gd665355 #59
> > -------------------------------------------------------
> > pm-suspend/12129 is trying to acquire lock:
> > (events){+.+.+.}, at: [<ffffffff80259496>] cleanup_workqueue_thread+0x26/0xd0
> >
> > but task is already holding lock:
> > (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
> > cpu_maps_update_begin+0x17/0x20
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #5 (cpu_add_remove_lock){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > [<ffffffff80246e57>] cpu_maps_update_begin+0x17/0x20
> > [<ffffffff80259c33>] __create_workqueue_key+0xc3/0x250
> > [<ffffffff80287b20>] stop_machine_create+0x40/0xb0
> > [<ffffffff8027a784>] sys_delete_module+0x84/0x270
> > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> > [<ffffffffffffffff>] 0xffffffffffffffff
Oleg, why does __create_workqueue_key() require cpu_maps_update_begin()?
Wouldn't get_online_cpus() be enough to freeze the online cpus?
Breaking the setup_lock -> cpu_add_remove_lock dependency seems
sufficient.
> > -> #4 (setup_lock){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > [<ffffffff80287af7>] stop_machine_create+0x17/0xb0
> > [<ffffffff80246f06>] disable_nonboot_cpus+0x26/0x130
> > [<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
> > [<ffffffff8027dff7>] enter_state+0x107/0x170
> > [<ffffffff8027e0f9>] state_store+0x99/0x100
> > [<ffffffff803abe17>] kobj_attr_store+0x17/0x20
> > [<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
> > [<ffffffff802e1c88>] vfs_write+0xb8/0x180
> > [<ffffffff802e2771>] sys_write+0x51/0x90
> > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > -> #3 (dpm_list_mtx){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > [<ffffffff804532ff>] device_pm_add+0x1f/0xe0
> > [<ffffffff8044b9bf>] device_add+0x45f/0x570
> > [<ffffffffa007c578>] wiphy_register+0x158/0x280 [cfg80211]
> > [<ffffffffa017567c>] ieee80211_register_hw+0xbc/0x410 [mac80211]
> > [<ffffffffa01f7c5c>] iwl3945_pci_probe+0xa1c/0x1080 [iwl3945]
> > [<ffffffff803c4307>] local_pci_probe+0x17/0x20
> > [<ffffffff803c5458>] pci_device_probe+0x88/0xb0
> > [<ffffffff8044e1e9>] driver_probe_device+0x89/0x180
> > [<ffffffff8044e37b>] __driver_attach+0x9b/0xa0
> > [<ffffffff8044d67c>] bus_for_each_dev+0x6c/0xa0
> > [<ffffffff8044e03e>] driver_attach+0x1e/0x20
> > [<ffffffff8044d955>] bus_add_driver+0xd5/0x290
> > [<ffffffff8044e668>] driver_register+0x78/0x140
> > [<ffffffff803c56f6>] __pci_register_driver+0x66/0xe0
> > [<ffffffffa00bd05c>] 0xffffffffa00bd05c
> > [<ffffffff8020904f>] do_one_initcall+0x3f/0x1c0
> > [<ffffffff8027d071>] sys_init_module+0xb1/0x200
> > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > -> #2 (cfg80211_mutex){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > [<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
> > [<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
> > [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> > [<ffffffff8020d23a>] child_rip+0xa/0x20
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > -> #1 (reg_work){+.+.+.}:
> > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff80258f12>] worker_thread+0x1e2/0x3a0
> > [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> > [<ffffffff8020d23a>] child_rip+0xa/0x20
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > -> #0 (events){+.+.+.}:
> > [<ffffffff80271b36>] __lock_acquire+0xd36/0x10a0
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff802594bd>] cleanup_workqueue_thread+0x4d/0xd0
> > [<ffffffff8053fe79>] workqueue_cpu_callback+0xc9/0x10f
> > [<ffffffff805539a8>] notifier_call_chain+0x68/0xa0
> > [<ffffffff802630d6>] raw_notifier_call_chain+0x16/0x20
> > [<ffffffff8053d9ec>] _cpu_down+0x1cc/0x2d0
> > [<ffffffff80246f90>] disable_nonboot_cpus+0xb0/0x130
> > [<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
> > [<ffffffff8027dff7>] enter_state+0x107/0x170
> > [<ffffffff8027e0f9>] state_store+0x99/0x100
> > [<ffffffff803abe17>] kobj_attr_store+0x17/0x20
> > [<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
> > [<ffffffff802e1c88>] vfs_write+0xb8/0x180
> > [<ffffffff802e2771>] sys_write+0x51/0x90
> > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > other info that might help us debug this:
> >
> > 4 locks held by pm-suspend/12129:
> > #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8033f154>]
> > sysfs_write_file+0x44/0x160
> > #1: (pm_mutex){+.+.+.}, at: [<ffffffff8027df44>] enter_state+0x54/0x170
> > #2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff80452747>] device_pm_lock+0x17/0x20
> > #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
> > cpu_maps_update_begin+0x17/0x20
> >
> > stack backtrace:
> > Pid: 12129, comm: pm-suspend Not tainted 2.6.30-rc5-00097-gd665355 #59
> > Call Trace:
> > [<ffffffff8026fa3d>] print_circular_bug_tail+0x9d/0xe0
> > [<ffffffff80271b36>] __lock_acquire+0xd36/0x10a0
> > [<ffffffff80270000>] ? mark_lock+0x3e0/0x400
> > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > [<ffffffff80259496>] ? cleanup_workqueue_thread+0x26/0xd0
> > [<ffffffff802594bd>] cleanup_workqueue_thread+0x4d/0xd0
> > [<ffffffff80259496>] ? cleanup_workqueue_thread+0x26/0xd0
> > [<ffffffff802703ed>] ? trace_hardirqs_on+0xd/0x10
> > [<ffffffff8053fe79>] workqueue_cpu_callback+0xc9/0x10f
> > [<ffffffff8054acf1>] ? cpu_callback+0x12/0x280
> > [<ffffffff805539a8>] notifier_call_chain+0x68/0xa0
> > [<ffffffff802630d6>] raw_notifier_call_chain+0x16/0x20
> > [<ffffffff8053d9ec>] _cpu_down+0x1cc/0x2d0
> > [<ffffffff80246f90>] disable_nonboot_cpus+0xb0/0x130
> > [<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
> > [<ffffffff8027dff7>] enter_state+0x107/0x170
> > [<ffffffff8027e0f9>] state_store+0x99/0x100
> > [<ffffffff803abe17>] kobj_attr_store+0x17/0x20
> > [<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
> > [<ffffffff802e1c88>] vfs_write+0xb8/0x180
> > [<ffffffff8029088c>] ? audit_syscall_entry+0x21c/0x240
> > [<ffffffff802e2771>] sys_write+0x51/0x90
> > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 12:18 ` Peter Zijlstra
@ 2009-05-20 13:18 ` Oleg Nesterov
2009-05-20 13:44 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-20 13:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List
On 05/20, Peter Zijlstra wrote:
>
> > > =======================================================
> > > [ INFO: possible circular locking dependency detected ]
> > > 2.6.30-rc5-00097-gd665355 #59
> > > -------------------------------------------------------
> > > pm-suspend/12129 is trying to acquire lock:
> > > (events){+.+.+.}, at: [<ffffffff80259496>] cleanup_workqueue_thread+0x26/0xd0
> > >
> > > but task is already holding lock:
> > > (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
> > > cpu_maps_update_begin+0x17/0x20
> > >
> > > which lock already depends on the new lock.
> > >
> > >
> > > the existing dependency chain (in reverse order) is:
> > >
> > > -> #5 (cpu_add_remove_lock){+.+.+.}:
> > > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > > [<ffffffff80246e57>] cpu_maps_update_begin+0x17/0x20
> > > [<ffffffff80259c33>] __create_workqueue_key+0xc3/0x250
> > > [<ffffffff80287b20>] stop_machine_create+0x40/0xb0
> > > [<ffffffff8027a784>] sys_delete_module+0x84/0x270
> > > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> > > [<ffffffffffffffff>] 0xffffffffffffffff
>
> Oleg, why does __create_workqueue_key() require cpu_maps_update_begin()?
> Wouldn't get_online_cpus() be enough to freeze the online cpus?
Yes, get_online_cpus() pins online CPUs. But CPU_POST_DEAD calls
cleanup_workqueue_thread() without cpu_hotplug.lock, which means
that create/destroy can race with cpu_down().
We can avoid cpu_add_remove_lock, but then we have to add another
lock to protect workqueues, cpu_populated_map, etc.
> Breaking the setup_lock -> cpu_add_remove_lock dependency seems
> sufficient.
Hmm. What do you mean? Afaics setup_lock -> cpu_add_remove_lock
is not a problem?
Oleg.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 13:18 ` Oleg Nesterov
@ 2009-05-20 13:44 ` Peter Zijlstra
2009-05-20 13:55 ` Oleg Nesterov
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-20 13:44 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List
On Wed, 2009-05-20 at 15:18 +0200, Oleg Nesterov wrote:
> On 05/20, Peter Zijlstra wrote:
> >
> > > > =======================================================
> > > > [ INFO: possible circular locking dependency detected ]
> > > > 2.6.30-rc5-00097-gd665355 #59
> > > > -------------------------------------------------------
> > > > pm-suspend/12129 is trying to acquire lock:
> > > > (events){+.+.+.}, at: [<ffffffff80259496>] cleanup_workqueue_thread+0x26/0xd0
> > > >
> > > > but task is already holding lock:
> > > > (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
> > > > cpu_maps_update_begin+0x17/0x20
> > > >
> > > > which lock already depends on the new lock.
> > > >
> > > >
> > > > the existing dependency chain (in reverse order) is:
> > > >
> > > > -> #5 (cpu_add_remove_lock){+.+.+.}:
> > > > [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> > > > [<ffffffff80271f38>] lock_acquire+0x98/0x140
> > > > [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> > > > [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> > > > [<ffffffff80246e57>] cpu_maps_update_begin+0x17/0x20
> > > > [<ffffffff80259c33>] __create_workqueue_key+0xc3/0x250
> > > > [<ffffffff80287b20>] stop_machine_create+0x40/0xb0
> > > > [<ffffffff8027a784>] sys_delete_module+0x84/0x270
> > > > [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> > > > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > Oleg, why does __create_workqueue_key() require cpu_maps_update_begin()?
> > Wouldn't get_online_cpus() be enough to freeze the online cpus?
>
> Yes, get_online_cpus() pins online CPUs. But CPU_POST_DEAD calls
> cleanup_workqueue_thread() without cpu_hotplug.lock, this means
> that create/destroy can race with cpu_down().
>
> We can avoid cpu_add_remove_lock, but then we have to add another
> lock to protect workqueues, cpu_populated_map, etc.
Joy..
> > Breaking the setup_lock -> cpu_add_remove_lock dependency seems
> > sufficient.
>
> Hmm. What do you mean? Afaics setup_lock -> cpu_add_remove_lock
> is not a problem?
From what I could see that is the only dependency that makes
cpu_add_remove_lock nest under "events" workqueue 'lock', which is what
is generating the deadlock.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 13:44 ` Peter Zijlstra
@ 2009-05-20 13:55 ` Oleg Nesterov
2009-05-20 14:12 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Oleg Nesterov @ 2009-05-20 13:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List
On 05/20, Peter Zijlstra wrote:
>
> On Wed, 2009-05-20 at 15:18 +0200, Oleg Nesterov wrote:
> > On 05/20, Peter Zijlstra wrote:
> > >
> > > Breaking the setup_lock -> cpu_add_remove_lock dependency seems
> > > sufficient.
> >
> > Hmm. What do you mean? Afaics setup_lock -> cpu_add_remove_lock
> > is not a problem?
>
> From what I could see that is the only dependency that makes
> cpu_add_remove_lock nest under "events" workqueue 'lock', which is what
> is generating the deadlock.
But cpu_add_remove_lock does not participate in this deadlock, see
http://marc.info/?l=linux-kernel&m=124274977707363 ?
Perhaps you mean something else; could you spell it out in that case?
Oleg.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 13:55 ` Oleg Nesterov
@ 2009-05-20 14:12 ` Peter Zijlstra
0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-20 14:12 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List
On Wed, 2009-05-20 at 15:55 +0200, Oleg Nesterov wrote:
> On 05/20, Peter Zijlstra wrote:
> >
> > On Wed, 2009-05-20 at 15:18 +0200, Oleg Nesterov wrote:
> > > On 05/20, Peter Zijlstra wrote:
> > > >
> > > > Breaking the setup_lock -> cpu_add_remove_lock dependency seems
> > > > sufficient.
> > >
> > > Hmm. What do you mean? Afaics setup_lock -> cpu_add_remove_lock
> > > is not a problem?
> >
> > From what I could see that is the only dependency that makes
> > cpu_add_remove_lock nest under "events" workqueue 'lock', which is what
> > is generating the deadlock.
>
> But cpu_add_remove_lock does not participate in this deadlock, see
> http://marc.info/?l=linux-kernel&m=124274977707363 ?
>
> Perhaps you mean something else, could you spell in that case?
Ah, right, I knew I should have paid more attention when reading the
thread. Yes, breaking that chain around dpm_list_mtx would also work,
and solve this other deadlock too.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-20 6:47 ` Johannes Berg
2009-05-20 7:09 ` Ming Lei
@ 2009-05-22 8:03 ` Ming Lei
2009-05-22 8:11 ` Johannes Berg
1 sibling, 1 reply; 43+ messages in thread
From: Ming Lei @ 2009-05-22 8:03 UTC (permalink / raw)
To: Johannes Berg
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
2009/5/20 Johannes Berg <johannes@sipsolutions.net>:
> On Wed, 2009-05-20 at 11:36 +0800, Ming Lei wrote:
>
>> > Anyway, you can have a deadlock like this:
>> >
>> > CPU 3 CPU 2 CPU 1
>> > suspend/hibernate
>> > something:
>> > rtnl_lock() device_pm_lock()
>> > -> mutex_lock(&dpm_list_mtx)
>> >
>> > mutex_lock(&dpm_list_mtx)
>>
>> Would you give a explaination why mutex_lock(&dpm_list_mtx) runs in CPU2
>> and depends on rtnl_lock?
>
> Why not? Something is registering a hotplugged netdev.
It seems dpm_list_mtx is held in the kernel_init context, so we should not
be looking at a hotplugged netdev being registered, should we?
I am still confused, since rtnl_mutex is held before dpm_list_mtx, which
is acquired in the kernel_init context.
[ 562.689476] -> #3 (dpm_list_mtx){+.+.+.}:
[ 562.689480] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
[ 562.689484] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
[ 562.689489] [<ffffffff804ab2f0>] mutex_lock_nested+0x6f/0x36b
[ 562.689493] [<ffffffff803f6395>] device_pm_add+0x4b/0xf2
[ 562.689499] [<ffffffff803ef382>] device_add+0x498/0x62a
[ 562.689503] [<ffffffff8043fd32>] netdev_register_kobject+0x7b/0x80
[ 562.689509] [<ffffffff80434761>] register_netdevice+0x2d0/0x469
[ 562.689514] [<ffffffff80434939>] register_netdev+0x3f/0x4d
[ 562.689519] [<ffffffff806f634f>] loopback_net_init+0x40/0x7d
[ 562.689524] [<ffffffff8042ee79>] register_pernet_device+0x32/0x5f
[ 562.689528] [<ffffffff806fb41a>] net_dev_init+0x143/0x1a1
[ 562.689533] [<ffffffff80209080>] do_one_initcall+0x75/0x18a
[ 562.689538] [<ffffffff806d0678>] kernel_init+0x138/0x18e
[ 562.689542] [<ffffffff8020c33a>] child_rip+0xa/0x20
[ 562.689546] [<ffffffffffffffff>] 0xffffffffffffffff
>
> johannes
>
--
Lei Ming
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-22 8:03 ` Ming Lei
@ 2009-05-22 8:11 ` Johannes Berg
0 siblings, 0 replies; 43+ messages in thread
From: Johannes Berg @ 2009-05-22 8:11 UTC (permalink / raw)
To: Ming Lei
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Peter Zijlstra, Linux Kernel Mailing List
On Fri, 2009-05-22 at 16:03 +0800, Ming Lei wrote:
> 2009/5/20 Johannes Berg <johannes@sipsolutions.net>:
> > On Wed, 2009-05-20 at 11:36 +0800, Ming Lei wrote:
> >
> >> > Anyway, you can have a deadlock like this:
> >> >
> >> > CPU 3 CPU 2 CPU 1
> >> > suspend/hibernate
> >> > something:
> >> > rtnl_lock() device_pm_lock()
> >> > -> mutex_lock(&dpm_list_mtx)
> >> >
> >> > mutex_lock(&dpm_list_mtx)
> >>
> >> Would you give a explaination why mutex_lock(&dpm_list_mtx) runs in CPU2
> >> and depends on rtnl_lock?
> >
> > Why not? Something is registering a hotplugged netdev.
>
> It seems dpm_list_mtx is held in kernel_init context, so should not consider
> registering a hotplugged netdev, isn't it?
>
> I am still confused, since rtnl_mutex is held before dpm_list_mtx which
> is acquired in kernel_init context.
>
> [ 562.689476] -> #3 (dpm_list_mtx){+.+.+.}:
> [ 562.689480] [<ffffffff8026eae8>] __lock_acquire+0x13a9/0x171c
> [ 562.689484] [<ffffffff8026ef5f>] lock_acquire+0x104/0x130
> [ 562.689489] [<ffffffff804ab2f0>] mutex_lock_nested+0x6f/0x36b
> [ 562.689493] [<ffffffff803f6395>] device_pm_add+0x4b/0xf2
> [ 562.689499] [<ffffffff803ef382>] device_add+0x498/0x62a
> [ 562.689503] [<ffffffff8043fd32>] netdev_register_kobject+0x7b/0x80
> [ 562.689509] [<ffffffff80434761>] register_netdevice+0x2d0/0x469
> [ 562.689514] [<ffffffff80434939>] register_netdev+0x3f/0x4d
> [ 562.689519] [<ffffffff806f634f>] loopback_net_init+0x40/0x7d
> [ 562.689524] [<ffffffff8042ee79>] register_pernet_device+0x32/0x5f
> [ 562.689528] [<ffffffff806fb41a>] net_dev_init+0x143/0x1a1
> [ 562.689533] [<ffffffff80209080>] do_one_initcall+0x75/0x18a
> [ 562.689538] [<ffffffff806d0678>] kernel_init+0x138/0x18e
> [ 562.689542] [<ffffffff8020c33a>] child_rip+0xa/0x20
> [ 562.689546] [<ffffffffffffffff>] 0xffffffffffffffff
You need to stop looking at the lockdep report. We've strayed far enough
from it that it's no longer useful. Anyhow, the kernel_init context here
doesn't matter -- what is relevant is that we have
device_pm_add
being called somewhere inside
register_netdev
where register_netdev acquires the rtnl, and device_pm_add acquires
dpm_list_mtx, so we get this dependency between the two, which I mapped
onto CPU 2.
The fact that this is in kernel_init isn't significant -- that's because
the network device is built-in or whatever; if you hotplug a USB network
device you get the same chain of events starting from USB.
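Roughly, the call chain that records this dependency looks like the following
(a simplified sketch -- signatures and bodies abridged, not the actual source):

/* net/core/dev.c -- abridged */
int register_netdev(struct net_device *dev)
{
        int err;

        rtnl_lock();                    /* take rtnl_mutex               */
        err = register_netdevice(dev);  /* -> netdev_register_kobject()
                                         *    -> device_add()
                                         *       -> device_pm_add()      */
        rtnl_unlock();
        return err;
}

/* drivers/base/power/main.c -- abridged */
void device_pm_add(struct device *dev)
{
        mutex_lock(&dpm_list_mtx);      /* taken while rtnl is still held,
                                         * so lockdep records the
                                         * rtnl_mutex -> dpm_list_mtx edge */
        list_add_tail(&dev->power.entry, &dpm_list);
        mutex_unlock(&dpm_list_mtx);
}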
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-19 18:51 ` Oleg Nesterov
@ 2009-05-22 10:46 ` Johannes Berg
2009-05-22 22:23 ` Rafael J. Wysocki
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Berg @ 2009-05-22 10:46 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki, Peter Zijlstra,
Linux Kernel Mailing List
On Tue, 2009-05-19 at 20:51 +0200, Oleg Nesterov wrote:
> > > > Anyway, you can have a deadlock like this:
> > > >
> > > > CPU 3 CPU 2 CPU 1
> > > > suspend/hibernate
> > > > something:
> > > > rtnl_lock() device_pm_lock()
> > > > -> mutex_lock(&dpm_list_mtx)
> > > >
> > > > mutex_lock(&dpm_list_mtx)
> > > >
> > > > linkwatch_work
> > > > -> rtnl_lock()
> > > > disable_nonboot_cpus()
> > >
> > > let's suppose disable_nonboot_cpus() does not take cpu_add_remove_lock,
> > >
> > > > -> flush CPU 3 workqueue
> > >
> > > in this case the deadlock is still here?
> > >
> > > We can't flush because we hold the lock (dpm_list_mtx) which depends
> > > on another lock taken by work->func(), the "classical" bug with flush.
> > >
> > > No?
> >
> > Yeah, it looks like cpu_add_remove_lock doesn't make a difference...
> > It's just lockdep reporting a longer chain that also leads to a
> > deadlock.
>
> So. we should not call cpu_down/disable_nonboot_cpus under device_pm_lock().
>
> At first glance this was changed by
>
> PM: Change hibernation code ordering
> 4aecd6718939eb3c4145b248369b65f7483a8a02
>
> PM: Change suspend code ordering
> 900af0d973856d6feb6fc088c2d0d3fde57707d3
>
> commits. Rafael, could you take a look?
I just arrived at the same conclusion, heh. I can't say I understand
these changes though; the part about calling the platform differently
may make sense, but why disable the non-boot CPUs at a different
place?
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-22 10:46 ` Johannes Berg
@ 2009-05-22 22:23 ` Rafael J. Wysocki
2009-05-23 8:21 ` Johannes Berg
0 siblings, 1 reply; 43+ messages in thread
From: Rafael J. Wysocki @ 2009-05-22 22:23 UTC (permalink / raw)
To: Johannes Berg
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Peter Zijlstra,
Linux Kernel Mailing List
On Friday 22 May 2009, Johannes Berg wrote:
> On Tue, 2009-05-19 at 20:51 +0200, Oleg Nesterov wrote:
>
> > > > > Anyway, you can have a deadlock like this:
> > > > >
> > > > > CPU 3 CPU 2 CPU 1
> > > > > suspend/hibernate
> > > > > something:
> > > > > rtnl_lock() device_pm_lock()
> > > > > -> mutex_lock(&dpm_list_mtx)
> > > > >
> > > > > mutex_lock(&dpm_list_mtx)
> > > > >
> > > > > linkwatch_work
> > > > > -> rtnl_lock()
> > > > > disable_nonboot_cpus()
> > > >
> > > > let's suppose disable_nonboot_cpus() does not take cpu_add_remove_lock,
> > > >
> > > > > -> flush CPU 3 workqueue
> > > >
> > > > in this case the deadlock is still here?
> > > >
> > > > We can't flush because we hold the lock (dpm_list_mtx) which depends
> > > > on another lock taken by work->func(), the "classical" bug with flush.
> > > >
> > > > No?
> > >
> > > Yeah, it looks like cpu_add_remove_lock doesn't make a difference...
> > > It's just lockdep reporting a longer chain that also leads to a
> > > deadlock.
> >
> > So. we should not call cpu_down/disable_nonboot_cpus under device_pm_lock().
> >
> > At first glance this was changed by
> >
> > PM: Change hibernation code ordering
> > 4aecd6718939eb3c4145b248369b65f7483a8a02
> >
> > PM: Change suspend code ordering
> > 900af0d973856d6feb6fc088c2d0d3fde57707d3
> >
> > commits. Rafael, could you take a look?
>
> I just arrived at the same conclusion, heh. I can't say I understand
> these changes though, the part about calling the platform differently
> may make sense, but calling why disable non-boot CPUs at a different
> place?
Because the ordering of platform callbacks and cpu[_up()|_down()] is also
important, at least on resume.
In principle we can call device_pm_unlock() right before calling
disable_nonboot_cpus() and take the lock again right after calling
enable_nonboot_cpus(), if that helps.
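As a sketch, that alternative would look roughly like this (hypothetical
rearrangement of the suspend path, error handling trimmed; the patch posted
later in the thread takes a different route and pushes the locking down into
drivers/base/power/main.c instead):

/* Hypothetical suspend-path rearrangement, not the patch in this thread. */
static int suspend_enter_sketch(suspend_state_t state)
{
        int error;

        /* ... late suspend done under device_pm_lock(), as before ... */

        device_pm_unlock();             /* drop dpm_list_mtx so the
                                         * CPU_POST_DEAD workqueue flush
                                         * cannot deadlock against it    */
        error = disable_nonboot_cpus();
        if (!error)
                error = suspend_ops->enter(state);
        enable_nonboot_cpus();

        device_pm_lock();               /* re-take it for early resume   */
        return error;
}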
Thanks,
Rafael
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-22 22:23 ` Rafael J. Wysocki
@ 2009-05-23 8:21 ` Johannes Berg
2009-05-23 23:20 ` Rafael J. Wysocki
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Berg @ 2009-05-23 8:21 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Peter Zijlstra,
Linux Kernel Mailing List
On Sat, 2009-05-23 at 00:23 +0200, Rafael J. Wysocki wrote:
> > I just arrived at the same conclusion, heh. I can't say I understand
> > these changes though, the part about calling the platform differently
> > may make sense, but calling why disable non-boot CPUs at a different
> > place?
>
> Because the ordering of platform callbacks and cpu[_up()|_down()] is also
> important, at least on resume.
>
> In principle we can call device_pm_unlock() right before calling
> disable_nonboot_cpus() and take the lock again right after calling
> enable_nonboot_cpus(), if that helps.
Probably, unless the cpu_add_remove_lock wasn't a red herring after all.
I'd test, but I don't have much time today, will be travelling tomorrow
and be at UDS all week next week so I don't know when I'll get to it --
could you provide a patch and also attach it to
http://bugzilla.kernel.org/show_bug.cgi?id=13245 please? Miles (the
reporter of that bug) has been very helpful in testing before.
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-23 8:21 ` Johannes Berg
@ 2009-05-23 23:20 ` Rafael J. Wysocki
2009-05-24 3:29 ` Ming Lei
2009-05-24 14:30 ` Alan Stern
0 siblings, 2 replies; 43+ messages in thread
From: Rafael J. Wysocki @ 2009-05-23 23:20 UTC (permalink / raw)
To: Johannes Berg, Alan Stern
Cc: Oleg Nesterov, Ingo Molnar, Zdenek Kabelac, Peter Zijlstra,
Linux Kernel Mailing List, pm list
On Saturday 23 May 2009, Johannes Berg wrote:
> On Sat, 2009-05-23 at 00:23 +0200, Rafael J. Wysocki wrote:
>
> > > I just arrived at the same conclusion, heh. I can't say I understand
> > > these changes though, the part about calling the platform differently
> > > may make sense, but calling why disable non-boot CPUs at a different
> > > place?
> >
> > Because the ordering of platform callbacks and cpu[_up()|_down()] is also
> > important, at least on resume.
> >
> > In principle we can call device_pm_unlock() right before calling
> > disable_nonboot_cpus() and take the lock again right after calling
> > enable_nonboot_cpus(), if that helps.
>
> Probably, unless the cpu_add_remove_lock wasn't a red herring after all.
> I'd test, but I don't have much time today, will be travelling tomorrow
> and be at UDS all week next week so I don't know when I'll get to it --
> could you provide a patch and also attach it to
> http://bugzilla.kernel.org/show_bug.cgi?id=13245 please? Miles (the
> reporter of that bug) has been very helpful in testing before.
OK
The patch is appended for reference (Alan, please have a look; I can't recall
why exactly we have called device_pm_lock() from the core suspend/hibernation
code instead of acquiring the lock locally in drivers/base/power/main.c) and
I'll attach it to the bug entry too.
Thanks,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs
We shouldn't hold dpm_list_mtx while executing
[disable|enable]_nonboot_cpus(), because theoretically this may lead
to a deadlock as shown by the following example (provided by Johannes
Berg):
CPU 3                   CPU 2                           CPU 1
                                                        suspend/hibernate
                        something:
                        rtnl_lock()                     device_pm_lock()
                                                         -> mutex_lock(&dpm_list_mtx)

                        mutex_lock(&dpm_list_mtx)

linkwatch_work
 -> rtnl_lock()
                                                        disable_nonboot_cpus()
                                                         -> flush CPU 3 workqueue
Fortunately, device drivers are supposed to stop any activities that
might lead to the registration of new device objects and/or to the
removal of the existing ones way before disable_nonboot_cpus() is
called, so it shouldn't be necessary to hold dpm_list_mtx over the
entire late part of device suspend and early part of device resume.
Thus, during the late suspend and the early resume of devices acquire
dpm_list_mtx only when dpm_list is going to be traversed and release
it right after that.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/main.c | 4 ++++
kernel/kexec.c | 2 --
kernel/power/disk.c | 21 +++------------------
kernel/power/main.c | 7 +------
4 files changed, 8 insertions(+), 26 deletions(-)
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -215,8 +215,6 @@ static int create_image(int platform_mod
if (error)
return error;
- device_pm_lock();
-
/* At this point, device_suspend() has been called, but *not*
* device_power_down(). We *must* call device_power_down() now.
* Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -227,7 +225,7 @@ static int create_image(int platform_mod
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting hibernation\n");
- goto Unlock;
+ return error;
}
error = platform_pre_snapshot(platform_mode);
@@ -280,9 +278,6 @@ static int create_image(int platform_mod
device_power_up(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Unlock:
- device_pm_unlock();
-
return error;
}
@@ -348,13 +343,11 @@ static int resume_target_kernel(bool pla
{
int error;
- device_pm_lock();
-
error = device_power_down(PMSG_QUIESCE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
- goto Unlock;
+ return error;
}
error = platform_pre_restore(platform_mode);
@@ -407,9 +400,6 @@ static int resume_target_kernel(bool pla
device_power_up(PMSG_RECOVER);
- Unlock:
- device_pm_unlock();
-
return error;
}
@@ -468,11 +458,9 @@ int hibernation_platform_enter(void)
goto Resume_devices;
}
- device_pm_lock();
-
error = device_power_down(PMSG_HIBERNATE);
if (error)
- goto Unlock;
+ goto Resume_devices;
error = hibernation_ops->prepare();
if (error)
@@ -497,9 +485,6 @@ int hibernation_platform_enter(void)
device_power_up(PMSG_RESTORE);
- Unlock:
- device_pm_unlock();
-
Resume_devices:
entering_platform_hibernation = false;
device_resume(PMSG_RESTORE);
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -271,12 +271,10 @@ static int suspend_enter(suspend_state_t
{
int error;
- device_pm_lock();
-
if (suspend_ops->prepare) {
error = suspend_ops->prepare();
if (error)
- goto Done;
+ return error;
}
error = device_power_down(PMSG_SUSPEND);
@@ -325,9 +323,6 @@ static int suspend_enter(suspend_state_t
if (suspend_ops->finish)
suspend_ops->finish();
- Done:
- device_pm_unlock();
-
return error;
}
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -357,6 +357,7 @@ static void dpm_power_up(pm_message_t st
{
struct device *dev;
+ mutex_lock(&dpm_list_mtx);
list_for_each_entry(dev, &dpm_list, power.entry)
if (dev->power.status > DPM_OFF) {
int error;
@@ -366,6 +367,7 @@ static void dpm_power_up(pm_message_t st
if (error)
pm_dev_err(dev, state, " early", error);
}
+ mutex_unlock(&dpm_list_mtx);
}
/**
@@ -614,6 +616,7 @@ int device_power_down(pm_message_t state
int error = 0;
suspend_device_irqs();
+ mutex_lock(&dpm_list_mtx);
list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
error = suspend_device_noirq(dev, state);
if (error) {
@@ -622,6 +625,7 @@ int device_power_down(pm_message_t state
}
dev->power.status = DPM_OFF_IRQ;
}
+ mutex_unlock(&dpm_list_mtx);
if (error)
device_power_up(resume_event(state));
return error;
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1451,7 +1451,6 @@ int kernel_kexec(void)
error = device_suspend(PMSG_FREEZE);
if (error)
goto Resume_console;
- device_pm_lock();
/* At this point, device_suspend() has been called,
* but *not* device_power_down(). We *must*
* device_power_down() now. Otherwise, drivers for
@@ -1489,7 +1488,6 @@ int kernel_kexec(void)
enable_nonboot_cpus();
device_power_up(PMSG_RESTORE);
Resume_devices:
- device_pm_unlock();
device_resume(PMSG_RESTORE);
Resume_console:
resume_console();
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-23 23:20 ` Rafael J. Wysocki
@ 2009-05-24 3:29 ` Ming Lei
2009-05-24 11:09 ` Rafael J. Wysocki
2009-05-24 14:30 ` Alan Stern
1 sibling, 1 reply; 43+ messages in thread
From: Ming Lei @ 2009-05-24 3:29 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Johannes Berg, Alan Stern, Oleg Nesterov, Ingo Molnar,
Zdenek Kabelac, Peter Zijlstra, Linux Kernel Mailing List,
pm list
On Sun, 24 May 2009 01:20:29 +0200,
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> On Saturday 23 May 2009, Johannes Berg wrote:
> > On Sat, 2009-05-23 at 00:23 +0200, Rafael J. Wysocki wrote:
> >
> > > > I just arrived at the same conclusion, heh. I can't say I
> > > > understand these changes though, the part about calling the
> > > > platform differently may make sense, but calling why disable
> > > > non-boot CPUs at a different place?
> > >
> > > Because the ordering of platform callbacks and cpu[_up()|_down()]
> > > is also important, at least on resume.
> > >
> > > In principle we can call device_pm_unlock() right before calling
> > > disable_nonboot_cpus() and take the lock again right after calling
> > > enable_nonboot_cpus(), if that helps.
> >
> > Probably, unless the cpu_add_remove_lock wasn't a red herring after
> > all. I'd test, but I don't have much time today, will be travelling
> > tomorrow and be at UDS all week next week so I don't know when I'll
> > get to it -- could you provide a patch and also attach it to
> > http://bugzilla.kernel.org/show_bug.cgi?id=13245 please? Miles (the
> > reporter of that bug) has been very helpful in testing before.
>
> OK
>
> The patch is appended for reference (Alan, please have a look; I
> can't recall why exactly we have called device_pm_lock() from the
> core suspend/hibernation code instead of acquiring the lock locally
> in drivers/base/power/main.c) and I'll attach it to the bug entry too.
>
> Thanks,
> Rafael
>
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Do not hold dpm_list_mtx while disabling/enabling
> nonboot CPUs
>
> We shouldn't hold dpm_list_mtx while executing
> [disable|enable]_nonboot_cpus(), because theoretically this may lead
> to a deadlock as shown by the following example (provided by Johannes
> Berg):
>
> CPU 3 CPU 2 CPU 1
> suspend/hibernate
> something:
> rtnl_lock() device_pm_lock()
> -> mutex_lock(&dpm_list_mtx)
>
> mutex_lock(&dpm_list_mtx)
>
> linkwatch_work
> -> rtnl_lock()
> disable_nonboot_cpus()
> -> flush CPU 3 workqueue
>
> Fortunately, device drivers are supposed to stop any activities that
> might lead to the registration of new device objects and/or to the
> removal of the existing ones way before disable_nonboot_cpus() is
> called, so it shouldn't be necessary to hold dpm_list_mtx over the
> entire late part of device suspend and early part of device resume.
>
> Thus, during the late suspend and the early resume of devices acquire
> dpm_list_mtx only when dpm_list is going to be traversed and release
> it right after that.
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> drivers/base/power/main.c | 4 ++++
> kernel/kexec.c | 2 --
> kernel/power/disk.c | 21 +++------------------
> kernel/power/main.c | 7 +------
> 4 files changed, 8 insertions(+), 26 deletions(-)
>
I tried to apply the patch against the latest next tree (2009-05-22), but
"patch -p1" failed:
[lm@linux-lm linux-2.6]$ patch -p1 < ../patch_rx/INFO_possible_circular_locking_dependency_at_cleanup_workqueue_thread.patch
patching file kernel/power/disk.c
Hunk #1 succeeded at 215 with fuzz 2.
Hunk #3 succeeded at 278 with fuzz 1.
Hunk #4 FAILED at 343.
Hunk #5 succeeded at 396 with fuzz 2 (offset -4 lines).
Hunk #6 FAILED at 454.
Hunk #7 succeeded at 485 with fuzz 2.
2 out of 7 hunks FAILED -- saving rejects to file kernel/power/disk.c.rej
patching file kernel/power/main.c
Hunk #1 succeeded at 289 with fuzz 1 (offset 18 lines).
patching file drivers/base/power/main.c
Hunk #3 succeeded at 616 with fuzz 2.
Hunk #4 succeeded at 625 with fuzz 2.
patching file kernel/kexec.c
Hunk #1 succeeded at 1451 with fuzz 2.
Hunk #2 succeeded at 1488 with fuzz 2.
> Index: linux-2.6/kernel/power/disk.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/disk.c
> +++ linux-2.6/kernel/power/disk.c
> @@ -215,8 +215,6 @@ static int create_image(int platform_mod
> if (error)
> return error;
>
> - device_pm_lock();
> -
> /* At this point, device_suspend() has been called, but *not*
> * device_power_down(). We *must* call device_power_down() now.
> * Otherwise, drivers for some devices (e.g. interrupt controllers)
> @@ -227,7 +225,7 @@ static int create_image(int platform_mod
> if (error) {
> printk(KERN_ERR "PM: Some devices failed to power down, "
> "aborting hibernation\n");
> - goto Unlock;
> + return error;
> }
>
> error = platform_pre_snapshot(platform_mode);
> @@ -280,9 +278,6 @@ static int create_image(int platform_mod
> device_power_up(in_suspend ?
> (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
>
> - Unlock:
> - device_pm_unlock();
> -
> return error;
> }
>
> @@ -348,13 +343,11 @@ static int resume_target_kernel(bool pla
> {
> int error;
>
> - device_pm_lock();
> -
> error = device_power_down(PMSG_QUIESCE);
> if (error) {
> printk(KERN_ERR "PM: Some devices failed to power down, "
> "aborting resume\n");
> - goto Unlock;
> + return error;
> }
>
> error = platform_pre_restore(platform_mode);
> @@ -407,9 +400,6 @@ static int resume_target_kernel(bool pla
>
> device_power_up(PMSG_RECOVER);
>
> - Unlock:
> - device_pm_unlock();
> -
> return error;
> }
>
> @@ -468,11 +458,9 @@ int hibernation_platform_enter(void)
> goto Resume_devices;
> }
>
> - device_pm_lock();
> -
> error = device_power_down(PMSG_HIBERNATE);
> if (error)
> - goto Unlock;
> + goto Resume_devices;
>
> error = hibernation_ops->prepare();
> if (error)
> @@ -497,9 +485,6 @@ int hibernation_platform_enter(void)
>
> device_power_up(PMSG_RESTORE);
>
> - Unlock:
> - device_pm_unlock();
> -
> Resume_devices:
> entering_platform_hibernation = false;
> device_resume(PMSG_RESTORE);
> Index: linux-2.6/kernel/power/main.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/main.c
> +++ linux-2.6/kernel/power/main.c
> @@ -271,12 +271,10 @@ static int suspend_enter(suspend_state_t
> {
> int error;
>
> - device_pm_lock();
> -
> if (suspend_ops->prepare) {
> error = suspend_ops->prepare();
> if (error)
> - goto Done;
> + return error;
> }
>
> error = device_power_down(PMSG_SUSPEND);
> @@ -325,9 +323,6 @@ static int suspend_enter(suspend_state_t
> if (suspend_ops->finish)
> suspend_ops->finish();
>
> - Done:
> - device_pm_unlock();
> -
> return error;
> }
>
> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -357,6 +357,7 @@ static void dpm_power_up(pm_message_t st
> {
> struct device *dev;
>
> + mutex_lock(&dpm_list_mtx);
> list_for_each_entry(dev, &dpm_list, power.entry)
> if (dev->power.status > DPM_OFF) {
> int error;
> @@ -366,6 +367,7 @@ static void dpm_power_up(pm_message_t st
> if (error)
> pm_dev_err(dev, state, " early", error);
> }
> + mutex_unlock(&dpm_list_mtx);
> }
>
> /**
> @@ -614,6 +616,7 @@ int device_power_down(pm_message_t state
> int error = 0;
>
> suspend_device_irqs();
> + mutex_lock(&dpm_list_mtx);
> list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
> error = suspend_device_noirq(dev, state);
> if (error) {
> @@ -622,6 +625,7 @@ int device_power_down(pm_message_t state
> }
> dev->power.status = DPM_OFF_IRQ;
> }
> + mutex_unlock(&dpm_list_mtx);
> if (error)
> device_power_up(resume_event(state));
> return error;
> Index: linux-2.6/kernel/kexec.c
> ===================================================================
> --- linux-2.6.orig/kernel/kexec.c
> +++ linux-2.6/kernel/kexec.c
> @@ -1451,7 +1451,6 @@ int kernel_kexec(void)
> error = device_suspend(PMSG_FREEZE);
> if (error)
> goto Resume_console;
> - device_pm_lock();
> /* At this point, device_suspend() has been called,
> * but *not* device_power_down(). We *must*
> * device_power_down() now. Otherwise, drivers for
> @@ -1489,7 +1488,6 @@ int kernel_kexec(void)
> enable_nonboot_cpus();
> device_power_up(PMSG_RESTORE);
> Resume_devices:
> - device_pm_unlock();
> device_resume(PMSG_RESTORE);
> Resume_console:
> resume_console();
--
Lei Ming
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-24 3:29 ` Ming Lei
@ 2009-05-24 11:09 ` Rafael J. Wysocki
2009-05-24 12:48 ` Ming Lei
0 siblings, 1 reply; 43+ messages in thread
From: Rafael J. Wysocki @ 2009-05-24 11:09 UTC (permalink / raw)
To: Ming Lei
Cc: Johannes Berg, Alan Stern, Oleg Nesterov, Ingo Molnar,
Zdenek Kabelac, Peter Zijlstra, Linux Kernel Mailing List,
pm list
On Sunday 24 May 2009, Ming Lei wrote:
> On Sun, 24 May 2009 01:20:29 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > On Saturday 23 May 2009, Johannes Berg wrote:
> > > On Sat, 2009-05-23 at 00:23 +0200, Rafael J. Wysocki wrote:
> > >
> > > > > I just arrived at the same conclusion, heh. I can't say I
> > > > > understand these changes though, the part about calling the
> > > > > platform differently may make sense, but why disable
> > > > > non-boot CPUs at a different place?
> > > >
> > > > Because the ordering of platform callbacks and cpu[_up()|_down()]
> > > > is also important, at least on resume.
> > > >
> > > > In principle we can call device_pm_unlock() right before calling
> > > > disable_nonboot_cpus() and take the lock again right after calling
> > > > enable_nonboot_cpus(), if that helps.
> > >
> > > Probably, unless the cpu_add_remove_lock wasn't a red herring after
> > > all. I'd test, but I don't have much time today, will be travelling
> > > tomorrow and be at UDS all week next week so I don't know when I'll
> > > get to it -- could you provide a patch and also attach it to
> > > http://bugzilla.kernel.org/show_bug.cgi?id=13245 please? Miles (the
> > > reporter of that bug) has been very helpful in testing before.
> >
> > OK
> >
> > The patch is appended for reference (Alan, please have a look; I
> > can't recall why exactly we have called device_pm_lock() from the
> > core suspend/hibernation code instead of acquiring the lock locally
> > in drivers/base/power/main.c) and I'll attach it to the bug entry too.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Do not hold dpm_list_mtx while disabling/enabling
> > nonboot CPUs
> >
> > We shouldn't hold dpm_list_mtx while executing
> > [disable|enable]_nonboot_cpus(), because theoretically this may lead
> > to a deadlock as shown by the following example (provided by Johannes
> > Berg):
> >
> > CPU 3   CPU 2                 CPU 1
> >                               suspend/hibernate
> >         something:
> >         rtnl_lock()           device_pm_lock()
> >                                -> mutex_lock(&dpm_list_mtx)
> >
> >         mutex_lock(&dpm_list_mtx)
> >
> > linkwatch_work
> >  -> rtnl_lock()
> >                               disable_nonboot_cpus()
> >                                -> flush CPU 3 workqueue
> >
> > Fortunately, device drivers are supposed to stop any activities that
> > might lead to the registration of new device objects and/or to the
> > removal of the existing ones way before disable_nonboot_cpus() is
> > called, so it shouldn't be necessary to hold dpm_list_mtx over the
> > entire late part of device suspend and early part of device resume.
> >
> > Thus, during the late suspend and the early resume of devices acquire
> > dpm_list_mtx only when dpm_list is going to be traversed and release
> > it right after that.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> > drivers/base/power/main.c | 4 ++++
> > kernel/kexec.c | 2 --
> > kernel/power/disk.c | 21 +++------------------
> > kernel/power/main.c | 7 +------
> > 4 files changed, 8 insertions(+), 26 deletions(-)
> >
>
> I tried to apply the patch against the latest next tree (2009-05-22), but
> "patch -p1" failed:
>
>
> [lm@linux-lm linux-2.6]$ patch -p1 < ../patch_rx/INFO_possible_circular_locking_dependency_at_cleanup_workqueue_thread.patch
> patching file kernel/power/disk.c
> Hunk #1 succeeded at 215 with fuzz 2.
> Hunk #3 succeeded at 278 with fuzz 1.
> Hunk #4 FAILED at 343.
> Hunk #5 succeeded at 396 with fuzz 2 (offset -4 lines).
> Hunk #6 FAILED at 454.
> Hunk #7 succeeded at 485 with fuzz 2.
> 2 out of 7 hunks FAILED -- saving rejects to file kernel/power/disk.c.rej
> patching file kernel/power/main.c
> Hunk #1 succeeded at 289 with fuzz 1 (offset 18 lines).
> patching file drivers/base/power/main.c
> Hunk #3 succeeded at 616 with fuzz 2.
> Hunk #4 succeeded at 625 with fuzz 2.
> patching file kernel/kexec.c
> Hunk #1 succeeded at 1451 with fuzz 2.
> Hunk #2 succeeded at 1488 with fuzz 2.
The patch applies to the mainline, since it'll be a 2.6.30 candidate if it's
confirmed to fix the problem.
Thanks,
Rafael
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-24 11:09 ` Rafael J. Wysocki
@ 2009-05-24 12:48 ` Ming Lei
2009-05-24 19:09 ` Rafael J. Wysocki
0 siblings, 1 reply; 43+ messages in thread
From: Ming Lei @ 2009-05-24 12:48 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Johannes Berg, Alan Stern, Oleg Nesterov, Ingo Molnar,
Zdenek Kabelac, Peter Zijlstra, Linux Kernel Mailing List,
pm list
[-- Attachment #1: Type: text/plain, Size: 1779 bytes --]
On Sun, 24 May 2009 13:09:13 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> On Sunday 24 May 2009, Ming Lei wrote:
> > On Sun, 24 May 2009 01:20:29 +0200
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > [lm@linux-lm linux-2.6]$ patch -p1
> > < ../patch_rx/INFO_possible_circular_locking_dependency_at_cleanup_workqueue_thread.patch
> > patching file kernel/power/disk.c
> > Hunk #1 succeeded at 215 with fuzz 2.
> > Hunk #3 succeeded at 278 with fuzz 1.
> > Hunk #4 FAILED at 343.
> > Hunk #5 succeeded at 396 with fuzz 2 (offset -4 lines).
> > Hunk #6 FAILED at 454.
> > Hunk #7 succeeded at 485 with fuzz 2.
> > 2 out of 7 hunks FAILED -- saving rejects to file kernel/power/disk.c.rej
> > patching file kernel/power/main.c
> > Hunk #1 succeeded at 289 with fuzz 1 (offset 18 lines).
> > patching file drivers/base/power/main.c
> > Hunk #3 succeeded at 616 with fuzz 2.
> > Hunk #4 succeeded at 625 with fuzz 2.
> > patching file kernel/kexec.c
> > Hunk #1 succeeded at 1451 with fuzz 2.
> > Hunk #2 succeeded at 1488 with fuzz 2.
>
> The patch applies to the mainline, since it'll be a 2.6.30 candidate
> if it's confirmed to fix the problem.
After applying the patch against 2.6.30-rc7, my Dell D630 box can resume
from suspend (s2ram) or hibernation successfully, and lockdep no longer
complains with the kind of message reported in
http://bugzilla.kernel.org/show_bug.cgi?id=13245.
Another thing: during resume from hibernation, I see the following
messages:
[ 1.849584] device: 'network_throughput': device_add
[ 1.849630] PM: Adding info for No Bus:network_throughput
[ 1.849788] PM: Resume from disk failed.
I do not know if there is really something wrong in PM; after all, my
box works as before. The dmesg info is attached.
Thanks.
--
Lei Ming
[-- Attachment #2: dmesg.tar.gz --]
[-- Type: application/x-gzip, Size: 58118 bytes --]
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-23 23:20 ` Rafael J. Wysocki
2009-05-24 3:29 ` Ming Lei
@ 2009-05-24 14:30 ` Alan Stern
2009-05-24 19:06 ` Rafael J. Wysocki
1 sibling, 1 reply; 43+ messages in thread
From: Alan Stern @ 2009-05-24 14:30 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Johannes Berg, Oleg Nesterov, Ingo Molnar, Zdenek Kabelac,
Peter Zijlstra, Linux Kernel Mailing List, pm list
On Sun, 24 May 2009, Rafael J. Wysocki wrote:
> The patch is appended for reference (Alan, please have a look; I can't recall
> why exactly we have called device_pm_lock() from the core suspend/hibernation
> code instead of acquiring the lock locally in drivers/base/power/main.c) and
> I'll attach it to the bug entry too.
I can't remember the reason either. Probably there wasn't any. The
patch looks fine, and it has the nice added benefit that now the only
user of device_pm_lock() will be device_move().
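
The resulting pattern in drivers/base/power/main.c is essentially the
following (a minimal sketch, not the literal kernel code; dpm_list and
dpm_list_mtx are the file-local list and mutex, while resume_one_device()
is only a stand-in name for the real noirq callback invocation):

#include <linux/device.h>
#include <linux/list.h>
#include <linux/mutex.h>

static void dpm_power_up(pm_message_t state)
{
        struct device *dev;

        mutex_lock(&dpm_list_mtx);      /* taken only around the list walk */
        list_for_each_entry(dev, &dpm_list, power.entry)
                if (dev->power.status > DPM_OFF)
                        resume_one_device(dev, state);  /* stand-in helper */
        mutex_unlock(&dpm_list_mtx);    /* never held across
                                         * [disable|enable]_nonboot_cpus() */
}

With the lock taken and dropped around each traversal like this, the
suspend/hibernation core no longer needs device_pm_lock() at all, which
is what leaves device_move() as its only user.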
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs
>
> We shouldn't hold dpm_list_mtx while executing
> [disable|enable]_nonboot_cpus(), because theoretically this may lead
> to a deadlock as shown by the following example (provided by Johannes
> Berg):
>
> CPU 3   CPU 2                 CPU 1
>                               suspend/hibernate
>         something:
>         rtnl_lock()           device_pm_lock()
>                                -> mutex_lock(&dpm_list_mtx)
>
>         mutex_lock(&dpm_list_mtx)
>
> linkwatch_work
>  -> rtnl_lock()
>                               disable_nonboot_cpus()
>                                -> flush CPU 3 workqueue
>
> Fortunately, device drivers are supposed to stop any activities that
> might lead to the registration of new device objects and/or to the
> removal of the existing ones way before disable_nonboot_cpus() is
Strictly speaking, drivers are still allowed to unregister existing
devices. They are forbidden only to register new ones. This shouldn't
hurt anything, though.
> called, so it shouldn't be necessary to hold dpm_list_mtx over the
> entire late part of device suspend and early part of device resume.
>
> Thus, during the late suspend and the early resume of devices acquire
> dpm_list_mtx only when dpm_list is going to be traversed and release
> it right after that.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-12 7:59 INFO: possible circular locking dependency at cleanup_workqueue_thread Zdenek Kabelac
2009-05-17 7:18 ` Ingo Molnar
@ 2009-05-24 18:58 ` Peter Zijlstra
1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2009-05-24 18:58 UTC (permalink / raw)
To: Ming Lei
Cc: Johannes Berg, Ingo Molnar, Zdenek Kabelac, Rafael J. Wysocki,
Linux Kernel Mailing List, Gautham R Shenoy, Oleg Nesterov
Below are the original lockdep output and the one generated by applying
Lei Ming's BFS shortest cycle patch.
It appears to find a slightly shorter variant, removing setup_lock from
the cycle -- but it might be a difference in setup or userland.
Looking again at Oleg's example, I think this again falls short of
finding the L1-L2 inversion, simply because we establish (and therefore
find) the longer cycle first.
Simply because we warn at the first cycle detected, we'll never go on
adding the dependencies that would reveal shorter ones... I think?
/me goes trying to construct a scenario to disprove the above.
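
To make that concrete, here is a small self-contained sketch (plain
userspace C, not lockdep code; the six-node graph and its numbering are
made up) of how warning at the first cycle a depth-first walk runs into
can report a long chain even though a breadth-first search from the same
new edge would find a much shorter one:

#include <stdio.h>
#include <string.h>

#define N 6
static int edge[N][N];          /* edge[a][b]: lock a was held while taking b */

/* Depth-first: length of the first path cur ~> target found, or -1. */
static int dfs(int cur, int target, int *seen)
{
        if (cur == target)
                return 0;
        seen[cur] = 1;
        for (int next = 0; next < N; next++) {
                if (!edge[cur][next] || seen[next])
                        continue;
                int len = dfs(next, target, seen);
                if (len >= 0)
                        return len + 1;
        }
        return -1;
}

/* Breadth-first: length of a shortest path src ~> target, or -1. */
static int bfs(int src, int target)
{
        int queue[N], dist[N], head = 0, tail = 0;

        memset(dist, -1, sizeof(dist));
        dist[src] = 0;
        queue[tail++] = src;
        while (head < tail) {
                int cur = queue[head++];
                if (cur == target)
                        return dist[cur];
                for (int next = 0; next < N; next++)
                        if (edge[cur][next] && dist[next] < 0) {
                                dist[next] = dist[cur] + 1;
                                queue[tail++] = next;
                        }
        }
        return -1;
}

int main(void)
{
        int seen[N] = { 0 };

        /* Dependencies recorded first form the long chain 1->2->3->4->5->0;
         * a later dependency adds the shortcut 1->5.  The new lock order
         * 0->1 therefore closes both a 6-edge and a 3-edge cycle. */
        edge[1][2] = edge[2][3] = edge[3][4] = edge[4][5] = edge[5][0] = 1;
        edge[1][5] = 1;

        printf("path 1 ~> 0 found first by DFS: %d edges\n", dfs(1, 0, seen));
        printf("shortest path 1 ~> 0 via BFS:   %d edges\n", bfs(1, 0));
        return 0;
}

The DFS answer here is 5 edges, the BFS one is 2, which would be
consistent with the second splat below dropping setup_lock from the
chain.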
On Tue, 2009-05-12 at 09:59 +0200, Zdenek Kabelac wrote:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.30-rc5-00097-gd665355 #59
> -------------------------------------------------------
> pm-suspend/12129 is trying to acquire lock:
> (events){+.+.+.}, at: [<ffffffff80259496>] cleanup_workqueue_thread+0x26/0xd0
>
> but task is already holding lock:
> (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
> cpu_maps_update_begin+0x17/0x20
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #5 (cpu_add_remove_lock){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> [<ffffffff80246e57>] cpu_maps_update_begin+0x17/0x20
> [<ffffffff80259c33>] __create_workqueue_key+0xc3/0x250
> [<ffffffff80287b20>] stop_machine_create+0x40/0xb0
> [<ffffffff8027a784>] sys_delete_module+0x84/0x270
> [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #4 (setup_lock){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> [<ffffffff80287af7>] stop_machine_create+0x17/0xb0
> [<ffffffff80246f06>] disable_nonboot_cpus+0x26/0x130
> [<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
> [<ffffffff8027dff7>] enter_state+0x107/0x170
> [<ffffffff8027e0f9>] state_store+0x99/0x100
> [<ffffffff803abe17>] kobj_attr_store+0x17/0x20
> [<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
> [<ffffffff802e1c88>] vfs_write+0xb8/0x180
> [<ffffffff802e2771>] sys_write+0x51/0x90
> [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #3 (dpm_list_mtx){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> [<ffffffff804532ff>] device_pm_add+0x1f/0xe0
> [<ffffffff8044b9bf>] device_add+0x45f/0x570
> [<ffffffffa007c578>] wiphy_register+0x158/0x280 [cfg80211]
> [<ffffffffa017567c>] ieee80211_register_hw+0xbc/0x410 [mac80211]
> [<ffffffffa01f7c5c>] iwl3945_pci_probe+0xa1c/0x1080 [iwl3945]
> [<ffffffff803c4307>] local_pci_probe+0x17/0x20
> [<ffffffff803c5458>] pci_device_probe+0x88/0xb0
> [<ffffffff8044e1e9>] driver_probe_device+0x89/0x180
> [<ffffffff8044e37b>] __driver_attach+0x9b/0xa0
> [<ffffffff8044d67c>] bus_for_each_dev+0x6c/0xa0
> [<ffffffff8044e03e>] driver_attach+0x1e/0x20
> [<ffffffff8044d955>] bus_add_driver+0xd5/0x290
> [<ffffffff8044e668>] driver_register+0x78/0x140
> [<ffffffff803c56f6>] __pci_register_driver+0x66/0xe0
> [<ffffffffa00bd05c>] 0xffffffffa00bd05c
> [<ffffffff8020904f>] do_one_initcall+0x3f/0x1c0
> [<ffffffff8027d071>] sys_init_module+0xb1/0x200
> [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #2 (cfg80211_mutex){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff8054e78c>] __mutex_lock_common+0x4c/0x3b0
> [<ffffffff8054ebf6>] mutex_lock_nested+0x46/0x60
> [<ffffffffa007e66a>] reg_todo+0x19a/0x590 [cfg80211]
> [<ffffffff80258f18>] worker_thread+0x1e8/0x3a0
> [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> [<ffffffff8020d23a>] child_rip+0xa/0x20
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #1 (reg_work){+.+.+.}:
> [<ffffffff80271a64>] __lock_acquire+0xc64/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff80258f12>] worker_thread+0x1e2/0x3a0
> [<ffffffff8025dc3a>] kthread+0x5a/0xa0
> [<ffffffff8020d23a>] child_rip+0xa/0x20
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #0 (events){+.+.+.}:
> [<ffffffff80271b36>] __lock_acquire+0xd36/0x10a0
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff802594bd>] cleanup_workqueue_thread+0x4d/0xd0
> [<ffffffff8053fe79>] workqueue_cpu_callback+0xc9/0x10f
> [<ffffffff805539a8>] notifier_call_chain+0x68/0xa0
> [<ffffffff802630d6>] raw_notifier_call_chain+0x16/0x20
> [<ffffffff8053d9ec>] _cpu_down+0x1cc/0x2d0
> [<ffffffff80246f90>] disable_nonboot_cpus+0xb0/0x130
> [<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
> [<ffffffff8027dff7>] enter_state+0x107/0x170
> [<ffffffff8027e0f9>] state_store+0x99/0x100
> [<ffffffff803abe17>] kobj_attr_store+0x17/0x20
> [<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
> [<ffffffff802e1c88>] vfs_write+0xb8/0x180
> [<ffffffff802e2771>] sys_write+0x51/0x90
> [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> other info that might help us debug this:
>
> 4 locks held by pm-suspend/12129:
> #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8033f154>]
> sysfs_write_file+0x44/0x160
> #1: (pm_mutex){+.+.+.}, at: [<ffffffff8027df44>] enter_state+0x54/0x170
> #2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff80452747>] device_pm_lock+0x17/0x20
> #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff80246e57>]
> cpu_maps_update_begin+0x17/0x20
>
> stack backtrace:
> Pid: 12129, comm: pm-suspend Not tainted 2.6.30-rc5-00097-gd665355 #59
> Call Trace:
> [<ffffffff8026fa3d>] print_circular_bug_tail+0x9d/0xe0
> [<ffffffff80271b36>] __lock_acquire+0xd36/0x10a0
> [<ffffffff80270000>] ? mark_lock+0x3e0/0x400
> [<ffffffff80271f38>] lock_acquire+0x98/0x140
> [<ffffffff80259496>] ? cleanup_workqueue_thread+0x26/0xd0
> [<ffffffff802594bd>] cleanup_workqueue_thread+0x4d/0xd0
> [<ffffffff80259496>] ? cleanup_workqueue_thread+0x26/0xd0
> [<ffffffff802703ed>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff8053fe79>] workqueue_cpu_callback+0xc9/0x10f
> [<ffffffff8054acf1>] ? cpu_callback+0x12/0x280
> [<ffffffff805539a8>] notifier_call_chain+0x68/0xa0
> [<ffffffff802630d6>] raw_notifier_call_chain+0x16/0x20
> [<ffffffff8053d9ec>] _cpu_down+0x1cc/0x2d0
> [<ffffffff80246f90>] disable_nonboot_cpus+0xb0/0x130
> [<ffffffff8027dd8d>] suspend_devices_and_enter+0xbd/0x1b0
> [<ffffffff8027dff7>] enter_state+0x107/0x170
> [<ffffffff8027e0f9>] state_store+0x99/0x100
> [<ffffffff803abe17>] kobj_attr_store+0x17/0x20
> [<ffffffff8033f1e9>] sysfs_write_file+0xd9/0x160
> [<ffffffff802e1c88>] vfs_write+0xb8/0x180
> [<ffffffff8029088c>] ? audit_syscall_entry+0x21c/0x240
> [<ffffffff802e2771>] sys_write+0x51/0x90
> [<ffffffff8020c15b>] system_call_fastpath+0x16/0x1b
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.30-rc6-tip #9
-------------------------------------------------------
bash/6174 is trying to acquire lock:
(events){+.+.+.}, at: [<ffffffff81059076>] cleanup_workqueue_thread+0x28/0x10a
but task is already holding lock:
(cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81046c26>] disable_nonboot_cpus+0x38/0x128
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (cpu_add_remove_lock){+.+.+.}:
[<ffffffff8106f74d>] __lock_acquire+0xa80/0xc08
[<ffffffff8106f9dd>] lock_acquire+0x108/0x134
[<ffffffff813553fe>] __mutex_lock_common+0x5e/0x494
[<ffffffff81355883>] mutex_lock_nested+0x19/0x1b
[<ffffffff81046c26>] disable_nonboot_cpus+0x38/0x128
[<ffffffff8107b38e>] suspend_devices_and_enter+0xf5/0x1f4
[<ffffffff8107b623>] enter_state+0x168/0x1ce
[<ffffffff8107b745>] state_store+0xbc/0xdd
[<ffffffff811737fb>] kobj_attr_store+0x17/0x19
[<ffffffff81136620>] sysfs_write_file+0xe9/0x11e
[<ffffffff810e2422>] vfs_write+0xb0/0x10a
[<ffffffff810e254a>] sys_write+0x4c/0x75
[<ffffffff8100bdf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #3 (dpm_list_mtx){+.+.+.}:
[<ffffffff8106f74d>] __lock_acquire+0xa80/0xc08
[<ffffffff8106f9dd>] lock_acquire+0x108/0x134
[<ffffffff813553fe>] __mutex_lock_common+0x5e/0x494
[<ffffffff81355883>] mutex_lock_nested+0x19/0x1b
[<ffffffff812312fe>] device_pm_add+0x23/0xcd
[<ffffffff8122ad87>] device_add+0x38b/0x549
[<ffffffff8130de59>] wiphy_register+0x139/0x1ee
[<ffffffff81317dff>] ieee80211_register_hw+0xee/0x3bf
[<ffffffff8126163a>] iwl_setup_mac+0x8b/0xd1
[<ffffffff8126fbcd>] iwl_pci_probe+0x7f5/0x921
[<ffffffff81188d21>] local_pci_probe+0x17/0x1b
[<ffffffff81059262>] do_work_for_cpu+0x18/0x2a
[<ffffffff8105dc98>] kthread+0x5b/0x88
[<ffffffff8100cf7a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
-> #2 (cfg80211_mutex){+.+.+.}:
[<ffffffff8106f74d>] __lock_acquire+0xa80/0xc08
[<ffffffff8106f9dd>] lock_acquire+0x108/0x134
[<ffffffff813553fe>] __mutex_lock_common+0x5e/0x494
[<ffffffff81355883>] mutex_lock_nested+0x19/0x1b
[<ffffffff8130f52a>] reg_todo+0x53/0x490
[<ffffffff81058870>] worker_thread+0x250/0x3dc
[<ffffffff8105dc98>] kthread+0x5b/0x88
[<ffffffff8100cf7a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
-> #1 (reg_work){+.+.+.}:
[<ffffffff8106f74d>] __lock_acquire+0xa80/0xc08
[<ffffffff8106f9dd>] lock_acquire+0x108/0x134
[<ffffffff810587f7>] worker_thread+0x1d7/0x3dc
[<ffffffff8105dc98>] kthread+0x5b/0x88
[<ffffffff8100cf7a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
-> #0 (events){+.+.+.}:
[<ffffffff8106f641>] __lock_acquire+0x974/0xc08
[<ffffffff8106f9dd>] lock_acquire+0x108/0x134
[<ffffffff8105909d>] cleanup_workqueue_thread+0x4f/0x10a
[<ffffffff81343fe4>] workqueue_cpu_callback+0xc9/0x10d
[<ffffffff81359cfd>] notifier_call_chain+0x33/0x5b
[<ffffffff81062254>] raw_notifier_call_chain+0x14/0x16
[<ffffffff81342624>] _cpu_down+0x283/0x2a0
[<ffffffff81046c6b>] disable_nonboot_cpus+0x7d/0x128
[<ffffffff8107b38e>] suspend_devices_and_enter+0xf5/0x1f4
[<ffffffff8107b623>] enter_state+0x168/0x1ce
[<ffffffff8107b745>] state_store+0xbc/0xdd
[<ffffffff811737fb>] kobj_attr_store+0x17/0x19
[<ffffffff81136620>] sysfs_write_file+0xe9/0x11e
[<ffffffff810e2422>] vfs_write+0xb0/0x10a
[<ffffffff810e254a>] sys_write+0x4c/0x75
[<ffffffff8100bdf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
other info that might help us debug this:
4 locks held by bash/6174:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81136574>] sysfs_write_file+0x3d/0x11e
#1: (pm_mutex){+.+.+.}, at: [<ffffffff8107b680>] enter_state+0x1c5/0x1ce
#2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff8123084d>] device_pm_lock+0x17/0x19
#3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81046c26>] disable_nonboot_cpus+0x38/0x128
stack backtrace:
Pid: 6174, comm: bash Not tainted 2.6.30-rc6-tip #9
Call Trace:
[<ffffffff8106e92d>] print_circular_bug+0x1cc/0x201
[<ffffffff8106f641>] __lock_acquire+0x974/0xc08
[<ffffffff8106f9dd>] lock_acquire+0x108/0x134
[<ffffffff81059076>] ? cleanup_workqueue_thread+0x28/0x10a
[<ffffffff8105909d>] cleanup_workqueue_thread+0x4f/0x10a
[<ffffffff81059076>] ? cleanup_workqueue_thread+0x28/0x10a
[<ffffffff81343fe4>] workqueue_cpu_callback+0xc9/0x10d
[<ffffffff81359cfd>] notifier_call_chain+0x33/0x5b
[<ffffffff81062254>] raw_notifier_call_chain+0x14/0x16
[<ffffffff81342624>] _cpu_down+0x283/0x2a0
[<ffffffff81046c6b>] disable_nonboot_cpus+0x7d/0x128
[<ffffffff8107b38e>] suspend_devices_and_enter+0xf5/0x1f4
[<ffffffff8107b623>] enter_state+0x168/0x1ce
[<ffffffff8107b745>] state_store+0xbc/0xdd
[<ffffffff811737fb>] kobj_attr_store+0x17/0x19
[<ffffffff81136620>] sysfs_write_file+0xe9/0x11e
[<ffffffff810e2422>] vfs_write+0xb0/0x10a
[<ffffffff810e254a>] sys_write+0x4c/0x75
[<ffffffff8100bdf2>] system_call_fastpath+0x16/0x1b
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-24 14:30 ` Alan Stern
@ 2009-05-24 19:06 ` Rafael J. Wysocki
0 siblings, 0 replies; 43+ messages in thread
From: Rafael J. Wysocki @ 2009-05-24 19:06 UTC (permalink / raw)
To: Alan Stern
Cc: Johannes Berg, Oleg Nesterov, Ingo Molnar, Zdenek Kabelac,
Peter Zijlstra, Linux Kernel Mailing List, pm list
On Sunday 24 May 2009, Alan Stern wrote:
> On Sun, 24 May 2009, Rafael J. Wysocki wrote:
>
> > The patch is appended for reference (Alan, please have a look; I can't recall
> > why exactly we have called device_pm_lock() from the core suspend/hibernation
> > code instead of acquiring the lock locally in drivers/base/power/main.c) and
> > I'll attach it to the bug entry too.
>
> I can't remember the reason either. Probably there wasn't any. The
> patch looks fine, and it has the nice added benefit that now the only
> user of device_pm_lock() will be device_move().
>
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs
> >
> > We shouldn't hold dpm_list_mtx while executing
> > [disable|enable]_nonboot_cpus(), because theoretically this may lead
> > to a deadlock as shown by the following example (provided by Johannes
> > Berg):
> >
> > CPU 3   CPU 2                 CPU 1
> >                               suspend/hibernate
> >         something:
> >         rtnl_lock()           device_pm_lock()
> >                                -> mutex_lock(&dpm_list_mtx)
> >
> >         mutex_lock(&dpm_list_mtx)
> >
> > linkwatch_work
> >  -> rtnl_lock()
> >                               disable_nonboot_cpus()
> >                                -> flush CPU 3 workqueue
> >
> > Fortunately, device drivers are supposed to stop any activities that
> > might lead to the registration of new device objects and/or to the
> > removal of the existing ones way before disable_nonboot_cpus() is
>
> Strictly speaking, drivers are still allowed to unregister existing
> devices. They are forbidden only to register new ones. This shouldn't
> hurt anything, though.
You're right, I'll fix the changelog.
> > called, so it shouldn't be necessary to hold dpm_list_mtx over the
> > entire late part of device suspend and early part of device resume.
> >
> > Thus, during the late suspend and the early resume of devices acquire
> > dpm_list_mtx only when dpm_list is going to be traversed and release
> > it right after that.
>
> Acked-by: Alan Stern <stern@rowland.harvard.edu>
Thanks!
Best,
Rafael
* Re: INFO: possible circular locking dependency at cleanup_workqueue_thread
2009-05-24 12:48 ` Ming Lei
@ 2009-05-24 19:09 ` Rafael J. Wysocki
0 siblings, 0 replies; 43+ messages in thread
From: Rafael J. Wysocki @ 2009-05-24 19:09 UTC (permalink / raw)
To: Ming Lei
Cc: Johannes Berg, Alan Stern, Oleg Nesterov, Ingo Molnar,
Zdenek Kabelac, Peter Zijlstra, Linux Kernel Mailing List,
pm list
On Sunday 24 May 2009, Ming Lei wrote:
> On Sun, 24 May 2009 13:09:13 +0200
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > On Sunday 24 May 2009, Ming Lei wrote:
> > > On Sun, 24 May 2009 01:20:29 +0200
> > > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > > [lm@linux-lm linux-2.6]$ patch -p1
> > > < ../patch_rx/INFO_possible_circular_locking_dependency_at_cleanup_workqueue_thread.patch
> > > patching file kernel/power/disk.c
> > > Hunk #1 succeeded at 215 with fuzz 2.
> > > Hunk #3 succeeded at 278 with fuzz 1.
> > > Hunk #4 FAILED at 343.
> > > Hunk #5 succeeded at 396 with fuzz 2 (offset -4 lines).
> > > Hunk #6 FAILED at 454.
> > > Hunk #7 succeeded at 485 with fuzz 2.
> > > 2 out of 7 hunks FAILED -- saving rejects to file kernel/power/disk.c.rej
> > > patching file kernel/power/main.c
> > > Hunk #1 succeeded at 289 with fuzz 1 (offset 18 lines).
> > > patching file drivers/base/power/main.c
> > > Hunk #3 succeeded at 616 with fuzz 2.
> > > Hunk #4 succeeded at 625 with fuzz 2.
> > > patching file kernel/kexec.c
> > > Hunk #1 succeeded at 1451 with fuzz 2.
> > > Hunk #2 succeeded at 1488 with fuzz 2.
> >
> > The patch applies to the mainline, since it'll be a 2.6.30 candidate
> > if it's confirmed to fix the problem.
>
> After applying the patch against 2.6.30-rc7, my Dell D630 box can resume
> from suspend (s2ram) or hibernation successfully, and lockdep no longer
> complains with the kind of message reported in
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13245.
OK, great. I'm going to push the patch to Linus.
> Another thing: during resume from hibernation, I see the following
> messages:
>
> [ 1.849584] device: 'network_throughput': device_add
> [ 1.849630] PM: Adding info for No Bus:network_throughput
> [ 1.849788] PM: Resume from disk failed.
>
> I do not know if there is really something wrong in PM,
No, this only means that there was no hibernation image present on the resume
partition while the kernel was starting.
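
For reference, the boot-time logic behind that message is roughly the
following (a simplified sketch, not the exact code in kernel/power/disk.c;
swsusp_check() is the real image-signature probe, while
load_image_and_restore() is only a stand-in for the read-and-restore steps):

static int software_resume_sketch(void)
{
        int error;

        /* Probe the resume partition for a hibernation image signature. */
        error = swsusp_check();
        if (!error)
                error = load_image_and_restore();       /* stand-in */

        if (error) {
                /* No image found (or restore failed): report it and let
                 * the normal boot continue. */
                printk(KERN_ERR "PM: Resume from disk failed.\n");
        }
        return error;
}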
Thanks,
Rafael
end of thread (newest: ~2009-05-24 19:09 UTC)
Thread overview: 43+ messages
2009-05-12 7:59 INFO: possible circular locking dependency at cleanup_workqueue_thread Zdenek Kabelac
2009-05-17 7:18 ` Ingo Molnar
2009-05-17 10:42 ` Ming Lei
2009-05-17 11:18 ` Johannes Berg
2009-05-17 13:10 ` Ingo Molnar
2009-05-18 19:47 ` Oleg Nesterov
2009-05-18 20:00 ` Peter Zijlstra
2009-05-18 20:16 ` Oleg Nesterov
2009-05-18 20:40 ` Peter Zijlstra
2009-05-18 22:14 ` Oleg Nesterov
2009-05-19 9:13 ` Peter Zijlstra
2009-05-19 10:49 ` Peter Zijlstra
2009-05-19 14:53 ` Oleg Nesterov
2009-05-19 8:51 ` Johannes Berg
2009-05-19 12:00 ` Oleg Nesterov
2009-05-19 15:33 ` Johannes Berg
2009-05-19 16:09 ` Oleg Nesterov
2009-05-19 16:27 ` Johannes Berg
2009-05-19 18:51 ` Oleg Nesterov
2009-05-22 10:46 ` Johannes Berg
2009-05-22 22:23 ` Rafael J. Wysocki
2009-05-23 8:21 ` Johannes Berg
2009-05-23 23:20 ` Rafael J. Wysocki
2009-05-24 3:29 ` Ming Lei
2009-05-24 11:09 ` Rafael J. Wysocki
2009-05-24 12:48 ` Ming Lei
2009-05-24 19:09 ` Rafael J. Wysocki
2009-05-24 14:30 ` Alan Stern
2009-05-24 19:06 ` Rafael J. Wysocki
2009-05-20 3:36 ` Ming Lei
2009-05-20 6:47 ` Johannes Berg
2009-05-20 7:09 ` Ming Lei
2009-05-20 7:12 ` Johannes Berg
2009-05-20 8:21 ` Ming Lei
2009-05-20 8:45 ` Johannes Berg
2009-05-22 8:03 ` Ming Lei
2009-05-22 8:11 ` Johannes Berg
2009-05-20 12:18 ` Peter Zijlstra
2009-05-20 13:18 ` Oleg Nesterov
2009-05-20 13:44 ` Peter Zijlstra
2009-05-20 13:55 ` Oleg Nesterov
2009-05-20 14:12 ` Peter Zijlstra
2009-05-24 18:58 ` Peter Zijlstra