* pm-hibernate : possible circular locking dependency detected
@ 2009-04-05 9:53 Ming Lei
2009-04-05 10:12 ` Rafael J. Wysocki
0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2009-04-05 9:53 UTC (permalink / raw)
To: Linux-pm mailing list, Linux Kernel List, Rafael J. Wysocki,
Ingo Molnar
Kernel version: 2.6.29-08003-g8b58977 (one simple usb-serial patch on top
of commit 6bb597507f9839b13498781e481f5458aea33620).
Thanks.
[ 2276.031486] PM: Saving platform NVS memory
[ 2276.032410] Disabling non-boot CPUs ...
[ 2276.032879]
[ 2276.032880] =======================================================
[ 2276.032882] [ INFO: possible circular locking dependency detected ]
[ 2276.032885] 2.6.29-08003-g8b58977 #84
[ 2276.032886] -------------------------------------------------------
[ 2276.032888] pm-hibernate/5061 is trying to acquire lock:
[ 2276.032891] (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at:
[<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79
[ 2276.032900]
[ 2276.032900] but task is already holding lock:
[ 2276.032902] (&cpu_hotplug.lock){+.+.+.}, at: [<ffffffff80240d38>]
cpu_hotplug_begin+0x22/0x50
[ 2276.032908]
[ 2276.032909] which lock already depends on the new lock.
[ 2276.032910]
[ 2276.032911]
[ 2276.032912] the existing dependency chain (in reverse order) is:
[ 2276.032913]
[ 2276.032914] -> #6 (&cpu_hotplug.lock){+.+.+.}:
[ 2276.032918] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.032922] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.032926] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.032930] [<ffffffff80240d38>] cpu_hotplug_begin+0x22/0x50
[ 2276.032933] [<ffffffff804728b0>] _cpu_up+0x58/0x12f
[ 2276.032937] [<ffffffff804729ea>] cpu_up+0x63/0x78
[ 2276.032940] [<ffffffff806585cb>] kernel_init+0xbd/0x189
[ 2276.032944] [<ffffffff8020c1fa>] child_rip+0xa/0x20
[ 2276.032948] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.032960]
[ 2276.032961] -> #5 (cpu_add_remove_lock){+.+.+.}:
[ 2276.032964] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.032968] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.032971] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.032974] [<ffffffff80240d02>] cpu_maps_update_begin+0x12/0x14
[ 2276.032978] [<ffffffff80250526>] __create_workqueue_key+0xdc/0x1a2
[ 2276.032982] [<ffffffff80276eaf>] stop_machine_create+0x3a/0x9b
[ 2276.032987] [<ffffffff8026d959>] load_module+0x58/0x18c8
[ 2276.032990] [<ffffffff8026f320>] sys_init_module+0x57/0x1cd
[ 2276.032993] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.032998] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033001]
[ 2276.033001] -> #4 (setup_lock){+.+.+.}:
[ 2276.033005] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033008] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033012] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.033015] [<ffffffff80276e87>] stop_machine_create+0x12/0x9b
[ 2276.033018] [<ffffffff80240d7c>] disable_nonboot_cpus+0x16/0x120
[ 2276.033022] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac
[ 2276.033025] [<ffffffff80271b52>] hibernate+0xab/0x1cf
[ 2276.033028] [<ffffffff802705f0>] state_store+0x59/0xd7
[ 2276.033032] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19
[ 2276.033036] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119
[ 2276.033040] [<ffffffff802c0d14>] vfs_write+0xae/0x137
[ 2276.033044] [<ffffffff802c0e61>] sys_write+0x47/0x6e
[ 2276.033047] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.033050] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033053]
[ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}:
[ 2276.033057] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033061] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033065] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
[ 2276.033068] [<ffffffff803c4339>] device_pm_add+0x46/0xed
[ 2276.033073] [<ffffffff803bdeee>] device_add+0x488/0x61a
[ 2276.033077] [<ffffffff803be099>] device_register+0x19/0x1d
[ 2276.033080] [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6
[ 2276.033085] [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82
[ 2276.033089] [<ffffffff8035bf77>] local_pci_probe+0x12/0x16
[ 2276.033093] [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b
[ 2276.033097] [<ffffffff80250038>] worker_thread+0x1b2/0x2c9
[ 2276.033100] [<ffffffff80253d40>] kthread+0x49/0x76
[ 2276.033104] [<ffffffff8020c1fa>] child_rip+0xa/0x20
[ 2276.033107] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033111]
[ 2276.033111] -> #2 (&wfc.work){+.+.+.}:
[ 2276.033114] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033118] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033122] [<ffffffff80250032>] worker_thread+0x1ac/0x2c9
[ 2276.033125] [<ffffffff80253d40>] kthread+0x49/0x76
[ 2276.033128] [<ffffffff8020c1fa>] child_rip+0xa/0x20
[ 2276.033131] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033134]
[ 2276.033134] -> #1 (work_on_cpu){+.+.+.}:
[ 2276.033138] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
[ 2276.033141] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033145] [<ffffffff80250885>] flush_work+0x60/0xeb
[ 2276.033148] [<ffffffff80250c49>] work_on_cpu+0x73/0x84
[ 2276.033151] [<ffffffffa02b4399>] get_cur_val+0xb6/0xe3 [acpi_cpufreq]
[ 2276.033157] [<ffffffffa02b443a>]
get_cur_freq_on_cpu+0x74/0xc5 [acpi_cpufreq]
[ 2276.033161] [<ffffffffa02b4b69>]
acpi_cpufreq_cpu_init+0x381/0x4c4 [acpi_cpufreq]
[ 2276.033166] [<ffffffff803db697>] cpufreq_add_dev+0x1bb/0x676
[ 2276.033169] [<ffffffff803bea90>] sysdev_driver_register+0xc6/0x120
[ 2276.033173] [<ffffffff803da5c1>] cpufreq_register_driver+0xbc/0x1ae
[ 2276.033177] [<ffffffffa00c60c5>] 0xffffffffa00c60c5
[ 2276.033180] [<ffffffff80209076>] do_one_initcall+0x70/0x185
[ 2276.033183] [<ffffffff8026f374>] sys_init_module+0xab/0x1cd
[ 2276.033187] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.033190] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033197]
[ 2276.033197] -> #0 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
[ 2276.033201] [<ffffffff80265296>] __lock_acquire+0x111f/0x178c
[ 2276.033204] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033208] [<ffffffff80476d6c>] down_write+0x43/0x77
[ 2276.033211] [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79
[ 2276.033215] [<ffffffff80474678>] cpufreq_cpu_callback+0x52/0x7a
[ 2276.033218] [<ffffffff80257ebb>] notifier_call_chain+0x33/0x5b
[ 2276.033222] [<ffffffff80257f42>] __raw_notifier_call_chain+0x9/0xb
[ 2276.033226] [<ffffffff80465097>] _cpu_down+0x86/0x298
[ 2276.033229] [<ffffffff80240dde>] disable_nonboot_cpus+0x78/0x120
[ 2276.033233] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac
[ 2276.033236] [<ffffffff80271b52>] hibernate+0xab/0x1cf
[ 2276.033239] [<ffffffff802705f0>] state_store+0x59/0xd7
[ 2276.033242] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19
[ 2276.033246] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119
[ 2276.033249] [<ffffffff802c0d14>] vfs_write+0xae/0x137
[ 2276.033252] [<ffffffff802c0e61>] sys_write+0x47/0x6e
[ 2276.033255] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.033259] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2276.033262]
[ 2276.033262] other info that might help us debug this:
[ 2276.033263]
[ 2276.033265] 5 locks held by pm-hibernate/5061:
[ 2276.033267] #0: (&buffer->mutex){+.+.+.}, at:
[<ffffffff8030f724>] sysfs_write_file+0x38/0x119
[ 2276.033273] #1: (pm_mutex){+.+.+.}, at: [<ffffffff80271abe>]
hibernate+0x17/0x1cf
[ 2276.033278] #2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff803c35c5>]
device_pm_lock+0x12/0x14
[ 2276.033284] #3: (cpu_add_remove_lock){+.+.+.}, at:
[<ffffffff80240d99>] disable_nonboot_cpus+0x33/0x120
[ 2276.033289] #4: (&cpu_hotplug.lock){+.+.+.}, at:
[<ffffffff80240d38>] cpu_hotplug_begin+0x22/0x50
[ 2276.033294]
[ 2276.033295] stack backtrace:
[ 2276.033297] Pid: 5061, comm: pm-hibernate Not tainted
2.6.29-08003-g8b58977 #84
[ 2276.033300] Call Trace:
[ 2276.033304] [<ffffffff80263cbf>] print_circular_bug_tail+0xc5/0xd0
[ 2276.033307] [<ffffffff80265296>] __lock_acquire+0x111f/0x178c
[ 2276.033312] [<ffffffff802382b0>] ? sub_preempt_count+0xdc/0xf1
[ 2276.033315] [<ffffffff80263470>] ? trace_hardirqs_on+0xd/0xf
[ 2276.033319] [<ffffffff80265996>] lock_acquire+0x93/0xbf
[ 2276.033322] [<ffffffff803db46e>] ? lock_policy_rwsem_write+0x48/0x79
[ 2276.033326] [<ffffffff80476d6c>] down_write+0x43/0x77
[ 2276.033329] [<ffffffff803db46e>] ? lock_policy_rwsem_write+0x48/0x79
[ 2276.033333] [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79
[ 2276.033337] [<ffffffff80474678>] cpufreq_cpu_callback+0x52/0x7a
[ 2276.033340] [<ffffffff80257ebb>] notifier_call_chain+0x33/0x5b
[ 2276.033344] [<ffffffff80257f42>] __raw_notifier_call_chain+0x9/0xb
[ 2276.033347] [<ffffffff80465097>] _cpu_down+0x86/0x298
[ 2276.033350] [<ffffffff80240dde>] disable_nonboot_cpus+0x78/0x120
[ 2276.033350] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac
[ 2276.033350] [<ffffffff80271b52>] hibernate+0xab/0x1cf
[ 2276.033350] [<ffffffff802705f0>] state_store+0x59/0xd7
[ 2276.033350] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19
[ 2276.033350] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119
[ 2276.033350] [<ffffffff802c0d14>] vfs_write+0xae/0x137
[ 2276.033350] [<ffffffff802c0e61>] sys_write+0x47/0x6e
[ 2276.033350] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b
[ 2276.034478] Broke affinity for irq 21
[ 2276.034478] Broke affinity for irq 22
[ 2276.034478] Broke affinity for irq 29
[ 2276.035812] CPU 1 is now offline
[ 2276.035814] lockdep: fixing up alternatives.
[ 2276.035816] SMP alternatives: switching to UP code
[ 2276.158329] CPU0 attaching NULL sched-domain.
[ 2276.158334] CPU1 attaching NULL sched-domain.
[ 2276.158361] CPU0 attaching NULL sched-domain.
[ 2276.158841] CPU1 is down
[ 2276.158867] Checking wake-up interrupts
[ 2276.158883] Suspending System Devices
[ 2276.158885] Suspending type 'oprofile':
[ 2276.158886] oprofile0
[ 2276.158888] Suspending type 'edac':
[ 2276.158889] Suspending type 'irqrouter':
[ 2276.158891] irqrouter0
[ 2276.158892] Suspending type 'clocksource':
[ 2276.158894] clocksource0
[ 2276.158895] Suspending type 'timekeeping':
[ 2276.158897] timekeeping0
[ 2276.158929] Extended CMOS year: 2000
[ 2276.158937] Suspending type 'ioapic':
[ 2276.158938] ioapic0
[ 2276.159007] Suspending type 'lapic':
[ 2276.159007] lapic0
[ 2276.159007] Suspending type 'machinecheck':
[ 2276.159007] machinecheck0
[ 2276.159007] Suspending type 'i8237':
[ 2276.159007] i82370
[ 2276.159007] Suspending type 'i8259':
[ 2276.159007] i82590
[ 2276.159007] Suspending type 'node':
[ 2276.159007] node0
[ 2276.159007] Suspending type 'cpu':
[ 2276.159007] cpu0
[ 2276.159007] cpu1
[ 2276.159007] PM: Creating hibernation image:
[ 2276.159007] PM: Need to copy 112986 pages
[ 2276.159007] PM: Restoring platform NVS memory
[ 2276.159007] Resuming System Devices
[ 2276.159007] Resuming type 'cpu':
[ 2276.159007] cpu0
[ 2276.159007] cpu1
[ 2276.159007] Resuming type 'node':
[ 2276.159007] node0
[ 2276.159007] Resuming type 'i8259':
[ 2276.159007] i82590
[ 2276.159007] Resuming type 'i8237':
[ 2276.159007] i82370
[ 2276.159007] Resuming type 'machinecheck':
[ 2276.159007] machinecheck0
[ 2276.159007] CPU0: Thermal LVT vector (0xfa) already installed
[ 2276.159007] Resuming type 'lapic':
[ 2276.159007] lapic0
[ 2276.159007] Resuming type 'ioapic':
[ 2276.159007] ioapic0
[ 2276.159007] Resuming type 'timekeeping':
[ 2276.159007] timekeeping0
[ 2276.159007] Extended CMOS year: 2000
[ 2276.159007] Resuming type 'clocksource':
[ 2276.159007] clocksource0
[ 2276.159007] Resuming type 'irqrouter':
[ 2276.159007] irqrouter0
[ 2276.159007] Resuming type 'edac':
[ 2276.159007] Resuming type 'oprofile':
[ 2276.159007] oprofile0
--
Lei Ming
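[Editorial illustration: the report above is lockdep's standard "circular
dependency" warning. Its core is an inverted acquisition order: cpufreq
init takes the per-CPU policy rwsem and then waits on work_on_cpu(), while
the CPU-offline path takes the hotplug lock and then the policy rwsem. A
minimal stand-alone sketch of the kind of order tracking lockdep performs
(plain Python, not kernel code; the intermediate chain #1..#5 is compressed
into a single edge for brevity):]

```python
from collections import defaultdict

class LockOrderTracker:
    """Toy version of what lockdep does: record an edge A -> B whenever
    lock B is acquired while lock A is held, then look for a cycle,
    which signals a potential deadlock."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks acquired while it is held
        self.held = []                 # currently held locks, in order

    def acquire(self, name):
        for h in self.held:
            self.edges[h].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)

    def find_cycle(self):
        visiting, done = set(), set()

        def dfs(node, path):
            visiting.add(node)
            for nxt in self.edges[node]:
                if nxt in visiting:              # back-edge: cycle found
                    return path + [node, nxt]
                if nxt not in done:
                    cycle = dfs(nxt, path + [node])
                    if cycle:
                        return cycle
            visiting.discard(node)
            done.add(node)
            return None

        for node in list(self.edges):
            if node not in done:
                cycle = dfs(node, [])
                if cycle:
                    return cycle
        return None

t = LockOrderTracker()

# cpufreq init: policy rwsem held while flush_work() waits on work_on_cpu.
t.acquire("cpu_policy_rwsem"); t.acquire("work_on_cpu")
t.release("work_on_cpu"); t.release("cpu_policy_rwsem")

# Compressed stand-in for chain #1..#5 in the report: work_on_cpu
# ultimately depends on cpu_hotplug.lock via the workqueue/stop_machine path.
t.acquire("work_on_cpu"); t.acquire("cpu_hotplug.lock")
t.release("cpu_hotplug.lock"); t.release("work_on_cpu")

# cpu_down during hibernation: hotplug lock held, then the policy rwsem.
t.acquire("cpu_hotplug.lock"); t.acquire("cpu_policy_rwsem")
t.release("cpu_policy_rwsem"); t.release("cpu_hotplug.lock")

print(t.find_cycle())
# -> ['cpu_policy_rwsem', 'work_on_cpu', 'cpu_hotplug.lock', 'cpu_policy_rwsem']
```

No single task ever holds a conflicting pair at once here, which is why the
system still runs; lockdep warns because the recorded orderings could
interleave into a deadlock.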
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-05  9:53 pm-hibernate : possible circular locking dependency detected Ming Lei
@ 2009-04-05 10:12 ` Rafael J. Wysocki
  2009-04-05 13:44   ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2009-04-05 10:12 UTC (permalink / raw)
To: Ming Lei, Andrew Morton
Cc: Linux-pm mailing list, Linux Kernel List, Ingo Molnar,
	Gautham R Shenoy

On Sunday 05 April 2009, Ming Lei wrote:
> kernel version : one simple usb-serial patch against commit
> 6bb597507f9839b13498781e481f5458aea33620.
>
> Thanks.

Hmm, CPU hotplug again, it seems.

I'm not sure who's the maintainer at the moment.  Andrew, is that
Gautham?

Rafael

> [ 2276.031486] PM: Saving platform NVS memory
> [ 2276.032410] Disabling non-boot CPUs ...
> [snip: full lockdep report, quoted in full in the parent message]
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-05 10:12 ` Rafael J. Wysocki
@ 2009-04-05 13:44   ` Ingo Molnar
  2009-04-06  0:55     ` Gautham R Shenoy
  2009-04-07  4:26     ` Rusty Russell
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-04-05 13:44 UTC (permalink / raw)
To: Rafael J. Wysocki, Peter Zijlstra, Rusty Russell
Cc: Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List,
	Gautham R Shenoy

* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Sunday 05 April 2009, Ming Lei wrote:
> > kernel version : one simple usb-serial patch against commit
> > 6bb597507f9839b13498781e481f5458aea33620.
> >
> > Thanks.
>
> Hmm, CPU hotplug again, it seems.
>
> I'm not sure who's the maintainer at the moment.  Andrew, is that
> Gautham?

CPU hotplug tends to land on the scheduler people's desk normally. But
i'm not sure that's the real thing here - key appears to be this
work_on_cpu() worklet by the cpufreq code:

> > [ 2276.033134] -> #1 (work_on_cpu){+.+.+.}:
> > [ 2276.033138]   [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > [ 2276.033141]   [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > [ 2276.033145]   [<ffffffff80250885>] flush_work+0x60/0xeb
> > [ 2276.033148]   [<ffffffff80250c49>] work_on_cpu+0x73/0x84
> > [ 2276.033151]   [<ffffffffa02b4399>] get_cur_val+0xb6/0xe3 [acpi_cpufreq]
> > [ 2276.033157]   [<ffffffffa02b443a>] get_cur_freq_on_cpu+0x74/0xc5 [acpi_cpufreq]
> > [ 2276.033161]   [<ffffffffa02b4b69>] acpi_cpufreq_cpu_init+0x381/0x4c4 [acpi_cpufreq]
> > [ 2276.033166]   [<ffffffff803db697>] cpufreq_add_dev+0x1bb/0x676

Connecting things the wrong way around. Have not looked deeply though -
Peter, Rusty, what's your take?

(below is the message again)

	Ingo

> > [snip: full lockdep report, quoted in full in the first message]
> > [ 2276.158841] CPU1 is down > > [ 2276.158867] Checking wake-up interrupts > > [ 2276.158883] Suspending System Devices > > [ 2276.158885] Suspending type 'oprofile': > > [ 2276.158886] oprofile0 > > [ 2276.158888] Suspending type 'edac': > > [ 2276.158889] Suspending type 'irqrouter': > > [ 2276.158891] irqrouter0 > > [ 2276.158892] Suspending type 'clocksource': > > [ 2276.158894] clocksource0 > > [ 2276.158895] Suspending type 'timekeeping': > > [ 2276.158897] timekeeping0 > > [ 2276.158929] Extended CMOS year: 2000 > > [ 2276.158937] Suspending type 'ioapic': > > [ 2276.158938] ioapic0 > > [ 2276.159007] Suspending type 'lapic': > > [ 2276.159007] lapic0 > > [ 2276.159007] Suspending type 'machinecheck': > > [ 2276.159007] machinecheck0 > > [ 2276.159007] Suspending type 'i8237': > > [ 2276.159007] i82370 > > [ 2276.159007] Suspending type 'i8259': > > [ 2276.159007] i82590 > > [ 2276.159007] Suspending type 'node': > > [ 2276.159007] node0 > > [ 2276.159007] Suspending type 'cpu': > > [ 2276.159007] cpu0 > > [ 2276.159007] cpu1 > > [ 2276.159007] PM: Creating hibernation image: > > [ 2276.159007] PM: Need to copy 112986 pages > > [ 2276.159007] PM: Restoring platform NVS memory > > [ 2276.159007] Resuming System Devices > > [ 2276.159007] Resuming type 'cpu': > > [ 2276.159007] cpu0 > > [ 2276.159007] cpu1 > > [ 2276.159007] Resuming type 'node': > > [ 2276.159007] node0 > > [ 2276.159007] Resuming type 'i8259': > > [ 2276.159007] i82590 > > [ 2276.159007] Resuming type 'i8237': > > [ 2276.159007] i82370 > > [ 2276.159007] Resuming type 'machinecheck': > > [ 2276.159007] machinecheck0 > > [ 2276.159007] CPU0: Thermal LVT vector (0xfa) already installed > > [ 2276.159007] Resuming type 'lapic': > > [ 2276.159007] lapic0 > > [ 2276.159007] Resuming type 'ioapic': > > [ 2276.159007] ioapic0 > > [ 2276.159007] Resuming type 'timekeeping': > > [ 2276.159007] timekeeping0 > > [ 2276.159007] Extended CMOS year: 2000 > > [ 2276.159007] Resuming type 'clocksource': 
> > [ 2276.159007] clocksource0 > > [ 2276.159007] Resuming type 'irqrouter': > > [ 2276.159007] irqrouter0 > > [ 2276.159007] Resuming type 'edac': > > [ 2276.159007] Resuming type 'oprofile': > > [ 2276.159007] oprofile0
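The splat above is lockdep's dependency-cycle check at work: every observed "lock B acquired while lock A is held" adds an edge A -> B to a global graph, and the warning fires when a new acquisition would close a cycle. The toy model below (Python, purely illustrative; the real implementation lives in kernel/lockdep.c and is far more involved) replays the chain #1..#6 as read from the trace, plus the final acquisition by pm-hibernate that closes the circle:

```python
# Toy model of lockdep's dependency graph. An edge (a, b) records
# "lock b was acquired while lock a was held".

def has_path(edges, src, dst, seen=None):
    """True if dst is reachable from src along recorded edges (DFS)."""
    if src == dst:
        return True
    seen = set() if seen is None else seen
    seen.add(src)
    return any(has_path(edges, b, dst, seen)
               for a, b in edges if a == src and b not in seen)

def record_acquisition(edges, held, wanted):
    """Add the edge held -> wanted; return False if it would close a cycle."""
    if has_path(edges, wanted, held):
        return False            # wanted -> ... -> held -> wanted: circular
    edges.append((held, wanted))
    return True

# The existing chain from the splat, innermost (#1) to outermost (#6):
edges = []
chain = ["cpu_policy_rwsem", "work_on_cpu", "wfc.work", "dpm_list_mtx",
         "setup_lock", "cpu_add_remove_lock", "cpu_hotplug.lock"]
for held, wanted in zip(chain, chain[1:]):
    assert record_acquisition(edges, held, wanted)

# pm-hibernate then tries cpu_policy_rwsem while holding cpu_hotplug.lock
# (the "#0" step in the report): this edge closes the circle.
assert not record_acquisition(edges, "cpu_hotplug.lock", "cpu_policy_rwsem")
print("circular dependency detected, as in the report above")
```

The edge direction (held lock -> newly acquired lock) is an interpretation of the trace; lockdep additionally tracks read/write semantics and interrupt state, which this sketch ignores.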
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-05 13:44 ` Ingo Molnar
@ 2009-04-06  0:55 ` Gautham R Shenoy
  2009-04-06 13:29 ` Rafael J. Wysocki
  2009-04-07  4:26 ` Rusty Russell
  1 sibling, 1 reply; 20+ messages in thread
From: Gautham R Shenoy @ 2009-04-06 0:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rafael J. Wysocki, Peter Zijlstra, Rusty Russell, Ming Lei,
    Andrew Morton, Linux-pm mailing list, Linux Kernel List,
    Venkatesh Pallipadi

On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> > On Sunday 05 April 2009, Ming Lei wrote:
> > > kernel version : one simple usb-serial patch against commit
> > > 6bb597507f9839b13498781e481f5458aea33620.
> > >
> > > Thanks.
> >
> > Hmm, CPU hotplug again, it seems.
> >
> > I'm not sure who's the maintainer at the moment. Andrew, is that
> > Gautham?
>
> CPU hotplug tends to land on the scheduler people's desk normally.
>
> But i'm not sure that's the real thing here - key appears to be this
> work_on_cpu() worklet by the cpufreq code:

Actually, there are two dependency chains here which can lead to a
deadlock. The one we're seeing here is the longer of the two.

If the relevant locks are numbered as follows:
[1]: cpu_policy_rwsem
[2]: work_on_cpu
[3]: cpu_hotplug.lock
[4]: dpm_list_mtx

The individual callpaths are:

1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
                                                        |
   work_on_cpu()[2] <-- get_measured_perf() <-----------|

2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
                                                                 |
   [4] device_pm_add() <-- .. <-- local_pci_probe() <------------|

3) hibernate() --> hibernation_snapshot() --> create_image()
                                                  |
   disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
       |
       |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]

The two chains which can deadlock are

a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
and
b) [3] --> [2] --> [4] --> [3]

Ingo,
The do_dbs_timer() function of the ondemand governor is run from a
per-cpu workqueue. Hence it is already running on the cpu whose perf
counters we're interested in.

Does it make sense to introduce a get_this_measured_perf() API
for users who are already running on the relevant CPU?
And have get_measured_perf(cpu) for other users (currently there are
none)?

Thus, do_dbs_timer() can avoid calling work_on_cpu(), thereby
preventing deadlock a) from occurring.

Rafael,
Sorry, I am not well versed with the hibernation code. But does the
following make sense:

create_image()
{
        device_pm_lock();
        device_power_down(PMSG_FREEZE);
        platform_pre_snapshot(platform_mode);

        device_pm_unlock();
        disable_nonboot_cpus()
        device_pm_lock();
        .
        .
        .
        .
}

> > > [ 2276.033134] -> #1 (work_on_cpu){+.+.+.}:
> > > [ 2276.033138]  [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > > [ 2276.033141]  [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > > [ 2276.033145]  [<ffffffff80250885>] flush_work+0x60/0xeb
> > > [ 2276.033148]  [<ffffffff80250c49>] work_on_cpu+0x73/0x84
> > > [ 2276.033151]  [<ffffffffa02b4399>] get_cur_val+0xb6/0xe3 [acpi_cpufreq]
> > > [ 2276.033157]  [<ffffffffa02b443a>] get_cur_freq_on_cpu+0x74/0xc5 [acpi_cpufreq]
> > > [ 2276.033161]  [<ffffffffa02b4b69>] acpi_cpufreq_cpu_init+0x381/0x4c4 [acpi_cpufreq]
> > > [ 2276.033166]  [<ffffffff803db697>] cpufreq_add_dev+0x1bb/0x676
>
> Connecting things the wrong way around. Have not looked deeply
> though - Peter, Rusty, what's your take? (below is the message
> again)
>
> 	Ingo
>
> > > [ 2276.031486] PM: Saving platform NVS memory
> > > [ 2276.032410] Disabling non-boot CPUs ...
> > > [ 2276.032879] > > > [ 2276.032880] ======================================================= > > > [ 2276.032882] [ INFO: possible circular locking dependency detected ] > > > [ 2276.032885] 2.6.29-08003-g8b58977 #84 > > > [ 2276.032886] ------------------------------------------------------- > > > [ 2276.032888] pm-hibernate/5061 is trying to acquire lock: > > > [ 2276.032891] (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: > > > [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79 > > > [ 2276.032900] > > > [ 2276.032900] but task is already holding lock: > > > [ 2276.032902] (&cpu_hotplug.lock){+.+.+.}, at: [<ffffffff80240d38>] > > > cpu_hotplug_begin+0x22/0x50 > > > [ 2276.032908] > > > [ 2276.032909] which lock already depends on the new lock. > > > [ 2276.032910] > > > [ 2276.032911] > > > [ 2276.032912] the existing dependency chain (in reverse order) is: > > > [ 2276.032913] > > > [ 2276.032914] -> #6 (&cpu_hotplug.lock){+.+.+.}: > > > [ 2276.032918] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c > > > [ 2276.032922] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.032926] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362 > > > [ 2276.032930] [<ffffffff80240d38>] cpu_hotplug_begin+0x22/0x50 > > > [ 2276.032933] [<ffffffff804728b0>] _cpu_up+0x58/0x12f > > > [ 2276.032937] [<ffffffff804729ea>] cpu_up+0x63/0x78 > > > [ 2276.032940] [<ffffffff806585cb>] kernel_init+0xbd/0x189 > > > [ 2276.032944] [<ffffffff8020c1fa>] child_rip+0xa/0x20 > > > [ 2276.032948] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.032960] > > > [ 2276.032961] -> #5 (cpu_add_remove_lock){+.+.+.}: > > > [ 2276.032964] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c > > > [ 2276.032968] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.032971] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362 > > > [ 2276.032974] [<ffffffff80240d02>] cpu_maps_update_begin+0x12/0x14 > > > [ 2276.032978] [<ffffffff80250526>] __create_workqueue_key+0xdc/0x1a2 > > > [ 2276.032982] 
[<ffffffff80276eaf>] stop_machine_create+0x3a/0x9b > > > [ 2276.032987] [<ffffffff8026d959>] load_module+0x58/0x18c8 > > > [ 2276.032990] [<ffffffff8026f320>] sys_init_module+0x57/0x1cd > > > [ 2276.032993] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b > > > [ 2276.032998] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.033001] > > > [ 2276.033001] -> #4 (setup_lock){+.+.+.}: > > > [ 2276.033005] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c > > > [ 2276.033008] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.033012] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362 > > > [ 2276.033015] [<ffffffff80276e87>] stop_machine_create+0x12/0x9b > > > [ 2276.033018] [<ffffffff80240d7c>] disable_nonboot_cpus+0x16/0x120 > > > [ 2276.033022] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac > > > [ 2276.033025] [<ffffffff80271b52>] hibernate+0xab/0x1cf > > > [ 2276.033028] [<ffffffff802705f0>] state_store+0x59/0xd7 > > > [ 2276.033032] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19 > > > [ 2276.033036] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119 > > > [ 2276.033040] [<ffffffff802c0d14>] vfs_write+0xae/0x137 > > > [ 2276.033044] [<ffffffff802c0e61>] sys_write+0x47/0x6e > > > [ 2276.033047] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b > > > [ 2276.033050] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.033053] > > > [ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}: > > > [ 2276.033057] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c > > > [ 2276.033061] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.033065] [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362 > > > [ 2276.033068] [<ffffffff803c4339>] device_pm_add+0x46/0xed > > > [ 2276.033073] [<ffffffff803bdeee>] device_add+0x488/0x61a > > > [ 2276.033077] [<ffffffff803be099>] device_register+0x19/0x1d > > > [ 2276.033080] [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6 > > > [ 2276.033085] [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82 > > > [ 2276.033089] 
[<ffffffff8035bf77>] local_pci_probe+0x12/0x16 > > > [ 2276.033093] [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b > > > [ 2276.033097] [<ffffffff80250038>] worker_thread+0x1b2/0x2c9 > > > [ 2276.033100] [<ffffffff80253d40>] kthread+0x49/0x76 > > > [ 2276.033104] [<ffffffff8020c1fa>] child_rip+0xa/0x20 > > > [ 2276.033107] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.033111] > > > [ 2276.033111] -> #2 (&wfc.work){+.+.+.}: > > > [ 2276.033114] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c > > > [ 2276.033118] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.033122] [<ffffffff80250032>] worker_thread+0x1ac/0x2c9 > > > [ 2276.033125] [<ffffffff80253d40>] kthread+0x49/0x76 > > > [ 2276.033128] [<ffffffff8020c1fa>] child_rip+0xa/0x20 > > > [ 2276.033131] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.033134] > > > [ 2276.033134] -> #1 (work_on_cpu){+.+.+.}: > > > [ 2276.033138] [<ffffffff80265579>] __lock_acquire+0x1402/0x178c > > > [ 2276.033141] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.033145] [<ffffffff80250885>] flush_work+0x60/0xeb > > > [ 2276.033148] [<ffffffff80250c49>] work_on_cpu+0x73/0x84 > > > [ 2276.033151] [<ffffffffa02b4399>] get_cur_val+0xb6/0xe3 [acpi_cpufreq] > > > [ 2276.033157] [<ffffffffa02b443a>] > > > get_cur_freq_on_cpu+0x74/0xc5 [acpi_cpufreq] > > > [ 2276.033161] [<ffffffffa02b4b69>] > > > acpi_cpufreq_cpu_init+0x381/0x4c4 [acpi_cpufreq] > > > [ 2276.033166] [<ffffffff803db697>] cpufreq_add_dev+0x1bb/0x676 > > > [ 2276.033169] [<ffffffff803bea90>] sysdev_driver_register+0xc6/0x120 > > > [ 2276.033173] [<ffffffff803da5c1>] cpufreq_register_driver+0xbc/0x1ae > > > [ 2276.033177] [<ffffffffa00c60c5>] 0xffffffffa00c60c5 > > > [ 2276.033180] [<ffffffff80209076>] do_one_initcall+0x70/0x185 > > > [ 2276.033183] [<ffffffff8026f374>] sys_init_module+0xab/0x1cd > > > [ 2276.033187] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b > > > [ 2276.033190] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.033197] 
> > > [ 2276.033197] -> #0 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}: > > > [ 2276.033201] [<ffffffff80265296>] __lock_acquire+0x111f/0x178c > > > [ 2276.033204] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.033208] [<ffffffff80476d6c>] down_write+0x43/0x77 > > > [ 2276.033211] [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79 > > > [ 2276.033215] [<ffffffff80474678>] cpufreq_cpu_callback+0x52/0x7a > > > [ 2276.033218] [<ffffffff80257ebb>] notifier_call_chain+0x33/0x5b > > > [ 2276.033222] [<ffffffff80257f42>] __raw_notifier_call_chain+0x9/0xb > > > [ 2276.033226] [<ffffffff80465097>] _cpu_down+0x86/0x298 > > > [ 2276.033229] [<ffffffff80240dde>] disable_nonboot_cpus+0x78/0x120 > > > [ 2276.033233] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac > > > [ 2276.033236] [<ffffffff80271b52>] hibernate+0xab/0x1cf > > > [ 2276.033239] [<ffffffff802705f0>] state_store+0x59/0xd7 > > > [ 2276.033242] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19 > > > [ 2276.033246] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119 > > > [ 2276.033249] [<ffffffff802c0d14>] vfs_write+0xae/0x137 > > > [ 2276.033252] [<ffffffff802c0e61>] sys_write+0x47/0x6e > > > [ 2276.033255] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b > > > [ 2276.033259] [<ffffffffffffffff>] 0xffffffffffffffff > > > [ 2276.033262] > > > [ 2276.033262] other info that might help us debug this: > > > [ 2276.033263] > > > [ 2276.033265] 5 locks held by pm-hibernate/5061: > > > [ 2276.033267] #0: (&buffer->mutex){+.+.+.}, at: > > > [<ffffffff8030f724>] sysfs_write_file+0x38/0x119 > > > [ 2276.033273] #1: (pm_mutex){+.+.+.}, at: [<ffffffff80271abe>] > > > hibernate+0x17/0x1cf > > > [ 2276.033278] #2: (dpm_list_mtx){+.+.+.}, at: [<ffffffff803c35c5>] > > > device_pm_lock+0x12/0x14 > > > [ 2276.033284] #3: (cpu_add_remove_lock){+.+.+.}, at: > > > [<ffffffff80240d99>] disable_nonboot_cpus+0x33/0x120 > > > [ 2276.033289] #4: (&cpu_hotplug.lock){+.+.+.}, at: > > > [<ffffffff80240d38>] 
cpu_hotplug_begin+0x22/0x50 > > > [ 2276.033294] > > > [ 2276.033295] stack backtrace: > > > [ 2276.033297] Pid: 5061, comm: pm-hibernate Not tainted > > > 2.6.29-08003-g8b58977 #84 > > > [ 2276.033300] Call Trace: > > > [ 2276.033304] [<ffffffff80263cbf>] print_circular_bug_tail+0xc5/0xd0 > > > [ 2276.033307] [<ffffffff80265296>] __lock_acquire+0x111f/0x178c > > > [ 2276.033312] [<ffffffff802382b0>] ? sub_preempt_count+0xdc/0xf1 > > > [ 2276.033315] [<ffffffff80263470>] ? trace_hardirqs_on+0xd/0xf > > > [ 2276.033319] [<ffffffff80265996>] lock_acquire+0x93/0xbf > > > [ 2276.033322] [<ffffffff803db46e>] ? lock_policy_rwsem_write+0x48/0x79 > > > [ 2276.033326] [<ffffffff80476d6c>] down_write+0x43/0x77 > > > [ 2276.033329] [<ffffffff803db46e>] ? lock_policy_rwsem_write+0x48/0x79 > > > [ 2276.033333] [<ffffffff803db46e>] lock_policy_rwsem_write+0x48/0x79 > > > [ 2276.033337] [<ffffffff80474678>] cpufreq_cpu_callback+0x52/0x7a > > > [ 2276.033340] [<ffffffff80257ebb>] notifier_call_chain+0x33/0x5b > > > [ 2276.033344] [<ffffffff80257f42>] __raw_notifier_call_chain+0x9/0xb > > > [ 2276.033347] [<ffffffff80465097>] _cpu_down+0x86/0x298 > > > [ 2276.033350] [<ffffffff80240dde>] disable_nonboot_cpus+0x78/0x120 > > > [ 2276.033350] [<ffffffff80271994>] hibernation_snapshot+0x99/0x1ac > > > [ 2276.033350] [<ffffffff80271b52>] hibernate+0xab/0x1cf > > > [ 2276.033350] [<ffffffff802705f0>] state_store+0x59/0xd7 > > > [ 2276.033350] [<ffffffff8034788f>] kobj_attr_store+0x17/0x19 > > > [ 2276.033350] [<ffffffff8030f7d0>] sysfs_write_file+0xe4/0x119 > > > [ 2276.033350] [<ffffffff802c0d14>] vfs_write+0xae/0x137 > > > [ 2276.033350] [<ffffffff802c0e61>] sys_write+0x47/0x6e > > > [ 2276.033350] [<ffffffff8020b0c2>] system_call_fastpath+0x16/0x1b > > > [ 2276.034478] Broke affinity for irq 21 > > > [ 2276.034478] Broke affinity for irq 22 > > > [ 2276.034478] Broke affinity for irq 29 > > > [ 2276.035812] CPU 1 is now offline > > > [ 2276.035814] lockdep: fixing up alternatives. 
> > > [ 2276.035816] SMP alternatives: switching to UP code > > > [ 2276.158329] CPU0 attaching NULL sched-domain. > > > [ 2276.158334] CPU1 attaching NULL sched-domain. > > > [ 2276.158361] CPU0 attaching NULL sched-domain. > > > [ 2276.158841] CPU1 is down > > > [ 2276.158867] Checking wake-up interrupts > > > [ 2276.158883] Suspending System Devices > > > [ 2276.158885] Suspending type 'oprofile': > > > [ 2276.158886] oprofile0 > > > [ 2276.158888] Suspending type 'edac': > > > [ 2276.158889] Suspending type 'irqrouter': > > > [ 2276.158891] irqrouter0 > > > [ 2276.158892] Suspending type 'clocksource': > > > [ 2276.158894] clocksource0 > > > [ 2276.158895] Suspending type 'timekeeping': > > > [ 2276.158897] timekeeping0 > > > [ 2276.158929] Extended CMOS year: 2000 > > > [ 2276.158937] Suspending type 'ioapic': > > > [ 2276.158938] ioapic0 > > > [ 2276.159007] Suspending type 'lapic': > > > [ 2276.159007] lapic0 > > > [ 2276.159007] Suspending type 'machinecheck': > > > [ 2276.159007] machinecheck0 > > > [ 2276.159007] Suspending type 'i8237': > > > [ 2276.159007] i82370 > > > [ 2276.159007] Suspending type 'i8259': > > > [ 2276.159007] i82590 > > > [ 2276.159007] Suspending type 'node': > > > [ 2276.159007] node0 > > > [ 2276.159007] Suspending type 'cpu': > > > [ 2276.159007] cpu0 > > > [ 2276.159007] cpu1 > > > [ 2276.159007] PM: Creating hibernation image: > > > [ 2276.159007] PM: Need to copy 112986 pages > > > [ 2276.159007] PM: Restoring platform NVS memory > > > [ 2276.159007] Resuming System Devices > > > [ 2276.159007] Resuming type 'cpu': > > > [ 2276.159007] cpu0 > > > [ 2276.159007] cpu1 > > > [ 2276.159007] Resuming type 'node': > > > [ 2276.159007] node0 > > > [ 2276.159007] Resuming type 'i8259': > > > [ 2276.159007] i82590 > > > [ 2276.159007] Resuming type 'i8237': > > > [ 2276.159007] i82370 > > > [ 2276.159007] Resuming type 'machinecheck': > > > [ 2276.159007] machinecheck0 > > > [ 2276.159007] CPU0: Thermal LVT vector (0xfa) already 
installed > > > [ 2276.159007] Resuming type 'lapic': > > > [ 2276.159007] lapic0 > > > [ 2276.159007] Resuming type 'ioapic': > > > [ 2276.159007] ioapic0 > > > [ 2276.159007] Resuming type 'timekeeping': > > > [ 2276.159007] timekeeping0 > > > [ 2276.159007] Extended CMOS year: 2000 > > > [ 2276.159007] Resuming type 'clocksource': > > > [ 2276.159007] clocksource0 > > > [ 2276.159007] Resuming type 'irqrouter': > > > [ 2276.159007] irqrouter0 > > > [ 2276.159007] Resuming type 'edac': > > > [ 2276.159007] Resuming type 'oprofile': > > > [ 2276.159007] oprofile0

--
Thanks and Regards
gautham
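Gautham's suggestion above splits the accessor in two: a caller already pinned to the CPU of interest reads the counters directly, and only cross-CPU callers pay for work_on_cpu(), so the ondemand governor's timer never creates the cpu_policy_rwsem -> work_on_cpu dependency edge. A sketch of that split (Python used as modelling pseudocode; get_this_measured_perf() was only proposed in this thread, and the class, method names and counter values below are invented for illustration):

```python
class PerfReader:
    """Models the proposed get_measured_perf()/get_this_measured_perf()
    split. this_cpu stands for the CPU the caller is running on."""

    def __init__(self, this_cpu):
        self.this_cpu = this_cpu
        self.dispatched_to = []          # CPUs reached via work_on_cpu()

    def _read_counters(self, cpu):
        # stand-in for reading the APERF/MPERF counters on `cpu`
        return {"cpu": cpu, "aperf": 1000, "mperf": 1200}

    def work_on_cpu(self, cpu, fn):
        # the deadlock-prone path: queues work on `cpu` and flushes it,
        # which is what creates the work_on_cpu edge in the splat
        self.dispatched_to.append(cpu)
        return fn(cpu)

    def get_this_measured_perf(self):
        # caller guarantees it already runs on the CPU of interest
        # (true for do_dbs_timer(), which runs from a per-cpu workqueue)
        return self._read_counters(self.this_cpu)

    def get_measured_perf(self, cpu):
        if cpu == self.this_cpu:
            return self.get_this_measured_perf()    # no dispatch needed
        return self.work_on_cpu(cpu, self._read_counters)

reader = PerfReader(this_cpu=0)
print(reader.get_measured_perf(0), reader.dispatched_to)   # direct read
print(reader.get_measured_perf(1), reader.dispatched_to)   # cross-CPU path
```

The point of the split is purely that the common caller never enters work_on_cpu(), so the problematic dependency edge is never recorded.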
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-06  0:55 ` Gautham R Shenoy
@ 2009-04-06 13:29 ` Rafael J. Wysocki
  2009-04-06 14:24 ` Gautham R Shenoy
  2009-04-06 14:37 ` [linux-pm] " Alan Stern
  0 siblings, 2 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2009-04-06 13:29 UTC (permalink / raw)
To: ego
Cc: Ingo Molnar, Peter Zijlstra, Rusty Russell, Ming Lei,
    Andrew Morton, Linux-pm mailing list, Linux Kernel List,
    Venkatesh Pallipadi

On Monday 06 April 2009, Gautham R Shenoy wrote:
> On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > kernel version : one simple usb-serial patch against commit
> > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > >
> > > > Thanks.
> > >
> > > Hmm, CPU hotplug again, it seems.
> > >
> > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > Gautham?
> >
> > CPU hotplug tends to land on the scheduler people's desk normally.
> >
> > But i'm not sure that's the real thing here - key appears to be this
> > work_on_cpu() worklet by the cpufreq code:
>
> Actually, there are two dependency chains here which can lead to a deadlock.
> The one we're seeing here is the longer of the two.
>
> If the relevant locks are numbered as follows:
> [1]: cpu_policy_rwsem
> [2]: work_on_cpu
> [3]: cpu_hotplug.lock
> [4]: dpm_list_mtx
>
> The individual callpaths are:
>
> 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
>                                                        |
>    work_on_cpu()[2] <-- get_measured_perf() <----------|
>
> 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
>                                                                 |
>    [4] device_pm_add() <-- .. <-- local_pci_probe() <-----------|

This should block on [4] held by hibernate(). That's why it calls
device_pm_lock() after all.

> 3) hibernate() --> hibernation_snapshot() --> create_image()
>                                                  |
>    disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
>        |
>        |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
>
> The two chains which can deadlock are
>
> a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> and
> b) [3] --> [2] --> [4] --> [3]

What exactly is the b) scenario?

> Ingo,
> The do_dbs_timer() function of the ondemand governor is run from a per-cpu
> workqueue. Hence it is already running on the cpu whose perf counters
> we're interested in.
>
> Does it make sense to introduce a get_this_measured_perf() API
> for users who are already running on the relevant CPU?
> And have get_measured_perf(cpu) for other users (currently there are
> none)?
>
> Thus, do_dbs_timer() can avoid calling work_on_cpu(), thereby preventing
> deadlock a) from occurring.
>
> Rafael,
> Sorry, I am not well versed with the hibernation code. But does the
> following make sense:

Not really ->

> create_image()
> {
>         device_pm_lock();
>         device_power_down(PMSG_FREEZE);
>         platform_pre_snapshot(platform_mode);
>
>         device_pm_unlock();

-> because dpm_list is under control of the hibernation code at this point
and it should remain locked.

>         disable_nonboot_cpus()

disable_nonboot_cpus() must not take dpm_list_mtx itself.

>         device_pm_lock();
>         .
>         .
>         .
>         .
> }

Thanks,
Rafael
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-06 13:29 ` Rafael J. Wysocki
@ 2009-04-06 14:24 ` Gautham R Shenoy
  2009-04-06 15:25 ` Rafael J. Wysocki
  2009-04-06 14:37 ` [linux-pm] " Alan Stern
  1 sibling, 1 reply; 20+ messages in thread
From: Gautham R Shenoy @ 2009-04-06 14:24 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ingo Molnar, Peter Zijlstra, Rusty Russell, Ming Lei,
    Andrew Morton, Linux-pm mailing list, Linux Kernel List,
    Venkatesh Pallipadi

On Mon, Apr 06, 2009 at 03:29:43PM +0200, Rafael J. Wysocki wrote:
> On Monday 06 April 2009, Gautham R Shenoy wrote:
> > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > >
> > > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > > kernel version : one simple usb-serial patch against commit
> > > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > > >
> > > > > Thanks.
> > > >
> > > > Hmm, CPU hotplug again, it seems.
> > > >
> > > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > > Gautham?
> > >
> > > CPU hotplug tends to land on the scheduler people's desk normally.
> > >
> > > But i'm not sure that's the real thing here - key appears to be this
> > > work_on_cpu() worklet by the cpufreq code:
> >
> > Actually, there are two dependency chains here which can lead to a deadlock.
> > The one we're seeing here is the longer of the two.
> >
> > If the relevant locks are numbered as follows:
> > [1]: cpu_policy_rwsem
> > [2]: work_on_cpu
> > [3]: cpu_hotplug.lock
> > [4]: dpm_list_mtx
> >
> > The individual callpaths are:
> >
> > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg()
> >                                                        |
> >    work_on_cpu()[2] <-- get_measured_perf() <----------|
> >
> > 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2]
> >                                                                 |
> >    [4] device_pm_add() <-- .. <-- local_pci_probe() <-----------|
>
> This should block on [4] held by hibernate(). That's why it calls
> device_pm_lock() after all.

Agreed. But it does so holding the cpu_hotplug.lock at pci_call_probe().
See below.

> > 3) hibernate() --> hibernation_snapshot() --> create_image()
> >                                                  |
> >    disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> >        |
> >        |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> >
> > The two chains which can deadlock are
> >
> > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > and
> > b) [3] --> [2] --> [4] --> [3]
>
> What exactly is the b) scenario?

pci_call_probe() calls work_on_cpu() within
get_online_cpus()/put_online_cpus(), the cpu hotplug read path.
Thus we have a cpu_hotplug.lock --> work_on_cpu dependency here.

This work_on_cpu() calls local_pci_probe() which, in one of the
registration paths, calls pcie_portdrv_probe(). This would
eventually end up calling device_pm_add(), which takes the
dpm_list_mtx. Thus we have a work_on_cpu --> dpm_list_mtx
dependency here. This is reflected in the lockdep log for dpm_list_mtx:

> > [ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}:
> > [ 2276.033057]  [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > [ 2276.033061]  [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > [ 2276.033065]  [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
> > [ 2276.033068]  [<ffffffff803c4339>] device_pm_add+0x46/0xed
> > [ 2276.033073]  [<ffffffff803bdeee>] device_add+0x488/0x61a
> > [ 2276.033077]  [<ffffffff803be099>] device_register+0x19/0x1d
> > [ 2276.033080]  [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6
> > [ 2276.033085]  [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82
> > [ 2276.033089]  [<ffffffff8035bf77>] local_pci_probe+0x12/0x16
> > [ 2276.033093]  [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b
> > [ 2276.033097]  [<ffffffff80250038>] worker_thread+0x1b2/0x2c9
> > [ 2276.033100]  [<ffffffff80253d40>] kthread+0x49/0x76
> > [ 2276.033104]  [<ffffffff8020c1fa>] child_rip+0xa/0x20
> > [ 2276.033107]  [<ffffffffffffffff>] 0xffffffffffffffff

The dependency chain on this device_registration path would be
cpu_hotplug.lock --> work_on_cpu --> dpm_list_mtx.

On the hibernate path, we hold the dpm_list_mtx and call
disable_nonboot_cpus() in create_image().
disable_nonboot_cpus() calls _cpu_down(), which again takes the
cpu_hotplug.lock, this time the write-path. Thus we have a
dpm_list_mtx --> cpu_hotplug.lock dependency here.

These two dependency chains being in reverse order can cause a
deadlock, right? Or am I reading something wrong here?

> > Rafael,
> > Sorry, I am not well versed with the hibernation code. But does the
> > following make sense:
>
> Not really ->
>
> > create_image()
> > {
> >         device_pm_lock();
> >         device_power_down(PMSG_FREEZE);
> >         platform_pre_snapshot(platform_mode);
> >
> >         device_pm_unlock();
>
> -> because dpm_list is under control of the hibernation code at this point
> and it should remain locked.
>
> >         disable_nonboot_cpus()
>
> disable_nonboot_cpus() must not take dpm_list_mtx itself.
>
> >         device_pm_lock();
> >         .
> >         .
> >         .
> >         .
> > }
>
> Thanks,
> Rafael

--
Thanks and Regards
gautham
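Stripped of the kernel specifics, chain b) is a plain lock-order inversion: the probe path takes [3] before [4], while the hibernate path takes [4] before [3]. The conflict can be derived mechanically from the acquisition orders alone (illustrative Python, not kernel code; lock names are taken from Gautham's numbering):

```python
def ordering_edges(path):
    """Each lock acquired earlier in a path orders before each later one."""
    return {(a, b) for i, a in enumerate(path) for b in path[i + 1:]}

# Acquisition orders from the two call paths described above:
probe_path = ["cpu_hotplug.lock",      # [3] get_online_cpus() in pci_call_probe()
              "work_on_cpu",           # [2]
              "dpm_list_mtx"]          # [4] device_pm_add()
hibernate_path = ["dpm_list_mtx",      # [4] device_pm_lock()
                  "cpu_hotplug.lock"]  # [3] _cpu_down()

edges = ordering_edges(probe_path) | ordering_edges(hibernate_path)
# Any pair of locks ordered both ways is a potential ABBA deadlock:
conflicts = sorted({tuple(sorted(p)) for p in edges if p[::-1] in edges})
print(conflicts)   # [('cpu_hotplug.lock', 'dpm_list_mtx')]
```

This is the same pair Gautham identifies: dpm_list_mtx and cpu_hotplug.lock are each taken while the other is held, depending on which path runs.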
* Re: pm-hibernate : possible circular locking dependency detected 2009-04-06 14:24 ` Gautham R Shenoy @ 2009-04-06 15:25 ` Rafael J. Wysocki 0 siblings, 0 replies; 20+ messages in thread From: Rafael J. Wysocki @ 2009-04-06 15:25 UTC (permalink / raw) To: ego Cc: Ingo Molnar, Peter Zijlstra, Rusty Russell, Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List, Venkatesh Pallipadi, Alan Stern On Monday 06 April 2009, Gautham R Shenoy wrote: > On Mon, Apr 06, 2009 at 03:29:43PM +0200, Rafael J. Wysocki wrote: > > On Monday 06 April 2009, Gautham R Shenoy wrote: > > > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote: > > > > > > > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > > > > > > > On Sunday 05 April 2009, Ming Lei wrote: > > > > > > kernel version : one simple usb-serial patch against commit > > > > > > 6bb597507f9839b13498781e481f5458aea33620. > > > > > > > > > > > > Thanks. > > > > > > > > > > Hmm, CPU hotplug again, it seems. > > > > > > > > > > I'm not sure who's the maintainer at the moment. Andrew, is that > > > > > Gautham? > > > > > > > > CPU hotplug tends to land on the scheduler people's desk normally. > > > > > > > > But i'm not sure that's the real thing here - key appears to be this > > > > work_on_cpu() worklet by the cpufreq code: > > > > > > Actually, there are two dependency chains here which can lead to a deadlock. > > > The one we're seeing here is the longer of the two. > > > > > > If the relevant locks are numbered as follows: > > > [1]: cpu_policy_rwsem > > > [2]: work_on_cpu > > > [3]: cpu_hotplug.lock > > > [4]: dpm_list_mtx > > > > > > > > > The individual callpaths are: > > > > > > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg() > > > | > > > work_on_cpu()[2] <-- get_measured_perf() <--| > > > > > > > > > 2) pci_device_probe() --> .. 
--> pci_call_probe() [3] --> work_on_cpu()[2]
> > >                               |
> > > [4] device_pm_add() <-- ..<-- local_pci_probe() <--|
> >
> > This should block on [4] held by hibernate(). That's why it calls
> > device_pm_lock() after all.
>
> Agreed. But it does so holding the cpu_hotplug.lock at pci_call_probe().
> See below.
>
> > > 3) hibernate() --> hibernation_snapshot() --> create_image()
> > >                                                   |
> > >     disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> > >         |
> > >         |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> > >
> > > The two chains which can deadlock are
> > >
> > > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > > and
> > > b) [3] --> [2] --> [4] --> [3]
> >
> > What exactly is the b) scenario?
>
> pci_call_probe() calls work_on_cpu() within
> get_online_cpus()/put_online_cpus(), the cpu hotplug read path.
> Thus we have a cpu_hotplug.lock --> work_on_cpu dependency here.
>
> This work_on_cpu() calls local_pci_probe() which, in one of the
> registration paths, calls pcie_portdrv_probe(). This would
> eventually end up calling device_pm_add(), which takes the
> dpm_list_mtx. Thus we have a work_on_cpu --> dpm_list_mtx
> dependency here.
> This is reflected in the lockdep log for dpm_list_mtx:
>
> > > [ 2276.033054] -> #3 (dpm_list_mtx){+.+.+.}:
> > > [ 2276.033057]        [<ffffffff80265579>] __lock_acquire+0x1402/0x178c
> > > [ 2276.033061]        [<ffffffff80265996>] lock_acquire+0x93/0xbf
> > > [ 2276.033065]        [<ffffffff804763db>] mutex_lock_nested+0x6a/0x362
> > > [ 2276.033068]        [<ffffffff803c4339>] device_pm_add+0x46/0xed
> > > [ 2276.033073]        [<ffffffff803bdeee>] device_add+0x488/0x61a
> > > [ 2276.033077]        [<ffffffff803be099>] device_register+0x19/0x1d
> > > [ 2276.033080]        [<ffffffff8036031a>] pcie_port_device_register+0x450/0x4b6
> > > [ 2276.033085]        [<ffffffff80469999>] pcie_portdrv_probe+0x69/0x82
> > > [ 2276.033089]        [<ffffffff8035bf77>] local_pci_probe+0x12/0x16
> > > [ 2276.033093]        [<ffffffff8024fdf8>] do_work_for_cpu+0x13/0x1b
> > > [ 2276.033097]        [<ffffffff80250038>] worker_thread+0x1b2/0x2c9
> > > [ 2276.033100]        [<ffffffff80253d40>] kthread+0x49/0x76
> > > [ 2276.033104]        [<ffffffff8020c1fa>] child_rip+0xa/0x20
> > > [ 2276.033107]        [<ffffffffffffffff>] 0xffffffffffffffff
>
> The dependency chain on this device_registration path would be
> cpu_hotplug.lock --> work_on_cpu --> dpm_list_mtx.
>
> On the hibernate path, we hold the dpm_list_mtx and call
> disable_nonboot_cpus() in create_image().
> disable_nonboot_cpus() calls _cpu_down(), which again takes the
> cpu_hotplug.lock, this time on the write path. Thus we have a
> dpm_list_mtx --> cpu_hotplug.lock dependency here.
>
> These two dependency chains being in reverse order can cause a
> deadlock, right? Or am I reading something wrong here?

Please read Alan Stern's reply to my message. In short, the two code
paths that cause the lockdep violation to happen cannot run
simultaneously, because the prepare() stage of hibernation blocks
device probing altogether.

So, this is a lockdep violation, but not a deadlock scenario.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [linux-pm] pm-hibernate : possible circular locking dependency detected 2009-04-06 13:29 ` Rafael J. Wysocki 2009-04-06 14:24 ` Gautham R Shenoy @ 2009-04-06 14:37 ` Alan Stern 2009-04-06 15:20 ` Gautham R Shenoy 1 sibling, 1 reply; 20+ messages in thread From: Alan Stern @ 2009-04-06 14:37 UTC (permalink / raw) To: Rafael J. Wysocki Cc: ego, Peter Zijlstra, Rusty Russell, Linux Kernel List, Linux-pm mailing list, Ingo Molnar, Andrew Morton On Mon, 6 Apr 2009, Rafael J. Wysocki wrote: > On Monday 06 April 2009, Gautham R Shenoy wrote: > > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote: > > > > > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > > > > > On Sunday 05 April 2009, Ming Lei wrote: > > > > > kernel version : one simple usb-serial patch against commit > > > > > 6bb597507f9839b13498781e481f5458aea33620. > > > > > > > > > > Thanks. > > > > > > > > Hmm, CPU hotplug again, it seems. > > > > > > > > I'm not sure who's the maintainer at the moment. Andrew, is that > > > > Gautham? > > > > > > CPU hotplug tends to land on the scheduler people's desk normally. > > > > > > But i'm not sure that's the real thing here - key appears to be this > > > work_on_cpu() worklet by the cpufreq code: > > > > Actually, there are two dependency chains here which can lead to a deadlock. > > The one we're seeing here is the longer of the two. > > > > If the relevant locks are numbered as follows: > > [1]: cpu_policy_rwsem > > [2]: work_on_cpu > > [3]: cpu_hotplug.lock > > [4]: dpm_list_mtx > > > > > > The individual callpaths are: > > > > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg() > > | > > work_on_cpu()[2] <-- get_measured_perf() <--| > > > > > > 2) pci_device_probe() --> .. --> pci_call_probe() [3] --> work_on_cpu()[2] > > | > > [4] device_pm_add() <-- ..<-- local_pci_probe() <--| > > This should block on [4] held by hibernate(). That's why it calls > device_pm_lock() after all. 
>
> > 3) hibernate() --> hibernation_snapshot() --> create_image()
> >                                                   |
> >     disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> >         |
> >         |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> >
> > The two chains which can deadlock are
> >
> > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > and
> > b) [3] --> [2] --> [4] --> [3]
>
> What exactly is the b) scenario?

If I understand correctly it isn't really a deadlock scenario, but it
is a lockdep violation. The violation is:

	The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
	be acquired while cpu_hotplug.lock [3] is held;

	The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
	acquired while dpm_list_mtx [4] is held.

The two pathways cannot run simultaneously (and hence cannot deadlock)
because the prepare() stage of hibernation is supposed to stop all
device probing. But lockdep will still report a problem.

Alan Stern

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [linux-pm] pm-hibernate : possible circular locking dependency detected 2009-04-06 14:37 ` [linux-pm] " Alan Stern @ 2009-04-06 15:20 ` Gautham R Shenoy 2009-04-06 18:42 ` Alan Stern 0 siblings, 1 reply; 20+ messages in thread From: Gautham R Shenoy @ 2009-04-06 15:20 UTC (permalink / raw) To: Alan Stern Cc: Rafael J. Wysocki, Peter Zijlstra, Rusty Russell, Linux Kernel List, Linux-pm mailing list, Ingo Molnar, Andrew Morton On Mon, Apr 06, 2009 at 10:37:10AM -0400, Alan Stern wrote: > On Mon, 6 Apr 2009, Rafael J. Wysocki wrote: > > > On Monday 06 April 2009, Gautham R Shenoy wrote: > > > On Sun, Apr 05, 2009 at 03:44:54PM +0200, Ingo Molnar wrote: > > > > > > > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > > > > > > > On Sunday 05 April 2009, Ming Lei wrote: > > > > > > kernel version : one simple usb-serial patch against commit > > > > > > 6bb597507f9839b13498781e481f5458aea33620. > > > > > > > > > > > > Thanks. > > > > > > > > > > Hmm, CPU hotplug again, it seems. > > > > > > > > > > I'm not sure who's the maintainer at the moment. Andrew, is that > > > > > Gautham? > > > > > > > > CPU hotplug tends to land on the scheduler people's desk normally. > > > > > > > > But i'm not sure that's the real thing here - key appears to be this > > > > work_on_cpu() worklet by the cpufreq code: > > > > > > Actually, there are two dependency chains here which can lead to a deadlock. > > > The one we're seeing here is the longer of the two. > > > > > > If the relevant locks are numbered as follows: > > > [1]: cpu_policy_rwsem > > > [2]: work_on_cpu > > > [3]: cpu_hotplug.lock > > > [4]: dpm_list_mtx > > > > > > > > > The individual callpaths are: > > > > > > 1) do_dbs_timer()[1] --> dbs_check_cpu() --> __cpufreq_driver_getavg() > > > | > > > work_on_cpu()[2] <-- get_measured_perf() <--| > > > > > > > > > 2) pci_device_probe() --> .. 
--> pci_call_probe() [3] --> work_on_cpu()[2]
> > >                               |
> > > [4] device_pm_add() <-- ..<-- local_pci_probe() <--|
> >
> > This should block on [4] held by hibernate(). That's why it calls
> > device_pm_lock() after all.
> >
> > > 3) hibernate() --> hibernation_snapshot() --> create_image()
> > >                                                   |
> > >     disable_nonboot_cpus() <-- [4] device_pm_lock() <--|
> > >         |
> > >         |--> _cpu_down() [3] --> cpufreq_cpu_callback() [1]
> > >
> > > The two chains which can deadlock are
> > >
> > > a) [1] --> [2] --> [4] --> [3] --> [1] (The one in this log)
> > > and
> > > b) [3] --> [2] --> [4] --> [3]
> >
> > What exactly is the b) scenario?
>
> If I understand correctly it isn't really a deadlock scenario, but it
> is a lockdep violation. The violation is:
>
> 	The pci_device_probe() path 2) proves that dpm_list_mtx [4] can
> 	be acquired while cpu_hotplug.lock [3] is held;
>
> 	The hibernate() path 3) proves that cpu_hotplug.lock [3] can be
> 	acquired while dpm_list_mtx [4] is held.
>
> The two pathways cannot run simultaneously (and hence cannot deadlock)
> because the prepare() stage of hibernation is supposed to stop all
> device probing. But lockdep will still report a problem.

Thanks for clarifying this, Alan. I guess it boils down to teaching
lockdep about this false positive.

> Alan Stern

--
Thanks and Regards
gautham

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [linux-pm] pm-hibernate : possible circular locking dependency detected 2009-04-06 15:20 ` Gautham R Shenoy @ 2009-04-06 18:42 ` Alan Stern 2009-04-06 19:58 ` Rafael J. Wysocki 0 siblings, 1 reply; 20+ messages in thread From: Alan Stern @ 2009-04-06 18:42 UTC (permalink / raw) To: Gautham R Shenoy Cc: Rafael J. Wysocki, Peter Zijlstra, Rusty Russell, Linux Kernel List, Linux-pm mailing list, Ingo Molnar, Andrew Morton On Mon, 6 Apr 2009, Gautham R Shenoy wrote: > > If I understand correctly it isn't really a deadlock scenario, but it > > is a lockdep violation. The violation is: > > > > The pci_device_probe() path 2) proves that dpm_list_mtx [4] can > > be acquired while cpu_hotplug.lock [3] is held; > > > > The hibernate() path 3) proves that cpu_hotplug.lock [3] can be > > acquired while dpm_list_mtx [4] is held. > > > > The two pathways cannot run simultaneously (and hence cannot deadlock) > > because the prepare() stage of hibernation is supposed to stop all > > device probing. But lockdep will still report a problem. > > Thanks for clarifying this Alan. I guess it boils down to teaching > lockdep about this false-positive. Or else changing the code somehow to avoid the violation completely. But I have no idea how... And AFAIK, teaching lockdep about special cases like this is not so easy to do. Alan Stern ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linux-pm] pm-hibernate : possible circular locking dependency detected 2009-04-06 18:42 ` Alan Stern @ 2009-04-06 19:58 ` Rafael J. Wysocki 2009-04-07 8:43 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Rafael J. Wysocki @ 2009-04-06 19:58 UTC (permalink / raw) To: Alan Stern, Peter Zijlstra Cc: Gautham R Shenoy, Rusty Russell, Linux Kernel List, Linux-pm mailing list, Ingo Molnar, Andrew Morton On Monday 06 April 2009, Alan Stern wrote: > On Mon, 6 Apr 2009, Gautham R Shenoy wrote: > > > > If I understand correctly it isn't really a deadlock scenario, but it > > > is a lockdep violation. The violation is: > > > > > > The pci_device_probe() path 2) proves that dpm_list_mtx [4] can > > > be acquired while cpu_hotplug.lock [3] is held; > > > > > > The hibernate() path 3) proves that cpu_hotplug.lock [3] can be > > > acquired while dpm_list_mtx [4] is held. > > > > > > The two pathways cannot run simultaneously (and hence cannot deadlock) > > > because the prepare() stage of hibernation is supposed to stop all > > > device probing. But lockdep will still report a problem. > > > > Thanks for clarifying this Alan. I guess it boils down to teaching > > lockdep about this false-positive. > > Or else changing the code somehow to avoid the violation completely. > But I have no idea how... And AFAIK, teaching lockdep about special > cases like this is not so easy to do. Yeah, I've just wanted to ask about that. Peter, how can we do it? Rafael ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [linux-pm] pm-hibernate : possible circular locking dependency detected 2009-04-06 19:58 ` Rafael J. Wysocki @ 2009-04-07 8:43 ` Peter Zijlstra 0 siblings, 0 replies; 20+ messages in thread From: Peter Zijlstra @ 2009-04-07 8:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Alan Stern, Gautham R Shenoy, Rusty Russell, Linux Kernel List, Linux-pm mailing list, Ingo Molnar, Andrew Morton On Mon, 2009-04-06 at 21:58 +0200, Rafael J. Wysocki wrote: > On Monday 06 April 2009, Alan Stern wrote: > > On Mon, 6 Apr 2009, Gautham R Shenoy wrote: > > > > > > If I understand correctly it isn't really a deadlock scenario, but it > > > > is a lockdep violation. The violation is: > > > > > > > > The pci_device_probe() path 2) proves that dpm_list_mtx [4] can > > > > be acquired while cpu_hotplug.lock [3] is held; > > > > > > > > The hibernate() path 3) proves that cpu_hotplug.lock [3] can be > > > > acquired while dpm_list_mtx [4] is held. > > > > > > > > The two pathways cannot run simultaneously (and hence cannot deadlock) > > > > because the prepare() stage of hibernation is supposed to stop all > > > > device probing. But lockdep will still report a problem. > > > > > > Thanks for clarifying this Alan. I guess it boils down to teaching > > > lockdep about this false-positive. > > > > Or else changing the code somehow to avoid the violation completely. > > But I have no idea how... And AFAIK, teaching lockdep about special > > cases like this is not so easy to do. > > Yeah, I've just wanted to ask about that. Peter, how can we do it? I think it would come down to modeling that blocking of probes as a lock or something -- because that's basically what it is. So on the regular probe path, take a read lock of this lock, and on the suspend path take it as write or something. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected 2009-04-05 13:44 ` Ingo Molnar 2009-04-06 0:55 ` Gautham R Shenoy @ 2009-04-07 4:26 ` Rusty Russell 2009-04-07 7:05 ` Peter Zijlstra 1 sibling, 1 reply; 20+ messages in thread From: Rusty Russell @ 2009-04-07 4:26 UTC (permalink / raw) To: Ingo Molnar Cc: Rafael J. Wysocki, Peter Zijlstra, Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy On Sunday 05 April 2009 23:14:54 Ingo Molnar wrote: > > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > On Sunday 05 April 2009, Ming Lei wrote: > > > kernel version : one simple usb-serial patch against commit > > > 6bb597507f9839b13498781e481f5458aea33620. > > > > > > Thanks. > > > > Hmm, CPU hotplug again, it seems. > > > > I'm not sure who's the maintainer at the moment. Andrew, is that > > Gautham? > > CPU hotplug tends to land on the scheduler people's desk normally. > > But i'm not sure that's the real thing here - key appears to be this > work_on_cpu() worklet by the cpufreq code: Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread patch which I just put out the pull request for. Cheers, Rusty. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-07  4:26 ` Rusty Russell
@ 2009-04-07  7:05   ` Peter Zijlstra
  2009-04-08  3:17     ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-04-07  7:05 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Rafael J. Wysocki, Ming Lei, Andrew Morton,
	Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy

On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> On Sunday 05 April 2009 23:14:54 Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > > On Sunday 05 April 2009, Ming Lei wrote:
> > > > kernel version : one simple usb-serial patch against commit
> > > > 6bb597507f9839b13498781e481f5458aea33620.
> > > >
> > > > Thanks.
> > >
> > > Hmm, CPU hotplug again, it seems.
> > >
> > > I'm not sure who's the maintainer at the moment. Andrew, is that
> > > Gautham?
> >
> > CPU hotplug tends to land on the scheduler people's desk normally.
> >
> > But i'm not sure that's the real thing here - key appears to be this
> > work_on_cpu() worklet by the cpufreq code:
>
> Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> patch which I just put out the pull request for.

Would it make sense to teach it about a short-circuit like:

work_on_cpu() {

	if (cpumask_weight(current->cpus_allowed) == 1 &&
	    smp_processor_id() == cpu)
		return do_work_right_here();

	queue_the_bugger();
}

?

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected 2009-04-07 7:05 ` Peter Zijlstra @ 2009-04-08 3:17 ` Rusty Russell 2009-04-08 12:26 ` Ingo Molnar 2009-04-08 12:48 ` Peter Zijlstra 0 siblings, 2 replies; 20+ messages in thread From: Rusty Russell @ 2009-04-08 3:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Rafael J. Wysocki, Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote: > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote: > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread > > patch which I just put out the pull request for. > > Would it make sense to teach it about a short-circuit like: > > work_on_cpu() { > > if (cpumask_weight(current->cpus_allowed) == 1 && > smp_processor_id() == cpu) > return do_work_right_here(); Does that happen much? I guess put a counter in and see? Rusty. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected 2009-04-08 3:17 ` Rusty Russell @ 2009-04-08 12:26 ` Ingo Molnar 2009-04-08 12:48 ` Peter Zijlstra 1 sibling, 0 replies; 20+ messages in thread From: Ingo Molnar @ 2009-04-08 12:26 UTC (permalink / raw) To: Rusty Russell, Frédéric Weisbecker, Steven Rostedt Cc: Peter Zijlstra, Rafael J. Wysocki, Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy * Rusty Russell <rusty@rustcorp.com.au> wrote: > On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote: > > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote: > > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread > > > patch which I just put out the pull request for. > > > > Would it make sense to teach it about a short-circuit like: > > > > work_on_cpu() { > > > > if (cpumask_weight(current->cpus_allowed) == 1 && > > smp_processor_id() == cpu) > > return do_work_right_here(); > > Does that happen much? I guess put a counter in and see? a temporary tracepoint or trace_printk() for the workqueue tracer would also tell this, without any long-term overhead (it will be easy to remove it). Ingo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-08  3:17 ` Rusty Russell
  2009-04-08 12:26   ` Ingo Molnar
@ 2009-04-08 12:48   ` Peter Zijlstra
  2009-04-08 23:45     ` Rusty Russell
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2009-04-08 12:48 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, Rafael J. Wysocki, Ming Lei, Andrew Morton,
	Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy

On Wed, 2009-04-08 at 12:47 +0930, Rusty Russell wrote:
> On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote:
> > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote:
> > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread
> > > patch which I just put out the pull request for.
> >
> > Would it make sense to teach it about a short-circuit like:
> >
> > work_on_cpu() {
> >
> > 	if (cpumask_weight(current->cpus_allowed) == 1 &&
> > 	    smp_processor_id() == cpu)
> > 		return do_work_right_here();
>
> Does that happen much? I guess put a counter in and see?

Ego spotted the case where cpufreq calls it from a cpu-affine
workqueue. It seems to me that in that case it's desirable to have the
short-cut, and currently that's needed for correctness too, as it will
otherwise generate this circular lock thingy.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected 2009-04-08 12:48 ` Peter Zijlstra @ 2009-04-08 23:45 ` Rusty Russell 2009-04-09 4:17 ` Ingo Molnar 0 siblings, 1 reply; 20+ messages in thread From: Rusty Russell @ 2009-04-08 23:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Rafael J. Wysocki, Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy On Wednesday 08 April 2009 22:18:26 Peter Zijlstra wrote: > On Wed, 2009-04-08 at 12:47 +0930, Rusty Russell wrote: > > On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote: > > > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote: > > > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread > > > > patch which I just put out the pull request for. > > > > > > Would it make sense to teach it about a short-circuit like: > > > > > > work_on_cpu() { > > > > > > if (cpumask_weight(current->cpus_allowed) == 1 && > > > smp_processor_id() == cpu) > > > return do_work_right_here(); > > > > Does that happen much? I guess put a counter in and see? > > Ego spotted the case where cpufreq calls it from an cpu-affine > workqueue, it seems to me in that case its desirable to have the > short-cut, and currently that's needed for correctness too as it will > generate this circular lock thingy. Well, the correctness issue is fixed by Andrew's work_on_cpu-via-new-thread patch (hmm, which Linus hasn't taken, re-xmitting). So it's really a pure optimization. Cheers, Rusty. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected 2009-04-08 23:45 ` Rusty Russell @ 2009-04-09 4:17 ` Ingo Molnar 2009-04-09 14:45 ` Ming Lei 0 siblings, 1 reply; 20+ messages in thread From: Ingo Molnar @ 2009-04-09 4:17 UTC (permalink / raw) To: Rusty Russell Cc: Peter Zijlstra, Rafael J. Wysocki, Ming Lei, Andrew Morton, Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy * Rusty Russell <rusty@rustcorp.com.au> wrote: > On Wednesday 08 April 2009 22:18:26 Peter Zijlstra wrote: > > On Wed, 2009-04-08 at 12:47 +0930, Rusty Russell wrote: > > > On Tuesday 07 April 2009 16:35:53 Peter Zijlstra wrote: > > > > On Tue, 2009-04-07 at 13:56 +0930, Rusty Russell wrote: > > > > > Looks like this will be fixed by Andrew's work-on-cpu-in-own-thread > > > > > patch which I just put out the pull request for. > > > > > > > > Would it make sense to teach it about a short-circuit like: > > > > > > > > work_on_cpu() { > > > > > > > > if (cpumask_weight(current->cpus_allowed) == 1 && > > > > smp_processor_id() == cpu) > > > > return do_work_right_here(); > > > > > > Does that happen much? I guess put a counter in and see? > > > > Ego spotted the case where cpufreq calls it from an cpu-affine > > workqueue, it seems to me in that case its desirable to have the > > short-cut, and currently that's needed for correctness too as it will > > generate this circular lock thingy. > > Well, the correctness issue is fixed by Andrew's > work_on_cpu-via-new-thread patch (hmm, which Linus hasn't taken, > re-xmitting). That's now upstream as per: 6b44003: work_on_cpu(): rewrite it to create a kernel thread on demand So re-checking whether the warning still triggers with latest -git would be nice. Ingo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: pm-hibernate : possible circular locking dependency detected
  2009-04-09  4:17 ` Ingo Molnar
@ 2009-04-09 14:45   ` Ming Lei
  0 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2009-04-09 14:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Peter Zijlstra, Rafael J. Wysocki, Andrew Morton,
	Linux-pm mailing list, Linux Kernel List, Gautham R Shenoy

2009/4/9 Ingo Molnar <mingo@elte.hu>:
>
> That's now upstream as per:
>
>   6b44003: work_on_cpu(): rewrite it to create a kernel thread on demand
>
> So re-checking whether the warning still triggers with latest -git
> would be nice.

I no longer see this warning during pm-hibernate on 2.6.30-rc1, which
does __not__ include 6b44003.

>
> Ingo
>

--
Lei Ming

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-04-09 14:45 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-05  9:53 pm-hibernate : possible circular locking dependency detected Ming Lei
2009-04-05 10:12 ` Rafael J. Wysocki
2009-04-05 13:44 ` Ingo Molnar
2009-04-06  0:55 ` Gautham R Shenoy
2009-04-06 13:29 ` Rafael J. Wysocki
2009-04-06 14:24 ` Gautham R Shenoy
2009-04-06 15:25 ` Rafael J. Wysocki
2009-04-06 14:37 ` [linux-pm] " Alan Stern
2009-04-06 15:20 ` Gautham R Shenoy
2009-04-06 18:42 ` Alan Stern
2009-04-06 19:58 ` Rafael J. Wysocki
2009-04-07  8:43 ` Peter Zijlstra
2009-04-07  4:26 ` Rusty Russell
2009-04-07  7:05 ` Peter Zijlstra
2009-04-08  3:17 ` Rusty Russell
2009-04-08 12:26 ` Ingo Molnar
2009-04-08 12:48 ` Peter Zijlstra
2009-04-08 23:45 ` Rusty Russell
2009-04-09  4:17 ` Ingo Molnar
2009-04-09 14:45 ` Ming Lei