public inbox for linux-kernel@vger.kernel.org
* CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
@ 2006-11-29 15:24 Gautham R Shenoy
  2006-11-29 21:05 ` Andrew Morton
  2006-11-30  8:31 ` Ingo Molnar
  0 siblings, 2 replies; 23+ messages in thread
From: Gautham R Shenoy @ 2006-11-29 15:24 UTC (permalink / raw)
  To: akpm, mingo; +Cc: linux-kernel, torvalds, davej, dipankar, vatsa

Hi all,

Looks like lockdep has resumed yelling about the cpufreq-cpuhotplug
interactions! Again, it's the ondemand governor that's the culprit.

On linux-2.6.19-rc6-mm2, this is what I got yesterday evening.

[root@llm05 tests]# echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
[root@llm05 tests]# echo 0 > /sys/devices/system/cpu/cpu1/online

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.19-rc6-mm2 #14
-------------------------------------------------------
bash/4601 is trying to acquire lock:
 (&policy->lock){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

but task is already holding lock:
 (cache_chain_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (cache_chain_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c01642f1>] cpuup_callback+0x1a3/0x2d6
       [<c045a952>] notifier_call_chain+0x2b/0x5b
       [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
       [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
       [<c013f798>] _cpu_down+0x56/0x1ef
       [<c013fa5c>] cpu_down+0x26/0x3a
       [<c029dde2>] store_online+0x27/0x5a
       [<c029adec>] sysdev_store+0x20/0x25
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #2 (workqueue_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c0131e91>] __create_workqueue+0x5b/0x11c
       [<c03c4a9a>] cpufreq_governor_dbs+0xa0/0x2e8
       [<c03c2cde>] __cpufreq_governor+0x64/0xac
       [<c03c30d5>] __cpufreq_set_policy+0x187/0x20b
       [<c03c33a1>] store_scaling_governor+0x132/0x16a
       [<c03c2af9>] store+0x35/0x46
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #1 (dbs_mutex){--..}:
       [<c013bddc>] __lock_acquire+0x8ef/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c03c4a7e>] cpufreq_governor_dbs+0x84/0x2e8
       [<c03c2cde>] __cpufreq_governor+0x64/0xac
       [<c03c30d5>] __cpufreq_set_policy+0x187/0x20b
       [<c03c33a1>] store_scaling_governor+0x132/0x16a
       [<c03c2af9>] store+0x35/0x46
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #0 (&policy->lock){--..}:
       [<c013bce0>] __lock_acquire+0x7f3/0x9f3
       [<c013c1ec>] lock_acquire+0x68/0x82
       [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
       [<c045793d>] mutex_lock+0x12/0x15
       [<c03c2c1f>] cpufreq_driver_target+0x2b/0x51
       [<c03c38db>] cpufreq_cpu_callback+0x42/0x52
       [<c045a952>] notifier_call_chain+0x2b/0x5b
       [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
       [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
       [<c013f798>] _cpu_down+0x56/0x1ef
       [<c013fa5c>] cpu_down+0x26/0x3a
       [<c029dde2>] store_online+0x27/0x5a
       [<c029adec>] sysdev_store+0x20/0x25
       [<c0197284>] sysfs_write_file+0xb6/0xde
       [<c01674d4>] vfs_write+0x90/0x144
       [<c0167c74>] sys_write+0x3d/0x61
       [<c0103d36>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

4 locks held by bash/4601:
 #0:  (cpu_add_remove_lock){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #1:  (sched_hotcpu_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #2:  (workqueue_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15
 #3:  (cache_chain_mutex){--..}, at: [<c045793d>] mutex_lock+0x12/0x15

stack backtrace:
 [<c0104c8f>] show_trace_log_lvl+0x19/0x2e
 [<c0104d8f>] show_trace+0x12/0x14
 [<c0104da5>] dump_stack+0x14/0x16
 [<c013a157>] print_circular_bug_tail+0x7c/0x85
 [<c013bce0>] __lock_acquire+0x7f3/0x9f3
 [<c013c1ec>] lock_acquire+0x68/0x82
 [<c04577da>] __mutex_lock_slowpath+0xd3/0x224
 [<c045793d>] mutex_lock+0x12/0x15
 [<c03c2c1f>] cpufreq_driver_target+0x2b/0x51
 [<c03c38db>] cpufreq_cpu_callback+0x42/0x52
 [<c045a952>] notifier_call_chain+0x2b/0x5b
 [<c012efb6>] __raw_notifier_call_chain+0x18/0x1d
 [<c012efd5>] raw_notifier_call_chain+0x1a/0x1c
 [<c013f798>] _cpu_down+0x56/0x1ef
 [<c013fa5c>] cpu_down+0x26/0x3a
 [<c029dde2>] store_online+0x27/0x5a
 [<c029adec>] sysdev_store+0x20/0x25
 [<c0197284>] sysfs_write_file+0xb6/0xde
 [<c01674d4>] vfs_write+0x90/0x144
 [<c0167c74>] sys_write+0x3d/0x61
 [<c0103d36>] sysenter_past_esp+0x5f/0x99
 =======================
Breaking affinity for irq 24
CPU 1 is now offline

Ok, so to cut a long story short:

- While changing the governor from anything to ondemand, locks are
taken in the following order:

policy->lock ==> dbs_mutex ==> workqueue_mutex

- While offlining a cpu, locks are taken in the following order:

cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==>
cache_chain_mutex ==> policy->lock

The dependency graph built out of this info has a circular dependency
as pointed out by lockdep. However, I am not quite sure how seriously this
circular dependency warning should be taken.

One way to avoid these warnings is to take the policy->lock before the
rest of the locks, while offlining the cpu. 
For a moment I even thought of taking/releasing policy->lock under
CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE events in cpufreq_cpu_callback. 
But that's a bad idea since 'policy' is percpu data in the first place
and hence needs to be cpu-hotplug aware.

These circular-dependency warnings are emitted on 2.6.19-rc6-mm1 as
well as on 2.6.19-rc6 + the hotplug_locking_rework patches.

So do we
- Rethink the strategy of per-subsystem hotcpu-locks ?

  OR
  
- Think of a way to straighten out the super-convoluted cpufreq code ?

At the moment, I would suggest the latter :-)

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

* RE: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
@ 2006-12-06 18:27 Pallipadi, Venkatesh
  2006-12-07  7:06 ` Gautham R Shenoy
  0 siblings, 1 reply; 23+ messages in thread
From: Pallipadi, Venkatesh @ 2006-12-06 18:27 UTC (permalink / raw)
  To: ego, Ingo Molnar; +Cc: akpm, linux-kernel, torvalds, davej, dipankar, vatsa



>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
>Gautham R Shenoy
>Sent: Thursday, November 30, 2006 3:44 AM
>To: Ingo Molnar
>Cc: Gautham R Shenoy; akpm@osdl.org; 
>linux-kernel@vger.kernel.org; torvalds@osdl.org; 
>davej@redhat.com; dipankar@in.ibm.com; vatsa@in.ibm.com
>Subject: Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>
>On Thu, Nov 30, 2006 at 12:03:15PM +0100, Ingo Molnar wrote:
>> 
>> * Gautham R Shenoy <ego@in.ibm.com> wrote:
>> 
>> > a) cpufreq maintains its own cpumask in the variable
>> > policy->affected_cpus and says: if a frequency change is issued to
>> > any one of the cpus in the affected_cpus mask, you change frequency
>> > on all cpus in the mask. So this needs to be consistent with the
>> > cpu_online map and hence cpu-hotplug aware. Furthermore, we don't
>> > want cpus in this mask to go down when we are trying to change
>> > frequencies on them. The function which drives the frequency change
>> > in the cpufreq core is cpufreq_driver_target(), and it needs
>> > cpu-hotplug protection.
>> 
>> couldn't this complexity be radically simplified by having new kernel
>> infrastructure that does something like:
>> 
>>   " 'gather' all CPUs mentioned in <mask> via scheduling a separate
>>     helper-kthread on every CPU that <mask> specifies, disable all
>>     interrupts, and execute function <fn> once all CPUs have been
>>     'gathered' - and release all CPUs once <fn> has executed on each
>>     of them."
>> 
>> ?
>
>This is what is currently being done by cpufreq:
>
>a) get_some_cpu_hotplug_protection() [use either some global mechanism
>   or a per-subsystem mutex]
>
>b) actual_freq_change_driver_function(mask)
>   /* You can check out cpufreq_p4_target() in
>    * arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
>    */
>   {
>	for_each_cpu_mask(i, mask) {
>		cpumask_t this_cpu = cpumask_of_cpu(i);
>		set_cpus_allowed(current, this_cpu);
>		function_to_change_frequency();
>	}
>   }
>

As there are many options being discussed here, let me propose one more
option that can eliminate the need for the hotplug lock in the
cpufreq_driver_target() path.

As Gautham clearly explained before, today we have cpufreq calling
cpufreq_driver_target() in each driver to change the CPU frequency,
and the driver internally uses set_cpus_allowed() to reschedule onto
the different affected_cpus and change the frequency. That is the main
reason why we need to disable hotplug in this path.

But if we make cpufreq more affected_cpus-aware and have a per-cpu
target() call, by moving set_cpus_allowed() from the driver into the
cpufreq core and defining the target function to be atomic/non-sleeping,
then we really don't need a hotplug lock in the driver any more. The
driver can use a get_cpu()/put_cpu() pair to disable preemption and
then change the frequency.

This means a lot of changes, as we need new cpufreq interfaces and a
rewrite of a bunch of drivers. But this looks to me like the least
complicated solution.

Thanks,
Venki 

* RE: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
@ 2006-12-07 12:50 Pallipadi, Venkatesh
  0 siblings, 0 replies; 23+ messages in thread
From: Pallipadi, Venkatesh @ 2006-12-07 12:50 UTC (permalink / raw)
  To: ego; +Cc: Ingo Molnar, akpm, linux-kernel, torvalds, davej, dipankar, vatsa


 

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of 
>Gautham R Shenoy
>Sent: Wednesday, December 06, 2006 11:07 PM
>To: Pallipadi, Venkatesh
>Cc: ego@in.ibm.com; Ingo Molnar; akpm@osdl.org; 
>linux-kernel@vger.kernel.org; torvalds@osdl.org; 
>davej@redhat.com; dipankar@in.ibm.com; vatsa@in.ibm.com
>Subject: Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
>
>Hi Venki,
>On Wed, Dec 06, 2006 at 10:27:01AM -0800, Pallipadi, Venkatesh wrote:
>
>> But, if we make cpufreq more affected_cpus aware and have a per_cpu
>> target()
>> call by moving set_cpus_allowed() from driver into cpufreq core and
>> define
>> the target function to be atomic/non-sleeping type, then we 
>really don't
>> need a hotplug lock for the driver any more. Driver can have
>> get_cpu/put_cpu
>> pair to disable preemption and then change the frequency.
>
>Well, we would still need to keep the affected_cpus map in 
>sync with the
>cpu_online map. That would still require hotplug protection, right?

Not really. Cpufreq can look at all the CPUs in affected_cpus and call
new_target() only for the CPUs that are online. Or it can call it for
every CPU, and the actual implementation in the driver can do something
like:

set_cpus_allowed(current, requested_processor_mask);
if (get_cpu() != requested_cpu) {
	/* CPU offline and nothing can be done */
	put_cpu();
	return 0;
}

This is what I did for the new cpufreq interface I added for getavg().
It was easy to ensure the atomic driver call, as only one driver is
using it today :-)


>Besides, I would love to see a way of implementing target 
>function to be
>atomic/non-sleeping type. But as of now, the target functions call
>cpufreq_notify_transition which might sleep.
>

That is the reason I don't have a patch for this now :-) There are more
than 10 or so drivers that would need to change for the new interface.
I haven't checked whether it is possible to do this without sleeping in
all those drivers.

>That's not the last of my worries. The ondemand-workqueue interaction
>in the cpu_hotplug callback path can cause a deadlock if we go for
>per-subsystem hotcpu mutexes. Can you think of a way by which we can 
>avoid destroying the kondemand workqueue from the cpu-hotplug callback
>path ? Please see http://lkml.org/lkml/2006/11/30/9 for the
>culprit-callpath.
>

Yes. I was thinking about it. One possible solution is to move the
create_workqueue()/destroy_workqueue() calls to module init/exit time,
as in the attached patch.

Thanks,
Venki

Not fully tested at the moment.

Remove the workqueue create/destroy calls from governor start and stop,
and move them to module init and exit time.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

Index: linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.19-rc-mm.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
@@ -514,7 +514,6 @@ static inline void dbs_timer_exit(struct
 {
 	dbs_info->enable = 0;
 	cancel_delayed_work(&dbs_info->work);
-	flush_workqueue(kondemand_wq);
 }
 
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
@@ -543,16 +542,6 @@ static int cpufreq_governor_dbs(struct c
 
 		mutex_lock(&dbs_mutex);
 		dbs_enable++;
-		if (dbs_enable == 1) {
-			kondemand_wq = create_workqueue("kondemand");
-			if (!kondemand_wq) {
-				printk(KERN_ERR
-					 "Creation of kondemand failed\n");
-				dbs_enable--;
-				mutex_unlock(&dbs_mutex);
-				return -ENOSPC;
-			}
-		}
 
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
 		if (rc) {
@@ -601,8 +590,6 @@ static int cpufreq_governor_dbs(struct c
 		dbs_timer_exit(this_dbs_info);
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
 		dbs_enable--;
-		if (dbs_enable == 0)
-			destroy_workqueue(kondemand_wq);
 
 		mutex_unlock(&dbs_mutex);
 
@@ -632,12 +619,18 @@ static struct cpufreq_governor cpufreq_g
 
 static int __init cpufreq_gov_dbs_init(void)
 {
+	kondemand_wq = create_workqueue("kondemand");
+	if (!kondemand_wq) {
+		printk(KERN_ERR "Creation of kondemand failed\n");
+		return -EFAULT;
+	}
 	return cpufreq_register_governor(&cpufreq_gov_dbs);
 }
 
 static void __exit cpufreq_gov_dbs_exit(void)
 {
 	cpufreq_unregister_governor(&cpufreq_gov_dbs);
+	destroy_workqueue(kondemand_wq);
 }


Thread overview: 23+ messages
2006-11-29 15:24 CPUFREQ-CPUHOTPLUG: Possible circular locking dependency Gautham R Shenoy
2006-11-29 21:05 ` Andrew Morton
2006-11-30  4:28   ` Gautham R Shenoy
2006-11-30  6:35     ` Gautham R Shenoy
2006-11-30  8:29       ` Ingo Molnar
2006-11-30  8:52         ` Gautham R Shenoy
2006-12-01  1:43         ` Gautham R Shenoy
2006-12-01  8:55           ` Ingo Molnar
2006-11-30  8:31 ` Ingo Molnar
2006-11-30 10:24   ` Gautham R Shenoy
2006-11-30 11:03     ` Ingo Molnar
2006-11-30 11:19       ` Andrew Morton
2006-11-30 11:46         ` Ingo Molnar
2006-11-30 12:44           ` Gautham R Shenoy
2006-11-30 14:35             ` Ingo Molnar
2006-11-30 19:40           ` Andrew Morton
2006-11-30 20:24             ` Ingo Molnar
2006-11-30 11:43       ` Gautham R Shenoy
2006-11-30 11:53         ` Ingo Molnar
2006-11-30 12:19           ` Gautham R Shenoy
  -- strict thread matches above, loose matches on Subject: below --
2006-12-06 18:27 Pallipadi, Venkatesh
2006-12-07  7:06 ` Gautham R Shenoy
2006-12-07 12:50 Pallipadi, Venkatesh
