kernel-testers.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 2.6.30 0/4] Fix cpufreq locking dependency (resend)
@ 2009-07-03 15:25 Mathieu Desnoyers
  2009-07-03 15:25 ` [patch 2.6.30 1/4] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) Mathieu Desnoyers
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 15:25 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Venkatesh Pallipadi,
	Dave Jones, Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA,
	kernel-te

(sorry, quilt mail problems in the previous post, here is the resend)

Hi,

Here is a patchset applying on 2.6.30 which should fix the cpufreq locking
dependency. Sadly, my two main test machines does not seem to support cpufreq,
so I have only been able to perform very light testing. (I can't afford to lock
down my laptop right now).

I went for the most straightforward fix I could think of and kept the current
locking structure. As you will see, the second patch cleans up the error
handling paths of cpufreq add dev, which were a total mess.

As a general recommendation, creating scripts which does, concurrently :

cpu hotplug up/down
cpufreq sysfs actions
change between cpufreq governors
add governors on random CPU numbers

will likely help stress-testing cpufreq locking. I really did not like what I've
seen there. It looked _very_ fragile.

Hopefully this will be better with this patches, but again, testing is welcome.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2.6.30 1/4] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)
  2009-07-03 15:25 [patch 2.6.30 0/4] Fix cpufreq locking dependency (resend) Mathieu Desnoyers
@ 2009-07-03 15:25 ` Mathieu Desnoyers
  2009-07-03 15:25 ` [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess Mathieu Desnoyers
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 15:25 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Venkatesh Pallipadi,
	Dave Jones, Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA,
	kernel-te
  Cc: Mathieu Desnoyers, Shaohua Li, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ

[-- Attachment #1: cpufreq-remove-rwsem-lock-from-CPUFREQ_GOV_STOP-call-missing-modification.patch --]
[-- Type: text/plain, Size: 2867 bytes --]

commit	42a06f2166f2f6f7bf04f32b4e823eacdceafdc9

Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the
teardown. To make a long story short, the rwlock write-lock causes a circular
dependency with cancel_delayed_work_sync(), because the timer handler takes the
read lock.

Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs
callers (writers) hold the write rwsem at the earliest sysfs calling stage.

However, the rwlock write-lock is not needed upon governor stop.

Change :
- Added comment from Venkyi at lock definition site.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: rjw-KKrjLPT3xs0@public.gmane.org
CC: mingo-X9Un+BFzKDI@public.gmane.org
CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
CC: trenn-l3A5Bk7waGM@public.gmane.org
CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
 drivers/cpufreq/cpufreq.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	2009-06-08 12:47:22.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-06-08 12:48:38.000000000 -0400
@@ -61,6 +61,8 @@ static DEFINE_SPINLOCK(cpufreq_driver_lo
  *   are concerned with are online after they get the lock.
  * - Governor routines that can be called in cpufreq hotplug path should not
  *   take this sem as top level hotplug notifier handler takes this.
+ * - Lock should not be held across
+ *     __cpufreq_governor(data, CPUFREQ_GOV_STOP);
  */
 static DEFINE_PER_CPU(int, policy_cpu);
 static DEFINE_PER_CPU(struct rw_semaphore, cpu_policy_rwsem);
@@ -1697,8 +1699,17 @@ static int __cpufreq_set_policy(struct c
 			dprintk("governor switch\n");
 
 			/* end old governor */
-			if (data->governor)
+			if (data->governor) {
+				/*
+				 * Need to release the rwsem around governor
+				 * stop due to lock dependency between
+				 * cancel_delayed_work_sync and the read lock
+				 * taken in the delayed work handler.
+				 */
+				unlock_policy_rwsem_write(data->cpu);
 				__cpufreq_governor(data, CPUFREQ_GOV_STOP);
+				lock_policy_rwsem_write(data->cpu);
+			}
 
 			/* start new governor */
 			data->governor = policy->governor;

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
  2009-07-03 15:25 [patch 2.6.30 0/4] Fix cpufreq locking dependency (resend) Mathieu Desnoyers
  2009-07-03 15:25 ` [patch 2.6.30 1/4] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) Mathieu Desnoyers
@ 2009-07-03 15:25 ` Mathieu Desnoyers
       [not found]   ` <20090703152714.775719309-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
  2009-07-03 15:25 ` [patch 2.6.30 3/4] cpufreq add gov mutex Mathieu Desnoyers
  2009-07-03 15:25 ` [patch 2.6.30 4/4] cpufreq ondemand and conservative remove dbs_mutex Mathieu Desnoyers
  3 siblings, 1 reply; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 15:25 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Venkatesh Pallipadi,
	Dave Jones, Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA,
	kernel-te
  Cc: Mathieu Desnoyers, Shaohua Li, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ

[-- Attachment #1: cpufreq-cleanup-cpufreq_add_dev.patch --]
[-- Type: text/plain, Size: 6458 bytes --]

OK, I've tried to clean it up the best I could, but please test this with
concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make
other interesting findings.

This is step one of fixing the overall locking dependency mess in cpufreq.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: rjw-KKrjLPT3xs0@public.gmane.org
CC: mingo-X9Un+BFzKDI@public.gmane.org
CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
---
 drivers/cpufreq/cpufreq.c |   65 ++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 25 deletions(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	2009-07-02 23:59:08.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-02 23:59:09.000000000 -0400
@@ -763,6 +763,10 @@ static struct kobj_type ktype_cpufreq = 
  * cpufreq_add_dev - add a CPU device
  *
  * Adds the cpufreq interface for a CPU device.
+ *
+ * The Oracle says: try running cpufreq registration/unregistration concurrently
+ * with with cpu hotplugging and all hell will break loose. Tried to clean this
+ * mess up, but more thorough testing is needed. - Mathieu
  */
 static int cpufreq_add_dev(struct sys_device *sys_dev)
 {
@@ -806,15 +810,12 @@ static int cpufreq_add_dev(struct sys_de
 		goto nomem_out;
 	}
 	if (!alloc_cpumask_var(&policy->cpus, GFP_KERNEL)) {
-		kfree(policy);
 		ret = -ENOMEM;
-		goto nomem_out;
+		goto err_free_policy;
 	}
 	if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL)) {
-		free_cpumask_var(policy->cpus);
-		kfree(policy);
 		ret = -ENOMEM;
-		goto nomem_out;
+		goto err_free_cpumask;
 	}
 
 	policy->cpu = cpu;
@@ -822,7 +823,8 @@ static int cpufreq_add_dev(struct sys_de
 
 	/* Initially set CPU itself as the policy_cpu */
 	per_cpu(policy_cpu, cpu) = cpu;
-	lock_policy_rwsem_write(cpu);
+	ret = (lock_policy_rwsem_write(cpu) < 0);
+	WARN_ON(ret);
 
 	init_completion(&policy->kobj_unregister);
 	INIT_WORK(&policy->update, handle_update);
@@ -835,7 +837,7 @@ static int cpufreq_add_dev(struct sys_de
 	ret = cpufreq_driver->init(policy);
 	if (ret) {
 		dprintk("initialization failed\n");
-		goto err_out;
+		goto err_unlock_policy;
 	}
 	policy->user_policy.min = policy->min;
 	policy->user_policy.max = policy->max;
@@ -860,15 +862,21 @@ static int cpufreq_add_dev(struct sys_de
 		/* Check for existing affected CPUs.
 		 * They may not be aware of it due to CPU Hotplug.
 		 */
-		managed_policy = cpufreq_cpu_get(j);		/* FIXME: Where is this released?  What about error paths? */
+		managed_policy = cpufreq_cpu_get(j);
 		if (unlikely(managed_policy)) {
 
 			/* Set proper policy_cpu */
 			unlock_policy_rwsem_write(cpu);
 			per_cpu(policy_cpu, cpu) = managed_policy->cpu;
 
-			if (lock_policy_rwsem_write(cpu) < 0)
-				goto err_out_driver_exit;
+			if (lock_policy_rwsem_write(cpu) < 0) {
+				/* Should not go through policy unlock path */
+				if (cpufreq_driver->exit)
+					cpufreq_driver->exit(policy);
+				ret = -EBUSY;
+				cpufreq_cpu_put(managed_policy);
+				goto err_free_cpumask;
+			}
 
 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
 			cpumask_copy(managed_policy->cpus, policy->cpus);
@@ -879,12 +887,14 @@ static int cpufreq_add_dev(struct sys_de
 			ret = sysfs_create_link(&sys_dev->kobj,
 						&managed_policy->kobj,
 						"cpufreq");
-			if (ret)
-				goto err_out_driver_exit;
-
-			cpufreq_debug_enable_ratelimit();
-			ret = 0;
-			goto err_out_driver_exit; /* call driver->exit() */
+			if (!ret)
+				cpufreq_cpu_put(managed_policy);
+			/*
+			 * Success. We only needed to be added to the mask.
+			 * Call driver->exit() because only the cpu parent of
+			 * the kobj needed to call init().
+			 */
+			goto out_driver_exit; /* call driver->exit() */
 		}
 	}
 #endif
@@ -894,25 +904,25 @@ static int cpufreq_add_dev(struct sys_de
 	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &sys_dev->kobj,
 				   "cpufreq");
 	if (ret)
-		goto err_out_driver_exit;
+		goto out_driver_exit;
 
 	/* set up files for this cpu device */
 	drv_attr = cpufreq_driver->attr;
 	while ((drv_attr) && (*drv_attr)) {
 		ret = sysfs_create_file(&policy->kobj, &((*drv_attr)->attr));
 		if (ret)
-			goto err_out_driver_exit;
+			goto err_out_kobj_put;
 		drv_attr++;
 	}
 	if (cpufreq_driver->get) {
 		ret = sysfs_create_file(&policy->kobj, &cpuinfo_cur_freq.attr);
 		if (ret)
-			goto err_out_driver_exit;
+			goto err_out_kobj_put;
 	}
 	if (cpufreq_driver->target) {
 		ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
 		if (ret)
-			goto err_out_driver_exit;
+			goto err_out_kobj_put;
 	}
 
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
@@ -930,12 +940,14 @@ static int cpufreq_add_dev(struct sys_de
 			continue;
 
 		dprintk("CPU %u already managed, adding link\n", j);
-		cpufreq_cpu_get(cpu);
+		managed_policy = cpufreq_cpu_get(cpu);
 		cpu_sys_dev = get_cpu_sysdev(j);
 		ret = sysfs_create_link(&cpu_sys_dev->kobj, &policy->kobj,
 					"cpufreq");
-		if (ret)
+		if (ret) {
+			cpufreq_cpu_put(managed_policy);
 			goto err_out_unregister;
+		}
 	}
 
 	policy->governor = NULL; /* to assure that the starting sequence is
@@ -967,17 +979,20 @@ err_out_unregister:
 		per_cpu(cpufreq_cpu_data, j) = NULL;
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+err_out_kobj_put:
 	kobject_put(&policy->kobj);
 	wait_for_completion(&policy->kobj_unregister);
 
-err_out_driver_exit:
+out_driver_exit:
 	if (cpufreq_driver->exit)
 		cpufreq_driver->exit(policy);
 
-err_out:
+err_unlock_policy:
 	unlock_policy_rwsem_write(cpu);
+err_free_cpumask:
+	free_cpumask_var(policy->cpus);
+err_free_policy:
 	kfree(policy);
-
 nomem_out:
 	module_put(cpufreq_driver->owner);
 module_out:

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2.6.30 3/4] cpufreq add gov mutex
  2009-07-03 15:25 [patch 2.6.30 0/4] Fix cpufreq locking dependency (resend) Mathieu Desnoyers
  2009-07-03 15:25 ` [patch 2.6.30 1/4] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) Mathieu Desnoyers
  2009-07-03 15:25 ` [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess Mathieu Desnoyers
@ 2009-07-03 15:25 ` Mathieu Desnoyers
       [not found]   ` <20090703152714.953585501-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
  2009-07-03 15:25 ` [patch 2.6.30 4/4] cpufreq ondemand and conservative remove dbs_mutex Mathieu Desnoyers
  3 siblings, 1 reply; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 15:25 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Venkatesh Pallipadi,
	Dave Jones, Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA,
	kernel-te
  Cc: Mathieu Desnoyers, Shaohua Li, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ

[-- Attachment #1: cpufreq-add-gov-mutex.patch --]
[-- Type: text/plain, Size: 6075 bytes --]

Using the cpufreq_gov_mutex to protect internal governor data structures. The
policy rwsem write lock nests inside this mutex.  The policy rwsem is taken in
the timer handler, and therefore cannot be held while doing a sync teardown of
the timer.  This cpufreq_gov_mutex lock protects init/teardown of governor
timers. This is why this lock, although it protects a data structure located
within the governors, is located in cpufreq.c.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: rjw-KKrjLPT3xs0@public.gmane.org
CC: mingo-X9Un+BFzKDI@public.gmane.org
CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
---
 drivers/cpufreq/cpufreq.c |   38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	2009-07-03 09:48:35.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-03 10:12:44.000000000 -0400
@@ -133,6 +133,17 @@ pure_initcall(init_cpufreq_transition_no
 static LIST_HEAD(cpufreq_governor_list);
 static DEFINE_MUTEX(cpufreq_governor_mutex);
 
+/*
+ * Using the cpufreq_gov_mutex to protect internal governor
+ * data structures. The policy rwsem write lock nests inside this mutex.
+ * The policy rwsem is taken in the timer handler, and therefore cannot be
+ * held while doing a sync teardown of the timer.
+ * This cpufreq_gov_mutex lock protects init/teardown of governor timers.
+ * This is why this lock, although it protects a data structure located within
+ * the governors, is here.
+ */
+static DEFINE_MUTEX(cpufreq_gov_mutex);
+
 struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
 {
 	struct cpufreq_policy *data;
@@ -725,6 +736,7 @@ static ssize_t store(struct kobject *kob
 	if (!policy)
 		goto no_policy;
 
+	mutex_lock(&cpufreq_gov_mutex);
 	if (lock_policy_rwsem_write(policy->cpu) < 0)
 		goto fail;
 
@@ -735,6 +747,7 @@ static ssize_t store(struct kobject *kob
 
 	unlock_policy_rwsem_write(policy->cpu);
 fail:
+	mutex_unlock(&cpufreq_gov_mutex);
 	cpufreq_cpu_put(policy);
 no_policy:
 	return ret;
@@ -823,6 +836,7 @@ static int cpufreq_add_dev(struct sys_de
 
 	/* Initially set CPU itself as the policy_cpu */
 	per_cpu(policy_cpu, cpu) = cpu;
+	mutex_lock(&cpufreq_gov_mutex);
 	ret = (lock_policy_rwsem_write(cpu) < 0);
 	WARN_ON(ret);
 
@@ -875,7 +889,7 @@ static int cpufreq_add_dev(struct sys_de
 					cpufreq_driver->exit(policy);
 				ret = -EBUSY;
 				cpufreq_cpu_put(managed_policy);
-				goto err_free_cpumask;
+				goto err_unlock_gov_mutex;
 			}
 
 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
@@ -964,6 +978,7 @@ static int cpufreq_add_dev(struct sys_de
 	}
 
 	unlock_policy_rwsem_write(cpu);
+	mutex_unlock(&cpufreq_gov_mutex);
 
 	kobject_uevent(&policy->kobj, KOBJ_ADD);
 	module_put(cpufreq_driver->owner);
@@ -989,6 +1004,8 @@ out_driver_exit:
 
 err_unlock_policy:
 	unlock_policy_rwsem_write(cpu);
+err_unlock_gov_mutex:
+	mutex_unlock(&cpufreq_gov_mutex);
 err_free_cpumask:
 	free_cpumask_var(policy->cpus);
 err_free_policy:
@@ -1028,6 +1045,7 @@ static int __cpufreq_remove_dev(struct s
 		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 		cpufreq_debug_enable_ratelimit();
 		unlock_policy_rwsem_write(cpu);
+		mutex_unlock(&cpufreq_gov_mutex);
 		return -EINVAL;
 	}
 	per_cpu(cpufreq_cpu_data, cpu) = NULL;
@@ -1045,6 +1063,7 @@ static int __cpufreq_remove_dev(struct s
 		cpufreq_cpu_put(data);
 		cpufreq_debug_enable_ratelimit();
 		unlock_policy_rwsem_write(cpu);
+		mutex_unlock(&cpufreq_gov_mutex);
 		return 0;
 	}
 #endif
@@ -1088,9 +1107,13 @@ static int __cpufreq_remove_dev(struct s
 #endif
 
 	unlock_policy_rwsem_write(cpu);
-
+	/*
+	 * Calling only with the cpufreq_gov_mutex held because
+	 * sync timer teardown has locking dependency with policy rwsem.
+	 */
 	if (cpufreq_driver->target)
 		__cpufreq_governor(data, CPUFREQ_GOV_STOP);
+	mutex_unlock(&cpufreq_gov_mutex);
 
 	kobject_put(&data->kobj);
 
@@ -1123,6 +1146,7 @@ static int cpufreq_remove_dev(struct sys
 	if (cpu_is_offline(cpu))
 		return 0;
 
+	mutex_lock(&cpufreq_gov_mutex);
 	if (unlikely(lock_policy_rwsem_write(cpu)))
 		BUG();
 
@@ -1506,12 +1530,16 @@ int cpufreq_driver_target(struct cpufreq
 	if (!policy)
 		goto no_policy;
 
-	if (unlikely(lock_policy_rwsem_write(policy->cpu)))
+	mutex_lock(&cpufreq_gov_mutex);
+	if (unlikely(lock_policy_rwsem_write(policy->cpu))) {
+		mutex_unlock(&cpufreq_gov_mutex);
 		goto fail;
+	}
 
 	ret = __cpufreq_driver_target(policy, target_freq, relation);
 
 	unlock_policy_rwsem_write(policy->cpu);
+	mutex_unlock(&cpufreq_gov_mutex);
 
 fail:
 	cpufreq_cpu_put(policy);
@@ -1769,7 +1797,9 @@ int cpufreq_update_policy(unsigned int c
 		goto no_policy;
 	}
 
+	mutex_lock(&cpufreq_gov_mutex);
 	if (unlikely(lock_policy_rwsem_write(cpu))) {
+		mutex_unlock(&cpufreq_gov_mutex);
 		ret = -EINVAL;
 		goto fail;
 	}
@@ -1798,6 +1828,7 @@ int cpufreq_update_policy(unsigned int c
 	ret = __cpufreq_set_policy(data, &policy);
 
 	unlock_policy_rwsem_write(cpu);
+	mutex_unlock(&cpufreq_gov_mutex);
 
 fail:
 	cpufreq_cpu_put(data);
@@ -1821,6 +1852,7 @@ static int __cpuinit cpufreq_cpu_callbac
 			break;
 		case CPU_DOWN_PREPARE:
 		case CPU_DOWN_PREPARE_FROZEN:
+			mutex_lock(&cpufreq_gov_mutex);
 			if (unlikely(lock_policy_rwsem_write(cpu)))
 				BUG();
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2.6.30 4/4] cpufreq ondemand and conservative remove dbs_mutex
  2009-07-03 15:25 [patch 2.6.30 0/4] Fix cpufreq locking dependency (resend) Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2009-07-03 15:25 ` [patch 2.6.30 3/4] cpufreq add gov mutex Mathieu Desnoyers
@ 2009-07-03 15:25 ` Mathieu Desnoyers
  3 siblings, 0 replies; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 15:25 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Venkatesh Pallipadi,
	Dave Jones, Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA,
	kernel-te
  Cc: Mathieu Desnoyers, Shaohua Li, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ

[-- Attachment #1: cpufreq-ondemand-and-conservative-remove-dbs-mutex.patch --]
[-- Type: text/plain, Size: 10840 bytes --]

The dbs_mutex is not needed given serialization is provided by the
cpufreq_gov_mutex wrapping all update calls in cpufreq.c.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: rjw-KKrjLPT3xs0@public.gmane.org
CC: mingo-X9Un+BFzKDI@public.gmane.org
CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
---
 drivers/cpufreq/cpufreq_conservative.c |   58 ++++++---------------------------
 drivers/cpufreq/cpufreq_ondemand.c     |   50 +++++++---------------------
 2 files changed, 25 insertions(+), 83 deletions(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq_ondemand.c	2009-07-03 10:29:56.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq_ondemand.c	2009-07-03 10:30:24.000000000 -0400
@@ -87,23 +87,12 @@ struct cpu_dbs_info_s {
 	unsigned int enable:1,
 		sample_type:1;
 };
+
+/* Protected by cpufreq_gov_mutex in cpufreq.c */
 static DEFINE_PER_CPU(struct cpu_dbs_info_s, cpu_dbs_info);
 
 static unsigned int dbs_enable;	/* number of CPUs using this policy */
 
-/*
- * DEADLOCK ALERT! There is a ordering requirement between cpu_hotplug
- * lock and dbs_mutex. cpu_hotplug lock should always be held before
- * dbs_mutex. If any function that can potentially take cpu_hotplug lock
- * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
- * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
- * is recursive for the same process. -Venki
- * DEADLOCK ALERT! (2) : do_dbs_timer() must not take the dbs_mutex, because it
- * would deadlock with cancel_delayed_work_sync(), which is needed for proper
- * raceless workqueue teardown.
- */
-static DEFINE_MUTEX(dbs_mutex);
-
 static struct workqueue_struct	*kondemand_wq;
 
 static struct dbs_tuners {
@@ -269,13 +258,10 @@ static ssize_t store_sampling_rate(struc
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
-	mutex_lock(&dbs_mutex);
-	if (ret != 1) {
-		mutex_unlock(&dbs_mutex);
+	if (ret != 1)
 		return -EINVAL;
-	}
+
 	dbs_tuners_ins.sampling_rate = max(input, minimum_sampling_rate());
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -287,15 +273,11 @@ static ssize_t store_up_threshold(struct
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
-	mutex_lock(&dbs_mutex);
 	if (ret != 1 || input > MAX_FREQUENCY_UP_THRESHOLD ||
-			input < MIN_FREQUENCY_UP_THRESHOLD) {
-		mutex_unlock(&dbs_mutex);
+			input < MIN_FREQUENCY_UP_THRESHOLD)
 		return -EINVAL;
-	}
 
 	dbs_tuners_ins.up_threshold = input;
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -315,11 +297,8 @@ static ssize_t store_ignore_nice_load(st
 	if (input > 1)
 		input = 1;
 
-	mutex_lock(&dbs_mutex);
-	if (input == dbs_tuners_ins.ignore_nice) { /* nothing to do */
-		mutex_unlock(&dbs_mutex);
+	if (input == dbs_tuners_ins.ignore_nice) /* nothing to do */
 		return count;
-	}
 	dbs_tuners_ins.ignore_nice = input;
 
 	/* we need to re-evaluate prev_cpu_idle */
@@ -332,7 +311,6 @@ static ssize_t store_ignore_nice_load(st
 			dbs_info->prev_cpu_nice = kstat_cpu(j).cpustat.nice;
 
 	}
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -350,10 +328,8 @@ static ssize_t store_powersave_bias(stru
 	if (input > 1000)
 		input = 1000;
 
-	mutex_lock(&dbs_mutex);
 	dbs_tuners_ins.powersave_bias = input;
 	ondemand_powersave_bias_init();
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -568,6 +544,12 @@ static inline void dbs_timer_exit(struct
 	cancel_delayed_work_sync(&dbs_info->work);
 }
 
+/*
+ * Always called with cpufreq_gov_mutex held.
+ * Called with lock_policy_rwsem write lock held except for GOV_STOP.
+ * Called with only cpufreq_gov_mutex held for GOV_STOP for timer sync
+ * teardown.
+ */
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 				   unsigned int event)
 {
@@ -586,13 +568,11 @@ static int cpufreq_governor_dbs(struct c
 		if (this_dbs_info->enable) /* Already enabled */
 			break;
 
-		mutex_lock(&dbs_mutex);
 		dbs_enable++;
 
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
 		if (rc) {
 			dbs_enable--;
-			mutex_unlock(&dbs_mutex);
 			return rc;
 		}
 
@@ -628,27 +608,23 @@ static int cpufreq_governor_dbs(struct c
 		}
 		dbs_timer_init(this_dbs_info);
 
-		mutex_unlock(&dbs_mutex);
 		break;
 
 	case CPUFREQ_GOV_STOP:
-		mutex_lock(&dbs_mutex);
 		dbs_timer_exit(this_dbs_info);
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
 		dbs_enable--;
-		mutex_unlock(&dbs_mutex);
 
 		break;
 
 	case CPUFREQ_GOV_LIMITS:
-		mutex_lock(&dbs_mutex);
 		if (policy->max < this_dbs_info->cur_policy->cur)
 			__cpufreq_driver_target(this_dbs_info->cur_policy,
 				policy->max, CPUFREQ_RELATION_H);
 		else if (policy->min > this_dbs_info->cur_policy->cur)
 			__cpufreq_driver_target(this_dbs_info->cur_policy,
 				policy->min, CPUFREQ_RELATION_L);
-		mutex_unlock(&dbs_mutex);
+
 		break;
 	}
 	return 0;
Index: linux-2.6-lttng/drivers/cpufreq/cpufreq_conservative.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq_conservative.c	2009-07-03 10:29:56.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq_conservative.c	2009-07-03 10:30:24.000000000 -0400
@@ -80,23 +80,12 @@ struct cpu_dbs_info_s {
 	int cpu;
 	unsigned int enable:1;
 };
+
+/* Protected by cpufreq_gov_mutex in cpufreq.c */
 static DEFINE_PER_CPU(struct cpu_dbs_info_s, cpu_dbs_info);
 
 static unsigned int dbs_enable;	/* number of CPUs using this policy */
 
-/*
- * DEADLOCK ALERT! There is a ordering requirement between cpu_hotplug
- * lock and dbs_mutex. cpu_hotplug lock should always be held before
- * dbs_mutex. If any function that can potentially take cpu_hotplug lock
- * (like __cpufreq_driver_target()) is being called with dbs_mutex taken, then
- * cpu_hotplug lock should be taken before that. Note that cpu_hotplug lock
- * is recursive for the same process. -Venki
- * DEADLOCK ALERT! (2) : do_dbs_timer() must not take the dbs_mutex, because it
- * would deadlock with cancel_delayed_work_sync(), which is needed for proper
- * raceless workqueue teardown.
- */
-static DEFINE_MUTEX(dbs_mutex);
-
 static struct workqueue_struct	*kconservative_wq;
 
 static struct dbs_tuners {
@@ -236,9 +225,7 @@ static ssize_t store_sampling_down_facto
 	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1)
 		return -EINVAL;
 
-	mutex_lock(&dbs_mutex);
 	dbs_tuners_ins.sampling_down_factor = input;
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -253,9 +240,7 @@ static ssize_t store_sampling_rate(struc
 	if (ret != 1)
 		return -EINVAL;
 
-	mutex_lock(&dbs_mutex);
 	dbs_tuners_ins.sampling_rate = max(input, minimum_sampling_rate());
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -267,15 +252,11 @@ static ssize_t store_up_threshold(struct
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
-	mutex_lock(&dbs_mutex);
 	if (ret != 1 || input > 100 ||
-			input <= dbs_tuners_ins.down_threshold) {
-		mutex_unlock(&dbs_mutex);
+			input <= dbs_tuners_ins.down_threshold)
 		return -EINVAL;
-	}
 
 	dbs_tuners_ins.up_threshold = input;
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -287,16 +268,12 @@ static ssize_t store_down_threshold(stru
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
-	mutex_lock(&dbs_mutex);
 	/* cannot be lower than 11 otherwise freq will not fall */
 	if (ret != 1 || input < 11 || input > 100 ||
-			input >= dbs_tuners_ins.up_threshold) {
-		mutex_unlock(&dbs_mutex);
+			input >= dbs_tuners_ins.up_threshold)
 		return -EINVAL;
-	}
 
 	dbs_tuners_ins.down_threshold = input;
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -316,11 +293,8 @@ static ssize_t store_ignore_nice_load(st
 	if (input > 1)
 		input = 1;
 
-	mutex_lock(&dbs_mutex);
-	if (input == dbs_tuners_ins.ignore_nice) { /* nothing to do */
-		mutex_unlock(&dbs_mutex);
+	if (input == dbs_tuners_ins.ignore_nice) /* nothing to do */
 		return count;
-	}
 	dbs_tuners_ins.ignore_nice = input;
 
 	/* we need to re-evaluate prev_cpu_idle */
@@ -332,7 +306,6 @@ static ssize_t store_ignore_nice_load(st
 		if (dbs_tuners_ins.ignore_nice)
 			dbs_info->prev_cpu_nice = kstat_cpu(j).cpustat.nice;
 	}
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -352,9 +325,7 @@ static ssize_t store_freq_step(struct cp
 
 	/* no need to test here if freq_step is zero as the user might actually
 	 * want this, they would be crazy though :) */
-	mutex_lock(&dbs_mutex);
 	dbs_tuners_ins.freq_step = input;
-	mutex_unlock(&dbs_mutex);
 
 	return count;
 }
@@ -548,6 +519,12 @@ static inline void dbs_timer_exit(struct
 	cancel_delayed_work_sync(&dbs_info->work);
 }
 
+/*
+ * Always called with cpufreq_gov_mutex held.
+ * Called with lock_policy_rwsem write lock held except for GOV_STOP.
+ * Called with only cpufreq_gov_mutex held for GOV_STOP for timer sync
+ * teardown.
+ */
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 				   unsigned int event)
 {
@@ -566,13 +543,9 @@ static int cpufreq_governor_dbs(struct c
 		if (this_dbs_info->enable) /* Already enabled */
 			break;
 
-		mutex_lock(&dbs_mutex);
-
 		rc = sysfs_create_group(&policy->kobj, &dbs_attr_group);
-		if (rc) {
-			mutex_unlock(&dbs_mutex);
+		if (rc)
 			return rc;
-		}
 
 		for_each_cpu(j, policy->cpus) {
 			struct cpu_dbs_info_s *j_dbs_info;
@@ -613,12 +586,9 @@ static int cpufreq_governor_dbs(struct c
 		}
 		dbs_timer_init(this_dbs_info);
 
-		mutex_unlock(&dbs_mutex);
-
 		break;
 
 	case CPUFREQ_GOV_STOP:
-		mutex_lock(&dbs_mutex);
 		dbs_timer_exit(this_dbs_info);
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
 		dbs_enable--;
@@ -632,12 +602,9 @@ static int cpufreq_governor_dbs(struct c
 					&dbs_cpufreq_notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
 
-		mutex_unlock(&dbs_mutex);
-
 		break;
 
 	case CPUFREQ_GOV_LIMITS:
-		mutex_lock(&dbs_mutex);
 		if (policy->max < this_dbs_info->cur_policy->cur)
 			__cpufreq_driver_target(
 					this_dbs_info->cur_policy,
@@ -646,7 +613,6 @@ static int cpufreq_governor_dbs(struct c
 			__cpufreq_driver_target(
 					this_dbs_info->cur_policy,
 					policy->min, CPUFREQ_RELATION_L);
-		mutex_unlock(&dbs_mutex);
 
 		break;
 	}

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [patch 2.6.30 3/4] cpufreq add gov mutex
       [not found]   ` <20090703152714.953585501-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
@ 2009-07-03 19:11     ` Pallipadi, Venkatesh
       [not found]       ` <7E82351C108FA840AB1866AC776AEC4669BFF0DE-osO9UTpF0URqS6EAlXoojrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Pallipadi, Venkatesh @ 2009-07-03 19:11 UTC (permalink / raw)
  To: Mathieu Desnoyers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpu
  Cc: Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

 

>-----Original Message-----
>From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
>Sent: Friday, July 03, 2009 8:25 AM
>To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
>Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
>kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
>Young; Pekka Enberg
>Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
>sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
>Subject: [patch 2.6.30 3/4] cpufreq add gov mutex
>
>Using the cpufreq_gov_mutex to protect internal governor data 
>structures. The
>policy rwsem write lock nests inside this mutex. 

This makes the whole locking in cpufreq upside down. Takes away
most of the reason for existence of rwsem lock. Taking a smaller
Percpu rwsem lock inside a bigger global mutex doesn't seem right.

>The policy 
>rwsem is taken in
>the timer handler, and therefore cannot be held while doing a 
>sync teardown of
>the timer.  This cpufreq_gov_mutex lock protects init/teardown 
>of governor
>timers.

Why is the protection required for init/teardown of percpu
timer with a global mutex? cpufreq rwsem (when held correctly)
ensures that there can be only one store_scaling_governor for
a cpu at any point in time. I don't see any reason why we need
a global mutex to protect timer init teardown.


> This is why this lock, although it protects a data 
>structure located
>within the governors, is located in cpufreq.c.

I think this is taking a step back in terms of locking. A system
wide Cpufreq_gov_mutex is being held more frequently now and seems
to be providing all the serialization needed. The locks crossing
layers of cpufreq core and governor is just going to make
situation worse IMO.

If we really want to pursue this option, we should get rid
of rwsem altogether. It is not adding much value
and just providing lot more confusion with things like rwsem
should be held everywhere else other than CPUFREQ_GOV_STOP.


Thanks,
Venki

>
>Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
>CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>CC: rjw-KKrjLPT3xs0@public.gmane.org
>CC: mingo-X9Un+BFzKDI@public.gmane.org
>CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
>CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
>CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
>CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
>CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
>---
> drivers/cpufreq/cpufreq.c |   38 
>+++++++++++++++++++++++++++++++++++---
> 1 file changed, 35 insertions(+), 3 deletions(-)
>
>Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
>===================================================================
>--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
>2009-07-03 09:48:35.000000000 -0400
>+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-03 
>10:12:44.000000000 -0400
>@@ -133,6 +133,17 @@ pure_initcall(init_cpufreq_transition_no
> static LIST_HEAD(cpufreq_governor_list);
> static DEFINE_MUTEX(cpufreq_governor_mutex);
> 
>+/*
>+ * Using the cpufreq_gov_mutex to protect internal governor
>+ * data structures. The policy rwsem write lock nests inside 
>this mutex.
>+ * The policy rwsem is taken in the timer handler, and 
>therefore cannot be
>+ * held while doing a sync teardown of the timer.
>+ * This cpufreq_gov_mutex lock protects init/teardown of 
>governor timers.
>+ * This is why this lock, although it protects a data 
>structure located within
>+ * the governors, is here.
>+ */
>+static DEFINE_MUTEX(cpufreq_gov_mutex);
>+
> struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
> {
> 	struct cpufreq_policy *data;
>@@ -725,6 +736,7 @@ static ssize_t store(struct kobject *kob
> 	if (!policy)
> 		goto no_policy;
> 
>+	mutex_lock(&cpufreq_gov_mutex);
> 	if (lock_policy_rwsem_write(policy->cpu) < 0)
> 		goto fail;
> 
>@@ -735,6 +747,7 @@ static ssize_t store(struct kobject *kob
> 
> 	unlock_policy_rwsem_write(policy->cpu);
> fail:
>+	mutex_unlock(&cpufreq_gov_mutex);
> 	cpufreq_cpu_put(policy);
> no_policy:
> 	return ret;
>@@ -823,6 +836,7 @@ static int cpufreq_add_dev(struct sys_de
> 
> 	/* Initially set CPU itself as the policy_cpu */
> 	per_cpu(policy_cpu, cpu) = cpu;
>+	mutex_lock(&cpufreq_gov_mutex);
> 	ret = (lock_policy_rwsem_write(cpu) < 0);
> 	WARN_ON(ret);
> 
>@@ -875,7 +889,7 @@ static int cpufreq_add_dev(struct sys_de
> 					cpufreq_driver->exit(policy);
> 				ret = -EBUSY;
> 				cpufreq_cpu_put(managed_policy);
>-				goto err_free_cpumask;
>+				goto err_unlock_gov_mutex;
> 			}
> 
> 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
>@@ -964,6 +978,7 @@ static int cpufreq_add_dev(struct sys_de
> 	}
> 
> 	unlock_policy_rwsem_write(cpu);
>+	mutex_unlock(&cpufreq_gov_mutex);
> 
> 	kobject_uevent(&policy->kobj, KOBJ_ADD);
> 	module_put(cpufreq_driver->owner);
>@@ -989,6 +1004,8 @@ out_driver_exit:
> 
> err_unlock_policy:
> 	unlock_policy_rwsem_write(cpu);
>+err_unlock_gov_mutex:
>+	mutex_unlock(&cpufreq_gov_mutex);
> err_free_cpumask:
> 	free_cpumask_var(policy->cpus);
> err_free_policy:
>@@ -1028,6 +1045,7 @@ static int __cpufreq_remove_dev(struct s
> 		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> 		cpufreq_debug_enable_ratelimit();
> 		unlock_policy_rwsem_write(cpu);
>+		mutex_unlock(&cpufreq_gov_mutex);
> 		return -EINVAL;
> 	}
> 	per_cpu(cpufreq_cpu_data, cpu) = NULL;
>@@ -1045,6 +1063,7 @@ static int __cpufreq_remove_dev(struct s
> 		cpufreq_cpu_put(data);
> 		cpufreq_debug_enable_ratelimit();
> 		unlock_policy_rwsem_write(cpu);
>+		mutex_unlock(&cpufreq_gov_mutex);
> 		return 0;
> 	}
> #endif
>@@ -1088,9 +1107,13 @@ static int __cpufreq_remove_dev(struct s
> #endif
> 
> 	unlock_policy_rwsem_write(cpu);
>-
>+	/*
>+	 * Calling only with the cpufreq_gov_mutex held because
>+	 * sync timer teardown has locking dependency with policy rwsem.
>+	 */
> 	if (cpufreq_driver->target)
> 		__cpufreq_governor(data, CPUFREQ_GOV_STOP);
>+	mutex_unlock(&cpufreq_gov_mutex);
> 
> 	kobject_put(&data->kobj);
> 
>@@ -1123,6 +1146,7 @@ static int cpufreq_remove_dev(struct sys
> 	if (cpu_is_offline(cpu))
> 		return 0;
> 
>+	mutex_lock(&cpufreq_gov_mutex);
> 	if (unlikely(lock_policy_rwsem_write(cpu)))
> 		BUG();
> 
>@@ -1506,12 +1530,16 @@ int cpufreq_driver_target(struct cpufreq
> 	if (!policy)
> 		goto no_policy;
> 
>-	if (unlikely(lock_policy_rwsem_write(policy->cpu)))
>+	mutex_lock(&cpufreq_gov_mutex);
>+	if (unlikely(lock_policy_rwsem_write(policy->cpu))) {
>+		mutex_unlock(&cpufreq_gov_mutex);
> 		goto fail;
>+	}
> 
> 	ret = __cpufreq_driver_target(policy, target_freq, relation);
> 
> 	unlock_policy_rwsem_write(policy->cpu);
>+	mutex_unlock(&cpufreq_gov_mutex);
> 
> fail:
> 	cpufreq_cpu_put(policy);
>@@ -1769,7 +1797,9 @@ int cpufreq_update_policy(unsigned int c
> 		goto no_policy;
> 	}
> 
>+	mutex_lock(&cpufreq_gov_mutex);
> 	if (unlikely(lock_policy_rwsem_write(cpu))) {
>+		mutex_unlock(&cpufreq_gov_mutex);
> 		ret = -EINVAL;
> 		goto fail;
> 	}
>@@ -1798,6 +1828,7 @@ int cpufreq_update_policy(unsigned int c
> 	ret = __cpufreq_set_policy(data, &policy);
> 
> 	unlock_policy_rwsem_write(cpu);
>+	mutex_unlock(&cpufreq_gov_mutex);
> 
> fail:
> 	cpufreq_cpu_put(data);
>@@ -1821,6 +1852,7 @@ static int __cpuinit cpufreq_cpu_callbac
> 			break;
> 		case CPU_DOWN_PREPARE:
> 		case CPU_DOWN_PREPARE_FROZEN:
>+			mutex_lock(&cpufreq_gov_mutex);
> 			if (unlikely(lock_policy_rwsem_write(cpu)))
> 				BUG();
> 
>
>-- 
>Mathieu Desnoyers
>OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 
>A8FE 3BAE 9A68
>--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
       [not found]   ` <20090703152714.775719309-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
@ 2009-07-03 19:24     ` Pallipadi, Venkatesh
       [not found]       ` <7E82351C108FA840AB1866AC776AEC4669BFF0F5-osO9UTpF0URqS6EAlXoojrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Pallipadi, Venkatesh @ 2009-07-03 19:24 UTC (permalink / raw)
  To: Mathieu Desnoyers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpu
  Cc: Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

 

>-----Original Message-----
>From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
>Sent: Friday, July 03, 2009 8:25 AM
>To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
>Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
>kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
>Young; Pekka Enberg
>Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
>sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
>Subject: [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
>
>OK, I've tried to clean it up the best I could, but please 
>test this with
>concurrent cpu hotplug and cpufreq add/remove in loops. I'm 
>sure we will make
>other interesting findings.
>

This is a good and needed cleanup of cpufreq_add_dev.


>This is step one of fixing the overall locking dependency mess 
>in cpufreq.
>
>Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
>CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>CC: rjw-KKrjLPT3xs0@public.gmane.org
>CC: mingo-X9Un+BFzKDI@public.gmane.org
>CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
>CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
>CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
>CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
>CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
>---
> drivers/cpufreq/cpufreq.c |   65 
>++++++++++++++++++++++++++++------------------
> 1 file changed, 40 insertions(+), 25 deletions(-)
>
>Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
>===================================================================
>--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
>2009-07-02 23:59:08.000000000 -0400
>+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-02 
>23:59:09.000000000 -0400
>@@ -763,6 +763,10 @@ static struct kobj_type ktype_cpufreq = 
>  * cpufreq_add_dev - add a CPU device
>  *
>  * Adds the cpufreq interface for a CPU device.
>+ *
>+ * The Oracle says: try running cpufreq 
>registration/unregistration concurrently
>+ * with with cpu hotplugging and all hell will break loose. 
>Tried to clean this
>+ * mess up, but more thorough testing is needed. - Mathieu
>  */
> static int cpufreq_add_dev(struct sys_device *sys_dev)
> {
>@@ -806,15 +810,12 @@ static int cpufreq_add_dev(struct sys_de
> 		goto nomem_out;
> 	}
> 	if (!alloc_cpumask_var(&policy->cpus, GFP_KERNEL)) {
>-		kfree(policy);
> 		ret = -ENOMEM;
>-		goto nomem_out;
>+		goto err_free_policy;
> 	}
> 	if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL)) {
>-		free_cpumask_var(policy->cpus);
>-		kfree(policy);
> 		ret = -ENOMEM;
>-		goto nomem_out;
>+		goto err_free_cpumask;
> 	}
> 
> 	policy->cpu = cpu;
>@@ -822,7 +823,8 @@ static int cpufreq_add_dev(struct sys_de
> 
> 	/* Initially set CPU itself as the policy_cpu */
> 	per_cpu(policy_cpu, cpu) = cpu;
>-	lock_policy_rwsem_write(cpu);
>+	ret = (lock_policy_rwsem_write(cpu) < 0);
>+	WARN_ON(ret);
> 
> 	init_completion(&policy->kobj_unregister);
> 	INIT_WORK(&policy->update, handle_update);
>@@ -835,7 +837,7 @@ static int cpufreq_add_dev(struct sys_de
> 	ret = cpufreq_driver->init(policy);
> 	if (ret) {
> 		dprintk("initialization failed\n");
>-		goto err_out;
>+		goto err_unlock_policy;
> 	}
> 	policy->user_policy.min = policy->min;
> 	policy->user_policy.max = policy->max;
>@@ -860,15 +862,21 @@ static int cpufreq_add_dev(struct sys_de
> 		/* Check for existing affected CPUs.
> 		 * They may not be aware of it due to CPU Hotplug.
> 		 */
>-		managed_policy = cpufreq_cpu_get(j);		
>/* FIXME: Where is this released?  What about error paths? */
>+		managed_policy = cpufreq_cpu_get(j);
> 		if (unlikely(managed_policy)) {
> 
> 			/* Set proper policy_cpu */
> 			unlock_policy_rwsem_write(cpu);
> 			per_cpu(policy_cpu, cpu) = managed_policy->cpu;
> 
>-			if (lock_policy_rwsem_write(cpu) < 0)
>-				goto err_out_driver_exit;
>+			if (lock_policy_rwsem_write(cpu) < 0) {
>+				/* Should not go through policy 
>unlock path */
>+				if (cpufreq_driver->exit)
>+					cpufreq_driver->exit(policy);
>+				ret = -EBUSY;
>+				cpufreq_cpu_put(managed_policy);
>+				goto err_free_cpumask;
>+			}
> 
> 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
> 			cpumask_copy(managed_policy->cpus, 
>policy->cpus);
>@@ -879,12 +887,14 @@ static int cpufreq_add_dev(struct sys_de
> 			ret = sysfs_create_link(&sys_dev->kobj,
> 						&managed_policy->kobj,
> 						"cpufreq");
>-			if (ret)
>-				goto err_out_driver_exit;
>-
>-			cpufreq_debug_enable_ratelimit();
>-			ret = 0;
>-			goto err_out_driver_exit; /* call 
>driver->exit() */
>+			if (!ret)
>+				cpufreq_cpu_put(managed_policy);

Looks like cpufreq_cpu_put is needed both with ret and !ret. No?

Thanks,
Venki

>+			/*
>+			 * Success. We only needed to be added 
>to the mask.
>+			 * Call driver->exit() because only the 
>cpu parent of
>+			 * the kobj needed to call init().
>+			 */
>+			goto out_driver_exit; /* call driver->exit() */
> 		}
> 	}
> #endif
>@@ -894,25 +904,25 @@ static int cpufreq_add_dev(struct sys_de
> 	ret = kobject_init_and_add(&policy->kobj, 
>&ktype_cpufreq, &sys_dev->kobj,
> 				   "cpufreq");
> 	if (ret)
>-		goto err_out_driver_exit;
>+		goto out_driver_exit;
> 
> 	/* set up files for this cpu device */
> 	drv_attr = cpufreq_driver->attr;
> 	while ((drv_attr) && (*drv_attr)) {
> 		ret = sysfs_create_file(&policy->kobj, 
>&((*drv_attr)->attr));
> 		if (ret)
>-			goto err_out_driver_exit;
>+			goto err_out_kobj_put;
> 		drv_attr++;
> 	}
> 	if (cpufreq_driver->get) {
> 		ret = sysfs_create_file(&policy->kobj, 
>&cpuinfo_cur_freq.attr);
> 		if (ret)
>-			goto err_out_driver_exit;
>+			goto err_out_kobj_put;
> 	}
> 	if (cpufreq_driver->target) {
> 		ret = sysfs_create_file(&policy->kobj, 
>&scaling_cur_freq.attr);
> 		if (ret)
>-			goto err_out_driver_exit;
>+			goto err_out_kobj_put;
> 	}
> 
> 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
>@@ -930,12 +940,14 @@ static int cpufreq_add_dev(struct sys_de
> 			continue;
> 
> 		dprintk("CPU %u already managed, adding link\n", j);
>-		cpufreq_cpu_get(cpu);
>+		managed_policy = cpufreq_cpu_get(cpu);
> 		cpu_sys_dev = get_cpu_sysdev(j);
> 		ret = sysfs_create_link(&cpu_sys_dev->kobj, 
>&policy->kobj,
> 					"cpufreq");
>-		if (ret)
>+		if (ret) {
>+			cpufreq_cpu_put(managed_policy);
> 			goto err_out_unregister;
>+		}
> 	}
> 
> 	policy->governor = NULL; /* to assure that the starting 
>sequence is
>@@ -967,17 +979,20 @@ err_out_unregister:
> 		per_cpu(cpufreq_cpu_data, j) = NULL;
> 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> 
>+err_out_kobj_put:
> 	kobject_put(&policy->kobj);
> 	wait_for_completion(&policy->kobj_unregister);
> 
>-err_out_driver_exit:
>+out_driver_exit:
> 	if (cpufreq_driver->exit)
> 		cpufreq_driver->exit(policy);
> 
>-err_out:
>+err_unlock_policy:
> 	unlock_policy_rwsem_write(cpu);
>+err_free_cpumask:
>+	free_cpumask_var(policy->cpus);
>+err_free_policy:
> 	kfree(policy);
>-
> nomem_out:
> 	module_put(cpufreq_driver->owner);
> module_out:
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2.6.30 3/4] cpufreq add gov mutex
       [not found]       ` <7E82351C108FA840AB1866AC776AEC4669BFF0DE-osO9UTpF0URqS6EAlXoojrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2009-07-03 19:36         ` Mathieu Desnoyers
  2009-07-06 18:50           ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 19:36 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ingo Molnar, rjw-KKrjLPT3xs0@public.gmane.org, Dave Young,
	Pekka Enberg, Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

* Pallipadi, Venkatesh (venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org) wrote:
>  
> 
> >-----Original Message-----
> >From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
> >Sent: Friday, July 03, 2009 8:25 AM
> >To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
> >Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
> >kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
> >Young; Pekka Enberg
> >Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
> >sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> >Subject: [patch 2.6.30 3/4] cpufreq add gov mutex
> >
> >Using the cpufreq_gov_mutex to protect internal governor data 
> >structures. The
> >policy rwsem write lock nests inside this mutex. 
> 
> This makes the whole locking in cpufreq upside down. Takes away
> most of the reason for existence of rwsem lock. Taking a smaller
> Percpu rwsem lock inside a bigger global mutex doesn't seem right.
> 

It permits the timer handler to only take the rwsem write lock. My
understanding is that the most frequent operation is the timer handler
being executed, but maybe I'm wrong ?

I think it ends up being a questions of clearly identifying how
frequent each data structure access are, and why this per-governor rwsem
is needed at all. (does it bring any perceivable performance improvement
on a critical path ? Then maybe we should use RCU ?)

I really wonder if the fact that you switch the timer handler from rwsem
to a mutex in your patchset will not hurt performance in some way or
have some unforeseen side-effect. This is why I made my patchset as dumb
and simple as possible, because this is a bugfix for 2.6.30, not a
reingineering. Let's make it work, then make it fast.

> >The policy 
> >rwsem is taken in
> >the timer handler, and therefore cannot be held while doing a 
> >sync teardown of
> >the timer.  This cpufreq_gov_mutex lock protects init/teardown 
> >of governor
> >timers.
> 
> Why is the protection required for init/teardown of percpu
> timer with a global mutex? cpufreq rwsem (when held correctly)
> ensures that there can be only one store_scaling_governor for
> a cpu at any point in time. I don't see any reason why we need
> a global mutex to protect timer init teardown.
> 

rwsem was previously held in the timer handler. I am not yet convinced
that simply holding a local dbs_mutex will ensure proper synchronization
of *all* call paths in the timer handler.

So given I prefer to do the least intrusive modification, I leave the
rwsem write lock in the timer handler, but it leaves no choice but to
take a mutex _outside_ of the rwsem lock to synchronize timer
init/teardown.

> 
> > This is why this lock, although it protects a data 
> >structure located
> >within the governors, is located in cpufreq.c.
> 
> I think this is taking a step back in terms of locking. A system
> wide Cpufreq_gov_mutex is being held more frequently now and seems
> to be providing all the serialization needed. The locks crossing
> layers of cpufreq core and governor is just going to make
> situation worse IMO.

Can you prove that my simple bugfix will cause a significant performance
regression ?

> 
> If we really want to pursue this option, we should get rid
> of rwsem altogether. It is not adding much value
> and just providing lot more confusion with things like rwsem
> should be held everywhere else other than CPUFREQ_GOV_STOP.
> 

I agree that this rwsem is just odd. But please keep its removal for
2.6.32, and consider using RCU then if performance really matters.

And please don't try to overengineer a simple bugfix. The bogus mutex
removal you proposed a few weeks ago turned me into the "cautious" mode.
And looking at the general code quality of cpufreq (timer teardown
races, error path not handled, unbalanced locks, cpu hotplug support
broken) makes me vote for the most utterly simple solution. I currently
don't care about performance _at all_. I care about something as simple
as making sure this thing stop blowing up.

Mathieu

> 
> Thanks,
> Venki
> 
> >
> >Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
> >CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >CC: rjw-KKrjLPT3xs0@public.gmane.org
> >CC: mingo-X9Un+BFzKDI@public.gmane.org
> >CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> >CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
> >CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
> >CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> >CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
> >---
> > drivers/cpufreq/cpufreq.c |   38 
> >+++++++++++++++++++++++++++++++++++---
> > 1 file changed, 35 insertions(+), 3 deletions(-)
> >
> >Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
> >===================================================================
> >--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
> >2009-07-03 09:48:35.000000000 -0400
> >+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-03 
> >10:12:44.000000000 -0400
> >@@ -133,6 +133,17 @@ pure_initcall(init_cpufreq_transition_no
> > static LIST_HEAD(cpufreq_governor_list);
> > static DEFINE_MUTEX(cpufreq_governor_mutex);
> > 
> >+/*
> >+ * Using the cpufreq_gov_mutex to protect internal governor
> >+ * data structures. The policy rwsem write lock nests inside 
> >this mutex.
> >+ * The policy rwsem is taken in the timer handler, and 
> >therefore cannot be
> >+ * held while doing a sync teardown of the timer.
> >+ * This cpufreq_gov_mutex lock protects init/teardown of 
> >governor timers.
> >+ * This is why this lock, although it protects a data 
> >structure located within
> >+ * the governors, is here.
> >+ */
> >+static DEFINE_MUTEX(cpufreq_gov_mutex);
> >+
> > struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
> > {
> > 	struct cpufreq_policy *data;
> >@@ -725,6 +736,7 @@ static ssize_t store(struct kobject *kob
> > 	if (!policy)
> > 		goto no_policy;
> > 
> >+	mutex_lock(&cpufreq_gov_mutex);
> > 	if (lock_policy_rwsem_write(policy->cpu) < 0)
> > 		goto fail;
> > 
> >@@ -735,6 +747,7 @@ static ssize_t store(struct kobject *kob
> > 
> > 	unlock_policy_rwsem_write(policy->cpu);
> > fail:
> >+	mutex_unlock(&cpufreq_gov_mutex);
> > 	cpufreq_cpu_put(policy);
> > no_policy:
> > 	return ret;
> >@@ -823,6 +836,7 @@ static int cpufreq_add_dev(struct sys_de
> > 
> > 	/* Initially set CPU itself as the policy_cpu */
> > 	per_cpu(policy_cpu, cpu) = cpu;
> >+	mutex_lock(&cpufreq_gov_mutex);
> > 	ret = (lock_policy_rwsem_write(cpu) < 0);
> > 	WARN_ON(ret);
> > 
> >@@ -875,7 +889,7 @@ static int cpufreq_add_dev(struct sys_de
> > 					cpufreq_driver->exit(policy);
> > 				ret = -EBUSY;
> > 				cpufreq_cpu_put(managed_policy);
> >-				goto err_free_cpumask;
> >+				goto err_unlock_gov_mutex;
> > 			}
> > 
> > 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
> >@@ -964,6 +978,7 @@ static int cpufreq_add_dev(struct sys_de
> > 	}
> > 
> > 	unlock_policy_rwsem_write(cpu);
> >+	mutex_unlock(&cpufreq_gov_mutex);
> > 
> > 	kobject_uevent(&policy->kobj, KOBJ_ADD);
> > 	module_put(cpufreq_driver->owner);
> >@@ -989,6 +1004,8 @@ out_driver_exit:
> > 
> > err_unlock_policy:
> > 	unlock_policy_rwsem_write(cpu);
> >+err_unlock_gov_mutex:
> >+	mutex_unlock(&cpufreq_gov_mutex);
> > err_free_cpumask:
> > 	free_cpumask_var(policy->cpus);
> > err_free_policy:
> >@@ -1028,6 +1045,7 @@ static int __cpufreq_remove_dev(struct s
> > 		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> > 		cpufreq_debug_enable_ratelimit();
> > 		unlock_policy_rwsem_write(cpu);
> >+		mutex_unlock(&cpufreq_gov_mutex);
> > 		return -EINVAL;
> > 	}
> > 	per_cpu(cpufreq_cpu_data, cpu) = NULL;
> >@@ -1045,6 +1063,7 @@ static int __cpufreq_remove_dev(struct s
> > 		cpufreq_cpu_put(data);
> > 		cpufreq_debug_enable_ratelimit();
> > 		unlock_policy_rwsem_write(cpu);
> >+		mutex_unlock(&cpufreq_gov_mutex);
> > 		return 0;
> > 	}
> > #endif
> >@@ -1088,9 +1107,13 @@ static int __cpufreq_remove_dev(struct s
> > #endif
> > 
> > 	unlock_policy_rwsem_write(cpu);
> >-
> >+	/*
> >+	 * Calling only with the cpufreq_gov_mutex held because
> >+	 * sync timer teardown has locking dependency with policy rwsem.
> >+	 */
> > 	if (cpufreq_driver->target)
> > 		__cpufreq_governor(data, CPUFREQ_GOV_STOP);
> >+	mutex_unlock(&cpufreq_gov_mutex);
> > 
> > 	kobject_put(&data->kobj);
> > 
> >@@ -1123,6 +1146,7 @@ static int cpufreq_remove_dev(struct sys
> > 	if (cpu_is_offline(cpu))
> > 		return 0;
> > 
> >+	mutex_lock(&cpufreq_gov_mutex);
> > 	if (unlikely(lock_policy_rwsem_write(cpu)))
> > 		BUG();
> > 
> >@@ -1506,12 +1530,16 @@ int cpufreq_driver_target(struct cpufreq
> > 	if (!policy)
> > 		goto no_policy;
> > 
> >-	if (unlikely(lock_policy_rwsem_write(policy->cpu)))
> >+	mutex_lock(&cpufreq_gov_mutex);
> >+	if (unlikely(lock_policy_rwsem_write(policy->cpu))) {
> >+		mutex_unlock(&cpufreq_gov_mutex);
> > 		goto fail;
> >+	}
> > 
> > 	ret = __cpufreq_driver_target(policy, target_freq, relation);
> > 
> > 	unlock_policy_rwsem_write(policy->cpu);
> >+	mutex_unlock(&cpufreq_gov_mutex);
> > 
> > fail:
> > 	cpufreq_cpu_put(policy);
> >@@ -1769,7 +1797,9 @@ int cpufreq_update_policy(unsigned int c
> > 		goto no_policy;
> > 	}
> > 
> >+	mutex_lock(&cpufreq_gov_mutex);
> > 	if (unlikely(lock_policy_rwsem_write(cpu))) {
> >+		mutex_unlock(&cpufreq_gov_mutex);
> > 		ret = -EINVAL;
> > 		goto fail;
> > 	}
> >@@ -1798,6 +1828,7 @@ int cpufreq_update_policy(unsigned int c
> > 	ret = __cpufreq_set_policy(data, &policy);
> > 
> > 	unlock_policy_rwsem_write(cpu);
> >+	mutex_unlock(&cpufreq_gov_mutex);
> > 
> > fail:
> > 	cpufreq_cpu_put(data);
> >@@ -1821,6 +1852,7 @@ static int __cpuinit cpufreq_cpu_callbac
> > 			break;
> > 		case CPU_DOWN_PREPARE:
> > 		case CPU_DOWN_PREPARE_FROZEN:
> >+			mutex_lock(&cpufreq_gov_mutex);
> > 			if (unlikely(lock_policy_rwsem_write(cpu)))
> > 				BUG();
> > 
> >
> >-- 
> >Mathieu Desnoyers
> >OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 
> >A8FE 3BAE 9A68
> >

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
       [not found]       ` <7E82351C108FA840AB1866AC776AEC4669BFF0F5-osO9UTpF0URqS6EAlXoojrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2009-07-03 19:41         ` Mathieu Desnoyers
  2009-07-06 18:02           ` Pallipadi, Venkatesh
  0 siblings, 1 reply; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-03 19:41 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ingo Molnar, rjw-KKrjLPT3xs0@public.gmane.org, Dave Young,
	Pekka Enberg, Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

* Pallipadi, Venkatesh (venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org) wrote:
>  
> 
> >-----Original Message-----
> >From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
> >Sent: Friday, July 03, 2009 8:25 AM
> >To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
> >Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
> >kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
> >Young; Pekka Enberg
> >Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
> >sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> >Subject: [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
> >
> >OK, I've tried to clean it up the best I could, but please 
> >test this with
> >concurrent cpu hotplug and cpufreq add/remove in loops. I'm 
> >sure we will make
> >other interesting findings.
> >
> 
> This is a good and needed cleanup of cpufreq_add_dev.
> 
> 
> >This is step one of fixing the overall locking dependency mess 
> >in cpufreq.
> >
> >Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
> >CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >CC: rjw-KKrjLPT3xs0@public.gmane.org
> >CC: mingo-X9Un+BFzKDI@public.gmane.org
> >CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> >CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
> >CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
> >CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> >CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
> >---
> > drivers/cpufreq/cpufreq.c |   65 
> >++++++++++++++++++++++++++++------------------
> > 1 file changed, 40 insertions(+), 25 deletions(-)
> >
> >Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
> >===================================================================
> >--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
> >2009-07-02 23:59:08.000000000 -0400
> >+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-02 
> >23:59:09.000000000 -0400
> >@@ -763,6 +763,10 @@ static struct kobj_type ktype_cpufreq = 
> >  * cpufreq_add_dev - add a CPU device
> >  *
> >  * Adds the cpufreq interface for a CPU device.
> >+ *
> >+ * The Oracle says: try running cpufreq 
> >registration/unregistration concurrently
> >+ * with with cpu hotplugging and all hell will break loose. 
> >Tried to clean this
> >+ * mess up, but more thorough testing is needed. - Mathieu
> >  */
> > static int cpufreq_add_dev(struct sys_device *sys_dev)
> > {
> >@@ -806,15 +810,12 @@ static int cpufreq_add_dev(struct sys_de
> > 		goto nomem_out;
> > 	}
> > 	if (!alloc_cpumask_var(&policy->cpus, GFP_KERNEL)) {
> >-		kfree(policy);
> > 		ret = -ENOMEM;
> >-		goto nomem_out;
> >+		goto err_free_policy;
> > 	}
> > 	if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL)) {
> >-		free_cpumask_var(policy->cpus);
> >-		kfree(policy);
> > 		ret = -ENOMEM;
> >-		goto nomem_out;
> >+		goto err_free_cpumask;
> > 	}
> > 
> > 	policy->cpu = cpu;
> >@@ -822,7 +823,8 @@ static int cpufreq_add_dev(struct sys_de
> > 
> > 	/* Initially set CPU itself as the policy_cpu */
> > 	per_cpu(policy_cpu, cpu) = cpu;
> >-	lock_policy_rwsem_write(cpu);
> >+	ret = (lock_policy_rwsem_write(cpu) < 0);
> >+	WARN_ON(ret);
> > 
> > 	init_completion(&policy->kobj_unregister);
> > 	INIT_WORK(&policy->update, handle_update);
> >@@ -835,7 +837,7 @@ static int cpufreq_add_dev(struct sys_de
> > 	ret = cpufreq_driver->init(policy);
> > 	if (ret) {
> > 		dprintk("initialization failed\n");
> >-		goto err_out;
> >+		goto err_unlock_policy;
> > 	}
> > 	policy->user_policy.min = policy->min;
> > 	policy->user_policy.max = policy->max;
> >@@ -860,15 +862,21 @@ static int cpufreq_add_dev(struct sys_de
> > 		/* Check for existing affected CPUs.
> > 		 * They may not be aware of it due to CPU Hotplug.
> > 		 */
> >-		managed_policy = cpufreq_cpu_get(j);		
> >/* FIXME: Where is this released?  What about error paths? */
> >+		managed_policy = cpufreq_cpu_get(j);
> > 		if (unlikely(managed_policy)) {
> > 
> > 			/* Set proper policy_cpu */
> > 			unlock_policy_rwsem_write(cpu);
> > 			per_cpu(policy_cpu, cpu) = managed_policy->cpu;
> > 
> >-			if (lock_policy_rwsem_write(cpu) < 0)
> >-				goto err_out_driver_exit;
> >+			if (lock_policy_rwsem_write(cpu) < 0) {
> >+				/* Should not go through policy 
> >unlock path */
> >+				if (cpufreq_driver->exit)
> >+					cpufreq_driver->exit(policy);
> >+				ret = -EBUSY;
> >+				cpufreq_cpu_put(managed_policy);
> >+				goto err_free_cpumask;
> >+			}
> > 
> > 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
> > 			cpumask_copy(managed_policy->cpus, 
> >policy->cpus);
> >@@ -879,12 +887,14 @@ static int cpufreq_add_dev(struct sys_de
> > 			ret = sysfs_create_link(&sys_dev->kobj,
> > 						&managed_policy->kobj,
> > 						"cpufreq");
> >-			if (ret)
> >-				goto err_out_driver_exit;
> >-
> >-			cpufreq_debug_enable_ratelimit();
> >-			ret = 0;
> >-			goto err_out_driver_exit; /* call 
> >driver->exit() */
> >+			if (!ret)
> >+				cpufreq_cpu_put(managed_policy);
> 
> Looks like cpufreq_cpu_put is needed both with ret and !ret. No?
> 

No. ret == 0 path is a "success path" only creating a symlink, and
therefore __cpufreq_remove_dev() will take care of calling the
cpufreq_cpu_put() to decrement the reference count :

static int __cpufreq_remove_dev(struct sys_device *sys_dev)
{
...

        if (unlikely(cpu != data->cpu)) {
                dprintk("removing link\n");
                cpumask_clear_cpu(cpu, data->cpus);
                spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
                sysfs_remove_link(&sys_dev->kobj, "cpufreq");
                cpufreq_cpu_put(data);
                cpufreq_debug_enable_ratelimit();
                unlock_policy_rwsem_write(cpu);
                return 0;
        }

This is, at least, how I understand what is happening here.

Mathieu


> Thanks,
> Venki
> 
> >+			/*
> >+			 * Success. We only needed to be added 
> >to the mask.
> >+			 * Call driver->exit() because only the 
> >cpu parent of
> >+			 * the kobj needed to call init().
> >+			 */
> >+			goto out_driver_exit; /* call driver->exit() */
> > 		}
> > 	}
> > #endif
> >@@ -894,25 +904,25 @@ static int cpufreq_add_dev(struct sys_de
> > 	ret = kobject_init_and_add(&policy->kobj, 
> >&ktype_cpufreq, &sys_dev->kobj,
> > 				   "cpufreq");
> > 	if (ret)
> >-		goto err_out_driver_exit;
> >+		goto out_driver_exit;
> > 
> > 	/* set up files for this cpu device */
> > 	drv_attr = cpufreq_driver->attr;
> > 	while ((drv_attr) && (*drv_attr)) {
> > 		ret = sysfs_create_file(&policy->kobj, 
> >&((*drv_attr)->attr));
> > 		if (ret)
> >-			goto err_out_driver_exit;
> >+			goto err_out_kobj_put;
> > 		drv_attr++;
> > 	}
> > 	if (cpufreq_driver->get) {
> > 		ret = sysfs_create_file(&policy->kobj, 
> >&cpuinfo_cur_freq.attr);
> > 		if (ret)
> >-			goto err_out_driver_exit;
> >+			goto err_out_kobj_put;
> > 	}
> > 	if (cpufreq_driver->target) {
> > 		ret = sysfs_create_file(&policy->kobj, 
> >&scaling_cur_freq.attr);
> > 		if (ret)
> >-			goto err_out_driver_exit;
> >+			goto err_out_kobj_put;
> > 	}
> > 
> > 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
> >@@ -930,12 +940,14 @@ static int cpufreq_add_dev(struct sys_de
> > 			continue;
> > 
> > 		dprintk("CPU %u already managed, adding link\n", j);
> >-		cpufreq_cpu_get(cpu);
> >+		managed_policy = cpufreq_cpu_get(cpu);
> > 		cpu_sys_dev = get_cpu_sysdev(j);
> > 		ret = sysfs_create_link(&cpu_sys_dev->kobj, 
> >&policy->kobj,
> > 					"cpufreq");
> >-		if (ret)
> >+		if (ret) {
> >+			cpufreq_cpu_put(managed_policy);
> > 			goto err_out_unregister;
> >+		}
> > 	}
> > 
> > 	policy->governor = NULL; /* to assure that the starting 
> >sequence is
> >@@ -967,17 +979,20 @@ err_out_unregister:
> > 		per_cpu(cpufreq_cpu_data, j) = NULL;
> > 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> > 
> >+err_out_kobj_put:
> > 	kobject_put(&policy->kobj);
> > 	wait_for_completion(&policy->kobj_unregister);
> > 
> >-err_out_driver_exit:
> >+out_driver_exit:
> > 	if (cpufreq_driver->exit)
> > 		cpufreq_driver->exit(policy);
> > 
> >-err_out:
> >+err_unlock_policy:
> > 	unlock_policy_rwsem_write(cpu);
> >+err_free_cpumask:
> >+	free_cpumask_var(policy->cpus);
> >+err_free_policy:
> > 	kfree(policy);
> >-
> > nomem_out:
> > 	module_put(cpufreq_driver->owner);
> > module_out:
> >
> 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
  2009-07-03 19:41         ` Mathieu Desnoyers
@ 2009-07-06 18:02           ` Pallipadi, Venkatesh
  0 siblings, 0 replies; 12+ messages in thread
From: Pallipadi, Venkatesh @ 2009-07-06 18:02 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ingo Molnar, rjw-KKrjLPT3xs0@public.gmane.org, Dave Young,
	Pekka Enberg, Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

On Fri, 2009-07-03 at 12:41 -0700, Mathieu Desnoyers wrote:
> * Pallipadi, Venkatesh (venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org) wrote:
> >  
> > 
> > >-----Original Message-----
> > >From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
> > >Sent: Friday, July 03, 2009 8:25 AM
> > >To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
> > >Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
> > >kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
> > >Young; Pekka Enberg
> > >Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
> > >sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> > >Subject: [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess
> > >
> > >OK, I've tried to clean it up the best I could, but please 
> > >test this with
> > >concurrent cpu hotplug and cpufreq add/remove in loops. I'm 
> > >sure we will make
> > >other interesting findings.
> > >
> > 
> > This is a good and needed cleanup of cpufreq_add_dev.
> > 
> > 
> > >This is step one of fixing the overall locking dependency mess 
> > >in cpufreq.
> > >
> > >Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
> > >CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > >CC: rjw-KKrjLPT3xs0@public.gmane.org
> > >CC: mingo-X9Un+BFzKDI@public.gmane.org
> > >CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > >CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> > >CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > >CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
> > >CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
> > >CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> > >CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
> > >---
> > > drivers/cpufreq/cpufreq.c |   65 
> > >++++++++++++++++++++++++++++------------------
> > > 1 file changed, 40 insertions(+), 25 deletions(-)
> > >
> > >Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
> > >===================================================================
> > >--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
> > >2009-07-02 23:59:08.000000000 -0400
> > >+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-02 
> > >23:59:09.000000000 -0400
> > >@@ -763,6 +763,10 @@ static struct kobj_type ktype_cpufreq = 
> > >  * cpufreq_add_dev - add a CPU device
> > >  *
> > >  * Adds the cpufreq interface for a CPU device.
> > >+ *
> > >+ * The Oracle says: try running cpufreq 
> > >registration/unregistration concurrently
> > >+ * with with cpu hotplugging and all hell will break loose. 
> > >Tried to clean this
> > >+ * mess up, but more thorough testing is needed. - Mathieu
> > >  */
> > > static int cpufreq_add_dev(struct sys_device *sys_dev)
> > > {
> > >@@ -806,15 +810,12 @@ static int cpufreq_add_dev(struct sys_de
> > > 		goto nomem_out;
> > > 	}
> > > 	if (!alloc_cpumask_var(&policy->cpus, GFP_KERNEL)) {
> > >-		kfree(policy);
> > > 		ret = -ENOMEM;
> > >-		goto nomem_out;
> > >+		goto err_free_policy;
> > > 	}
> > > 	if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL)) {
> > >-		free_cpumask_var(policy->cpus);
> > >-		kfree(policy);
> > > 		ret = -ENOMEM;
> > >-		goto nomem_out;
> > >+		goto err_free_cpumask;
> > > 	}
> > > 
> > > 	policy->cpu = cpu;
> > >@@ -822,7 +823,8 @@ static int cpufreq_add_dev(struct sys_de
> > > 
> > > 	/* Initially set CPU itself as the policy_cpu */
> > > 	per_cpu(policy_cpu, cpu) = cpu;
> > >-	lock_policy_rwsem_write(cpu);
> > >+	ret = (lock_policy_rwsem_write(cpu) < 0);
> > >+	WARN_ON(ret);
> > > 
> > > 	init_completion(&policy->kobj_unregister);
> > > 	INIT_WORK(&policy->update, handle_update);
> > >@@ -835,7 +837,7 @@ static int cpufreq_add_dev(struct sys_de
> > > 	ret = cpufreq_driver->init(policy);
> > > 	if (ret) {
> > > 		dprintk("initialization failed\n");
> > >-		goto err_out;
> > >+		goto err_unlock_policy;
> > > 	}
> > > 	policy->user_policy.min = policy->min;
> > > 	policy->user_policy.max = policy->max;
> > >@@ -860,15 +862,21 @@ static int cpufreq_add_dev(struct sys_de
> > > 		/* Check for existing affected CPUs.
> > > 		 * They may not be aware of it due to CPU Hotplug.
> > > 		 */
> > >-		managed_policy = cpufreq_cpu_get(j);		
> > >/* FIXME: Where is this released?  What about error paths? */
> > >+		managed_policy = cpufreq_cpu_get(j);
> > > 		if (unlikely(managed_policy)) {
> > > 
> > > 			/* Set proper policy_cpu */
> > > 			unlock_policy_rwsem_write(cpu);
> > > 			per_cpu(policy_cpu, cpu) = managed_policy->cpu;
> > > 
> > >-			if (lock_policy_rwsem_write(cpu) < 0)
> > >-				goto err_out_driver_exit;
> > >+			if (lock_policy_rwsem_write(cpu) < 0) {
> > >+				/* Should not go through policy 
> > >unlock path */
> > >+				if (cpufreq_driver->exit)
> > >+					cpufreq_driver->exit(policy);
> > >+				ret = -EBUSY;
> > >+				cpufreq_cpu_put(managed_policy);
> > >+				goto err_free_cpumask;
> > >+			}
> > > 
> > > 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
> > > 			cpumask_copy(managed_policy->cpus, 
> > >policy->cpus);
> > >@@ -879,12 +887,14 @@ static int cpufreq_add_dev(struct sys_de
> > > 			ret = sysfs_create_link(&sys_dev->kobj,
> > > 						&managed_policy->kobj,
> > > 						"cpufreq");
> > >-			if (ret)
> > >-				goto err_out_driver_exit;
> > >-
> > >-			cpufreq_debug_enable_ratelimit();
> > >-			ret = 0;
> > >-			goto err_out_driver_exit; /* call 
> > >driver->exit() */
> > >+			if (!ret)
> > >+				cpufreq_cpu_put(managed_policy);
> > 
> > Looks like cpufreq_cpu_put is needed both with ret and !ret. No?
> > 
> 
> No. ret == 0 path is a "success path" only creating a symlink, and
> therefore __cpufreq_remove_dev() will take care of calling the
> cpufreq_cpu_put() to decrement the reference count :
> 
> static int __cpufreq_remove_dev(struct sys_device *sys_dev)
> {
> ...
> 
>         if (unlikely(cpu != data->cpu)) {
>                 dprintk("removing link\n");
>                 cpumask_clear_cpu(cpu, data->cpus);
>                 spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
>                 sysfs_remove_link(&sys_dev->kobj, "cpufreq");
>                 cpufreq_cpu_put(data);
>                 cpufreq_debug_enable_ratelimit();
>                 unlock_policy_rwsem_write(cpu);
>                 return 0;
>         }
> 
> This is, at least, how I understand what is happening here.
> 

Agreed.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2.6.30 3/4] cpufreq add gov mutex
  2009-07-03 19:36         ` Mathieu Desnoyers
@ 2009-07-06 18:50           ` Pallipadi, Venkatesh
       [not found]             ` <1246906203.11545.47.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Pallipadi, Venkatesh @ 2009-07-06 18:50 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ingo Molnar, rjw-KKrjLPT3xs0@public.gmane.org, Dave Young,
	Pekka Enberg, Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

On Fri, 2009-07-03 at 12:36 -0700, Mathieu Desnoyers wrote:
> * Pallipadi, Venkatesh (venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org) wrote:
> >  
> > 
> > >-----Original Message-----
> > >From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
> > >Sent: Friday, July 03, 2009 8:25 AM
> > >To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
> > >Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
> > >kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
> > >Young; Pekka Enberg
> > >Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
> > >sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> > >Subject: [patch 2.6.30 3/4] cpufreq add gov mutex
> > >
> > >Using the cpufreq_gov_mutex to protect internal governor data 
> > >structures. The
> > >policy rwsem write lock nests inside this mutex. 
> > 
> > This makes the whole locking in cpufreq upside down. Takes away
> > most of the reason for existence of rwsem lock. Taking a smaller
> > Percpu rwsem lock inside a bigger global mutex doesn't seem right.
> > 
> 
> It permits the timer handler to only take the rwsem write lock. My
> understanding is that the most frequent operation is the timer handler
> being executed, but maybe I'm wrong ?
> 
> I think it ends up being a questions of clearly identifying how
> frequent each data structure access are, and why this per-governor rwsem
> is needed at all. (does it bring any perceivable performance improvement
> on a critical path ? Then maybe we should use RCU ?)
> 
> I really wonder if the fact that you switch the timer handler from rwsem
> to a mutex in your patchset will not hurt performance in some way or
> have some unforeseen side-effect. This is why I made my patchset as dumb
> and simple as possible, because this is a bugfix for 2.6.30, not a
> reingineering. Let's make it work, then make it fast.

timer handler is frequent and not short lived. I agree with 2.6.30
solution to be simple and not worrying about performance. That is the
reason I moved dbs_mutex into timer path., we need minimal and safe
solution for .30. That was with
 3 files changed, 24 insertions(+), 34 deletions(-)

Ones the rest of the cleanups gets enough testing and if they indeed
make into later releases, it can always be backported to .30.something
if timer locking causes some side effects.

The changes here is lot more and makes already complicated and ugly
locking uglier. After some time no one will know why we have locks
spread in so many different files getting called from some other file.

> > >The policy 
> > >rwsem is taken in
> > >the timer handler, and therefore cannot be held while doing a 
> > >sync teardown of
> > >the timer.  This cpufreq_gov_mutex lock protects init/teardown 
> > >of governor
> > >timers.
> > 
> > Why is the protection required for init/teardown of percpu
> > timer with a global mutex? cpufreq rwsem (when held correctly)
> > ensures that there can be only one store_scaling_governor for
> > a cpu at any point in time. I don't see any reason why we need
> > a global mutex to protect timer init teardown.
> > 
> 
> rwsem was previously held in the timer handler. I am not yet convinced
> that simply holding a local dbs_mutex will ensure proper synchronization
> of *all* call paths in the timer handler.

All call paths of timer handler? May be I am missing something. It is
only called from the delayed work. It gets sync stopped from GOV_STOP,
so no need of locking there. It gets started from GOV_START where timer
can never be active (STOP would have already stopped it if it was
running before) and no need of any locking there.
Or did you mean callpaths inside? ondemand will not depend on anythong
from cpufreq core other than policy limit changes. It obviously cannot
be running when add_dev or remove_dev stuff is going on. It doesn't
really care about someone reading scaling_available_frequencies or
scaling_available_governors etc from cpufreq core.
dbs_mutex in timer is just the simple and dumb fix to resolve the
deadlock that we have now.

> So given I prefer to do the least intrusive modification, I leave the
> rwsem write lock in the timer handler, but it leaves no choice but to
> take a mutex _outside_ of the rwsem lock to synchronize timer
> init/teardown.
> 
> > 
> > > This is why this lock, although it protects a data 
> > >structure located
> > >within the governors, is located in cpufreq.c.
> > 
> > I think this is taking a step back in terms of locking. A system
> > wide Cpufreq_gov_mutex is being held more frequently now and seems
> > to be providing all the serialization needed. The locks crossing
> > layers of cpufreq core and governor is just going to make
> > situation worse IMO.
> 
> Can you prove that my simple bugfix will cause a significant performance
> regression ?

I am not saying it is a performance regression. I am saying it is making
things more messy and confusing and can only add to the complications we
have. Given the other simple alternative patch we have...

> > 
> > If we really want to pursue this option, we should get rid
> > of rwsem altogether. It is not adding much value
> > and just providing lot more confusion with things like rwsem
> > should be held everywhere else other than CPUFREQ_GOV_STOP.
> > 
> 
> I agree that this rwsem is just odd. But please keep its removal for
> 2.6.32, and consider using RCU then if performance really matters.
> 
> And please don't try to overengineer a simple bugfix. The bogus mutex
> removal you proposed a few weeks ago turned me into the "cautious" mode.

<finger_pointing>
The whole deadlock problem started from this incomplete/bogus bug fix

commit b14893a62c73af0eca414cfed505b8c09efc613c
[CPUFREQ] fix timer teardown in ondemand governor

which created more problems than it fixed.
</finger_pointing>

> And looking at the general code quality of cpufreq (timer teardown
> races, error path not handled, unbalanced locks, cpu hotplug support
> broken) makes me vote for the most utterly simple solution. I currently
> don't care about performance _at all_. I care about something as simple
> as making sure this thing stop blowing up.
> 

I think we are on the same page with respect to simple fix and not
bothering about performance for now. But, our opinions differ on what
simple bug fix is in this case. Anyways, we have two alternatives here
and I will be happy to let Dave pick the solution he likes the best.

Thanks,
Venki

> > >
> > >Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
> > >CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > >CC: rjw-KKrjLPT3xs0@public.gmane.org
> > >CC: mingo-X9Un+BFzKDI@public.gmane.org
> > >CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > >CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> > >CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > >CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
> > >CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
> > >CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> > >CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
> > >---
> > > drivers/cpufreq/cpufreq.c |   38 
> > >+++++++++++++++++++++++++++++++++++---
> > > 1 file changed, 35 insertions(+), 3 deletions(-)
> > >
> > >Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
> > >===================================================================
> > >--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
> > >2009-07-03 09:48:35.000000000 -0400
> > >+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-03 
> > >10:12:44.000000000 -0400
> > >@@ -133,6 +133,17 @@ pure_initcall(init_cpufreq_transition_no
> > > static LIST_HEAD(cpufreq_governor_list);
> > > static DEFINE_MUTEX(cpufreq_governor_mutex);
> > > 
> > >+/*
> > >+ * Using the cpufreq_gov_mutex to protect internal governor
> > >+ * data structures. The policy rwsem write lock nests inside 
> > >this mutex.
> > >+ * The policy rwsem is taken in the timer handler, and 
> > >therefore cannot be
> > >+ * held while doing a sync teardown of the timer.
> > >+ * This cpufreq_gov_mutex lock protects init/teardown of 
> > >governor timers.
> > >+ * This is why this lock, although it protects a data 
> > >structure located within
> > >+ * the governors, is here.
> > >+ */
> > >+static DEFINE_MUTEX(cpufreq_gov_mutex);
> > >+
> > > struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
> > > {
> > > 	struct cpufreq_policy *data;
> > >@@ -725,6 +736,7 @@ static ssize_t store(struct kobject *kob
> > > 	if (!policy)
> > > 		goto no_policy;
> > > 
> > >+	mutex_lock(&cpufreq_gov_mutex);
> > > 	if (lock_policy_rwsem_write(policy->cpu) < 0)
> > > 		goto fail;
> > > 
> > >@@ -735,6 +747,7 @@ static ssize_t store(struct kobject *kob
> > > 
> > > 	unlock_policy_rwsem_write(policy->cpu);
> > > fail:
> > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > 	cpufreq_cpu_put(policy);
> > > no_policy:
> > > 	return ret;
> > >@@ -823,6 +836,7 @@ static int cpufreq_add_dev(struct sys_de
> > > 
> > > 	/* Initially set CPU itself as the policy_cpu */
> > > 	per_cpu(policy_cpu, cpu) = cpu;
> > >+	mutex_lock(&cpufreq_gov_mutex);
> > > 	ret = (lock_policy_rwsem_write(cpu) < 0);
> > > 	WARN_ON(ret);
> > > 
> > >@@ -875,7 +889,7 @@ static int cpufreq_add_dev(struct sys_de
> > > 					cpufreq_driver->exit(policy);
> > > 				ret = -EBUSY;
> > > 				cpufreq_cpu_put(managed_policy);
> > >-				goto err_free_cpumask;
> > >+				goto err_unlock_gov_mutex;
> > > 			}
> > > 
> > > 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
> > >@@ -964,6 +978,7 @@ static int cpufreq_add_dev(struct sys_de
> > > 	}
> > > 
> > > 	unlock_policy_rwsem_write(cpu);
> > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > 
> > > 	kobject_uevent(&policy->kobj, KOBJ_ADD);
> > > 	module_put(cpufreq_driver->owner);
> > >@@ -989,6 +1004,8 @@ out_driver_exit:
> > > 
> > > err_unlock_policy:
> > > 	unlock_policy_rwsem_write(cpu);
> > >+err_unlock_gov_mutex:
> > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > err_free_cpumask:
> > > 	free_cpumask_var(policy->cpus);
> > > err_free_policy:
> > >@@ -1028,6 +1045,7 @@ static int __cpufreq_remove_dev(struct s
> > > 		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> > > 		cpufreq_debug_enable_ratelimit();
> > > 		unlock_policy_rwsem_write(cpu);
> > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > 		return -EINVAL;
> > > 	}
> > > 	per_cpu(cpufreq_cpu_data, cpu) = NULL;
> > >@@ -1045,6 +1063,7 @@ static int __cpufreq_remove_dev(struct s
> > > 		cpufreq_cpu_put(data);
> > > 		cpufreq_debug_enable_ratelimit();
> > > 		unlock_policy_rwsem_write(cpu);
> > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > 		return 0;
> > > 	}
> > > #endif
> > >@@ -1088,9 +1107,13 @@ static int __cpufreq_remove_dev(struct s
> > > #endif
> > > 
> > > 	unlock_policy_rwsem_write(cpu);
> > >-
> > >+	/*
> > >+	 * Calling only with the cpufreq_gov_mutex held because
> > >+	 * sync timer teardown has locking dependency with policy rwsem.
> > >+	 */
> > > 	if (cpufreq_driver->target)
> > > 		__cpufreq_governor(data, CPUFREQ_GOV_STOP);
> > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > 
> > > 	kobject_put(&data->kobj);
> > > 
> > >@@ -1123,6 +1146,7 @@ static int cpufreq_remove_dev(struct sys
> > > 	if (cpu_is_offline(cpu))
> > > 		return 0;
> > > 
> > >+	mutex_lock(&cpufreq_gov_mutex);
> > > 	if (unlikely(lock_policy_rwsem_write(cpu)))
> > > 		BUG();
> > > 
> > >@@ -1506,12 +1530,16 @@ int cpufreq_driver_target(struct cpufreq
> > > 	if (!policy)
> > > 		goto no_policy;
> > > 
> > >-	if (unlikely(lock_policy_rwsem_write(policy->cpu)))
> > >+	mutex_lock(&cpufreq_gov_mutex);
> > >+	if (unlikely(lock_policy_rwsem_write(policy->cpu))) {
> > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > 		goto fail;
> > >+	}
> > > 
> > > 	ret = __cpufreq_driver_target(policy, target_freq, relation);
> > > 
> > > 	unlock_policy_rwsem_write(policy->cpu);
> > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > 
> > > fail:
> > > 	cpufreq_cpu_put(policy);
> > >@@ -1769,7 +1797,9 @@ int cpufreq_update_policy(unsigned int c
> > > 		goto no_policy;
> > > 	}
> > > 
> > >+	mutex_lock(&cpufreq_gov_mutex);
> > > 	if (unlikely(lock_policy_rwsem_write(cpu))) {
> > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > 		ret = -EINVAL;
> > > 		goto fail;
> > > 	}
> > >@@ -1798,6 +1828,7 @@ int cpufreq_update_policy(unsigned int c
> > > 	ret = __cpufreq_set_policy(data, &policy);
> > > 
> > > 	unlock_policy_rwsem_write(cpu);
> > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > 
> > > fail:
> > > 	cpufreq_cpu_put(data);
> > >@@ -1821,6 +1852,7 @@ static int __cpuinit cpufreq_cpu_callbac
> > > 			break;
> > > 		case CPU_DOWN_PREPARE:
> > > 		case CPU_DOWN_PREPARE_FROZEN:
> > >+			mutex_lock(&cpufreq_gov_mutex);
> > > 			if (unlikely(lock_policy_rwsem_write(cpu)))
> > > 				BUG();
> > > 
> > >
> > >-- 
> > >Mathieu Desnoyers
> > >OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 
> > >A8FE 3BAE 9A68
> > >
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch 2.6.30 3/4] cpufreq add gov mutex
       [not found]             ` <1246906203.11545.47.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-07-07 22:59               ` Mathieu Desnoyers
  0 siblings, 0 replies; 12+ messages in thread
From: Mathieu Desnoyers @ 2009-07-07 22:59 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dave Jones,
	Thomas Renninger, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ingo Molnar, rjw-KKrjLPT3xs0@public.gmane.org, Dave Young,
	Pekka Enberg, Li, Shaohua, Rusty Russell,
	sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org

* Pallipadi, Venkatesh (venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org) wrote:
> On Fri, 2009-07-03 at 12:36 -0700, Mathieu Desnoyers wrote:
> > * Pallipadi, Venkatesh (venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org) wrote:
> > >  
> > > 
> > > >-----Original Message-----
> > > >From: Mathieu Desnoyers [mailto:mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org] 
> > > >Sent: Friday, July 03, 2009 8:25 AM
> > > >To: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Pallipadi, Venkatesh; Dave 
> > > >Jones; Thomas Renninger; cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 
> > > >kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Ingo Molnar; rjw-KKrjLPT3xs0@public.gmane.org; Dave 
> > > >Young; Pekka Enberg
> > > >Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; 
> > > >sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> > > >Subject: [patch 2.6.30 3/4] cpufreq add gov mutex
> > > >
> > > >Using the cpufreq_gov_mutex to protect internal governor data 
> > > >structures. The
> > > >policy rwsem write lock nests inside this mutex. 
> > > 
> > > This makes the whole locking in cpufreq upside down. Takes away
> > > most of the reason for existence of rwsem lock. Taking a smaller
> > > Percpu rwsem lock inside a bigger global mutex doesn't seem right.
> > > 
> > 
> > It permits the timer handler to only take the rwsem write lock. My
> > understanding is that the most frequent operation is the timer handler
> > being executed, but maybe I'm wrong ?
> > 
> > I think it ends up being a questions of clearly identifying how
> > frequent each data structure access are, and why this per-governor rwsem
> > is needed at all. (does it bring any perceivable performance improvement
> > on a critical path ? Then maybe we should use RCU ?)
> > 
> > I really wonder if the fact that you switch the timer handler from rwsem
> > to a mutex in your patchset will not hurt performance in some way or
> > have some unforeseen side-effect. This is why I made my patchset as dumb
> > and simple as possible, because this is a bugfix for 2.6.30, not a
> > reingineering. Let's make it work, then make it fast.
> 
> timer handler is frequent and not short lived. I agree with 2.6.30
> solution to be simple and not worrying about performance. That is the
> reason I moved dbs_mutex into timer path., we need minimal and safe
> solution for .30. That was with
>  3 files changed, 24 insertions(+), 34 deletions(-)
> 
> Ones the rest of the cleanups gets enough testing and if they indeed
> make into later releases, it can always be backported to .30.something
> if timer locking causes some side effects.
> 
> The changes here is lot more and makes already complicated and ugly
> locking uglier. After some time no one will know why we have locks
> spread in so many different files getting called from some other file.
> 
> > > >The policy 
> > > >rwsem is taken in
> > > >the timer handler, and therefore cannot be held while doing a 
> > > >sync teardown of
> > > >the timer.  This cpufreq_gov_mutex lock protects init/teardown 
> > > >of governor
> > > >timers.
> > > 
> > > Why is the protection required for init/teardown of percpu
> > > timer with a global mutex? cpufreq rwsem (when held correctly)
> > > ensures that there can be only one store_scaling_governor for
> > > a cpu at any point in time. I don't see any reason why we need
> > > a global mutex to protect timer init teardown.
> > > 
> > 
> > rwsem was previously held in the timer handler. I am not yet convinced
> > that simply holding a local dbs_mutex will ensure proper synchronization
> > of *all* call paths in the timer handler.
> 
> All call paths of timer handler? May be I am missing something. It is
> only called from the delayed work. It gets sync stopped from GOV_STOP,
> so no need of locking there. It gets started from GOV_START where timer
> can never be active (STOP would have already stopped it if it was
> running before) and no need of any locking there.
> Or did you mean callpaths inside?

Yep, see below,

> ondemand will not depend on anythong
> from cpufreq core other than policy limit changes. It obviously cannot
> be running when add_dev or remove_dev stuff is going on.

I am worried about potential races between add_dev/remove_dev, which
currently lock the rwsem as mean of protection, and execution of timer
handler that would not take the rwsem to protect itself anymore, due to
your changes.

I'm especially worried about the call to 

              __cpufreq_driver_target(dbs_info->cur_policy,
                        dbs_info->freq_lo, CPUFREQ_RELATION_H);

which seems to depend on policy-level information, protected at the
rwsem-level.

By removing the rwsem from the timer handler, I don't see how you plan
to protect this information from add_dev/remove_dev execution.

> It doesn't
> really care about someone reading scaling_available_frequencies or
> scaling_available_governors etc from cpufreq core.
> dbs_mutex in timer is just the simple and dumb fix to resolve the
> deadlock that we have now.
> 
> > So given I prefer to do the least intrusive modification, I leave the
> > rwsem write lock in the timer handler, but it leaves no choice but to
> > take a mutex _outside_ of the rwsem lock to synchronize timer
> > init/teardown.
> > 
> > > 
> > > > This is why this lock, although it protects a data 
> > > >structure located
> > > >within the governors, is located in cpufreq.c.
> > > 
> > > I think this is taking a step back in terms of locking. A system
> > > wide Cpufreq_gov_mutex is being held more frequently now and seems
> > > to be providing all the serialization needed. The locks crossing
> > > layers of cpufreq core and governor is just going to make
> > > situation worse IMO.
> > 
> > Can you prove that my simple bugfix will cause a significant performance
> > regression ?
> 
> I am not saying it is a performance regression. I am saying it is making
> things more messy and confusing and can only add to the complications we
> have. Given the other simple alternative patch we have...
> 
> > > 
> > > If we really want to pursue this option, we should get rid
> > > of rwsem altogether. It is not adding much value
> > > and just providing lot more confusion with things like rwsem
> > > should be held everywhere else other than CPUFREQ_GOV_STOP.
> > > 
> > 
> > I agree that this rwsem is just odd. But please keep its removal for
> > 2.6.32, and consider using RCU then if performance really matters.
> > 
> > And please don't try to overengineer a simple bugfix. The bogus mutex
> > removal you proposed a few weeks ago turned me into the "cautious" mode.
> 
> <finger_pointing>
> The whole deadlock problem started from this incomplete/bogus bug fix
> 
> commit b14893a62c73af0eca414cfed505b8c09efc613c
> [CPUFREQ] fix timer teardown in ondemand governor
> 
> which created more problems than it fixed.
> </finger_pointing>

My laptop stopped crashing randomly each week after this fix, but a
reverse locking dependency chain was created. To me, it fixed problems
and caused rarer problems for which warnings were clearer. Please let's
not fight on this, it's useless.  I'm just stating why I'm being
cautious here, and this is mainly because the code seems to be
spaghetti-like fragile. :-)

> 
> > And looking at the general code quality of cpufreq (timer teardown
> > races, error path not handled, unbalanced locks, cpu hotplug support
> > broken) makes me vote for the most utterly simple solution. I currently
> > don't care about performance _at all_. I care about something as simple
> > as making sure this thing stop blowing up.
> > 
> 
> I think we are on the same page with respect to simple fix and not
> bothering about performance for now. But, our opinions differ on what
> simple bug fix is in this case. Anyways, we have two alternatives here
> and I will be happy to let Dave pick the solution he likes the best.
> 

Yep.

Thanks,

Mathieu

> Thanks,
> Venki
> 
> > > >
> > > >Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
> > > >CC: Venkatesh Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > >CC: rjw-KKrjLPT3xs0@public.gmane.org
> > > >CC: mingo-X9Un+BFzKDI@public.gmane.org
> > > >CC: Shaohua Li <shaohua.li-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > > >CC: Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
> > > >CC: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > > >CC: "Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>
> > > >CC: Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
> > > >CC: sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org
> > > >CC: cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > >CC: Thomas Renninger <trenn-l3A5Bk7waGM@public.gmane.org>
> > > >---
> > > > drivers/cpufreq/cpufreq.c |   38 
> > > >+++++++++++++++++++++++++++++++++++---
> > > > 1 file changed, 35 insertions(+), 3 deletions(-)
> > > >
> > > >Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
> > > >===================================================================
> > > >--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	
> > > >2009-07-03 09:48:35.000000000 -0400
> > > >+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-07-03 
> > > >10:12:44.000000000 -0400
> > > >@@ -133,6 +133,17 @@ pure_initcall(init_cpufreq_transition_no
> > > > static LIST_HEAD(cpufreq_governor_list);
> > > > static DEFINE_MUTEX(cpufreq_governor_mutex);
> > > > 
> > > >+/*
> > > >+ * Using the cpufreq_gov_mutex to protect internal governor
> > > >+ * data structures. The policy rwsem write lock nests inside 
> > > >this mutex.
> > > >+ * The policy rwsem is taken in the timer handler, and 
> > > >therefore cannot be
> > > >+ * held while doing a sync teardown of the timer.
> > > >+ * This cpufreq_gov_mutex lock protects init/teardown of 
> > > >governor timers.
> > > >+ * This is why this lock, although it protects a data 
> > > >structure located within
> > > >+ * the governors, is here.
> > > >+ */
> > > >+static DEFINE_MUTEX(cpufreq_gov_mutex);
> > > >+
> > > > struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
> > > > {
> > > > 	struct cpufreq_policy *data;
> > > >@@ -725,6 +736,7 @@ static ssize_t store(struct kobject *kob
> > > > 	if (!policy)
> > > > 		goto no_policy;
> > > > 
> > > >+	mutex_lock(&cpufreq_gov_mutex);
> > > > 	if (lock_policy_rwsem_write(policy->cpu) < 0)
> > > > 		goto fail;
> > > > 
> > > >@@ -735,6 +747,7 @@ static ssize_t store(struct kobject *kob
> > > > 
> > > > 	unlock_policy_rwsem_write(policy->cpu);
> > > > fail:
> > > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > > 	cpufreq_cpu_put(policy);
> > > > no_policy:
> > > > 	return ret;
> > > >@@ -823,6 +836,7 @@ static int cpufreq_add_dev(struct sys_de
> > > > 
> > > > 	/* Initially set CPU itself as the policy_cpu */
> > > > 	per_cpu(policy_cpu, cpu) = cpu;
> > > >+	mutex_lock(&cpufreq_gov_mutex);
> > > > 	ret = (lock_policy_rwsem_write(cpu) < 0);
> > > > 	WARN_ON(ret);
> > > > 
> > > >@@ -875,7 +889,7 @@ static int cpufreq_add_dev(struct sys_de
> > > > 					cpufreq_driver->exit(policy);
> > > > 				ret = -EBUSY;
> > > > 				cpufreq_cpu_put(managed_policy);
> > > >-				goto err_free_cpumask;
> > > >+				goto err_unlock_gov_mutex;
> > > > 			}
> > > > 
> > > > 			spin_lock_irqsave(&cpufreq_driver_lock, flags);
> > > >@@ -964,6 +978,7 @@ static int cpufreq_add_dev(struct sys_de
> > > > 	}
> > > > 
> > > > 	unlock_policy_rwsem_write(cpu);
> > > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > > 
> > > > 	kobject_uevent(&policy->kobj, KOBJ_ADD);
> > > > 	module_put(cpufreq_driver->owner);
> > > >@@ -989,6 +1004,8 @@ out_driver_exit:
> > > > 
> > > > err_unlock_policy:
> > > > 	unlock_policy_rwsem_write(cpu);
> > > >+err_unlock_gov_mutex:
> > > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > > err_free_cpumask:
> > > > 	free_cpumask_var(policy->cpus);
> > > > err_free_policy:
> > > >@@ -1028,6 +1045,7 @@ static int __cpufreq_remove_dev(struct s
> > > > 		spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> > > > 		cpufreq_debug_enable_ratelimit();
> > > > 		unlock_policy_rwsem_write(cpu);
> > > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > > 		return -EINVAL;
> > > > 	}
> > > > 	per_cpu(cpufreq_cpu_data, cpu) = NULL;
> > > >@@ -1045,6 +1063,7 @@ static int __cpufreq_remove_dev(struct s
> > > > 		cpufreq_cpu_put(data);
> > > > 		cpufreq_debug_enable_ratelimit();
> > > > 		unlock_policy_rwsem_write(cpu);
> > > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > > 		return 0;
> > > > 	}
> > > > #endif
> > > >@@ -1088,9 +1107,13 @@ static int __cpufreq_remove_dev(struct s
> > > > #endif
> > > > 
> > > > 	unlock_policy_rwsem_write(cpu);
> > > >-
> > > >+	/*
> > > >+	 * Calling only with the cpufreq_gov_mutex held because
> > > >+	 * sync timer teardown has locking dependency with policy rwsem.
> > > >+	 */
> > > > 	if (cpufreq_driver->target)
> > > > 		__cpufreq_governor(data, CPUFREQ_GOV_STOP);
> > > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > > 
> > > > 	kobject_put(&data->kobj);
> > > > 
> > > >@@ -1123,6 +1146,7 @@ static int cpufreq_remove_dev(struct sys
> > > > 	if (cpu_is_offline(cpu))
> > > > 		return 0;
> > > > 
> > > >+	mutex_lock(&cpufreq_gov_mutex);
> > > > 	if (unlikely(lock_policy_rwsem_write(cpu)))
> > > > 		BUG();
> > > > 
> > > >@@ -1506,12 +1530,16 @@ int cpufreq_driver_target(struct cpufreq
> > > > 	if (!policy)
> > > > 		goto no_policy;
> > > > 
> > > >-	if (unlikely(lock_policy_rwsem_write(policy->cpu)))
> > > >+	mutex_lock(&cpufreq_gov_mutex);
> > > >+	if (unlikely(lock_policy_rwsem_write(policy->cpu))) {
> > > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > > 		goto fail;
> > > >+	}
> > > > 
> > > > 	ret = __cpufreq_driver_target(policy, target_freq, relation);
> > > > 
> > > > 	unlock_policy_rwsem_write(policy->cpu);
> > > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > > 
> > > > fail:
> > > > 	cpufreq_cpu_put(policy);
> > > >@@ -1769,7 +1797,9 @@ int cpufreq_update_policy(unsigned int c
> > > > 		goto no_policy;
> > > > 	}
> > > > 
> > > >+	mutex_lock(&cpufreq_gov_mutex);
> > > > 	if (unlikely(lock_policy_rwsem_write(cpu))) {
> > > >+		mutex_unlock(&cpufreq_gov_mutex);
> > > > 		ret = -EINVAL;
> > > > 		goto fail;
> > > > 	}
> > > >@@ -1798,6 +1828,7 @@ int cpufreq_update_policy(unsigned int c
> > > > 	ret = __cpufreq_set_policy(data, &policy);
> > > > 
> > > > 	unlock_policy_rwsem_write(cpu);
> > > >+	mutex_unlock(&cpufreq_gov_mutex);
> > > > 
> > > > fail:
> > > > 	cpufreq_cpu_put(data);
> > > >@@ -1821,6 +1852,7 @@ static int __cpuinit cpufreq_cpu_callbac
> > > > 			break;
> > > > 		case CPU_DOWN_PREPARE:
> > > > 		case CPU_DOWN_PREPARE_FROZEN:
> > > >+			mutex_lock(&cpufreq_gov_mutex);
> > > > 			if (unlikely(lock_policy_rwsem_write(cpu)))
> > > > 				BUG();
> > > > 
> > > >
> > > >-- 
> > > >Mathieu Desnoyers
> > > >OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 
> > > >A8FE 3BAE 9A68
> > > >
> > 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-07-07 22:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-03 15:25 [patch 2.6.30 0/4] Fix cpufreq locking dependency (resend) Mathieu Desnoyers
2009-07-03 15:25 ` [patch 2.6.30 1/4] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) Mathieu Desnoyers
2009-07-03 15:25 ` [patch 2.6.30 2/4] CPUFREQ: fix (utter) cpufreq_add_dev mess Mathieu Desnoyers
     [not found]   ` <20090703152714.775719309-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
2009-07-03 19:24     ` Pallipadi, Venkatesh
     [not found]       ` <7E82351C108FA840AB1866AC776AEC4669BFF0F5-osO9UTpF0URqS6EAlXoojrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2009-07-03 19:41         ` Mathieu Desnoyers
2009-07-06 18:02           ` Pallipadi, Venkatesh
2009-07-03 15:25 ` [patch 2.6.30 3/4] cpufreq add gov mutex Mathieu Desnoyers
     [not found]   ` <20090703152714.953585501-scC8bbJcJLCw5LPnMra/2Q@public.gmane.org>
2009-07-03 19:11     ` Pallipadi, Venkatesh
     [not found]       ` <7E82351C108FA840AB1866AC776AEC4669BFF0DE-osO9UTpF0URqS6EAlXoojrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2009-07-03 19:36         ` Mathieu Desnoyers
2009-07-06 18:50           ` Pallipadi, Venkatesh
     [not found]             ` <1246906203.11545.47.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-07-07 22:59               ` Mathieu Desnoyers
2009-07-03 15:25 ` [patch 2.6.30 4/4] cpufreq ondemand and conservative remove dbs_mutex Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).