public inbox for linux-kernel@vger.kernel.org
* [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue
@ 2008-06-26  7:56 Paul Menage
  2008-06-26  9:34 ` Vegard Nossum
  2008-06-27  3:22 ` Gautham R Shenoy
  0 siblings, 2 replies; 14+ messages in thread
From: Paul Menage @ 2008-06-26  7:56 UTC (permalink / raw)
  To: Vegard Nossum, Paul Jackson, a.p.zijlstra, maxk; +Cc: linux-kernel

CPUsets: Move most calls to rebuild_sched_domains() to the workqueue

In the current cpusets code, the lock nesting between cgroup_mutex and
cpuhotplug.lock when calling rebuild_sched_domains() is inconsistent:
in the CPU hotplug path, cpuhotplug.lock nests outside cgroup_mutex,
while in all other paths that call rebuild_sched_domains() it nests
inside.

This patch makes most calls to rebuild_sched_domains() asynchronous
via the workqueue, which removes the nesting of the two locks in that
case. In the case of an actual hotplug event, cpuhotplug.lock still
nests outside cgroup_mutex, as it does now.

Signed-off-by: Paul Menage <menage@google.com>

---

Note that all I've done with this patch is verify that it compiles
without warnings; I'm not sure how to trigger a hotplug event to test
the lock dependencies or verify that scheduler domain support is still
behaving correctly. Vegard, does this fix the problems that you were
seeing? Paul/Max, does this still seem sane with regard to scheduler domains?


 kernel/cpuset.c |   35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

Index: lockfix-2.6.26-rc5-mm3/kernel/cpuset.c
===================================================================
--- lockfix-2.6.26-rc5-mm3.orig/kernel/cpuset.c
+++ lockfix-2.6.26-rc5-mm3/kernel/cpuset.c
@@ -522,13 +522,9 @@ update_domain_attr(struct sched_domain_a
  * domains when operating in the severe memory shortage situations
  * that could cause allocation failures below.
  *
- * Call with cgroup_mutex held.  May take callback_mutex during
- * call due to the kfifo_alloc() and kmalloc() calls.  May nest
- * a call to the get_online_cpus()/put_online_cpus() pair.
- * Must not be called holding callback_mutex, because we must not
- * call get_online_cpus() while holding callback_mutex.  Elsewhere
- * the kernel nests callback_mutex inside get_online_cpus() calls.
- * So the reverse nesting would risk an ABBA deadlock.
+ * Call with cgroup_mutex held, and inside get_online_cpus().  May
+ * take callback_mutex during call due to the kfifo_alloc() and
+ * kmalloc() calls.
  *
  * The three key local variables below are:
  *    q  - a kfifo queue of cpuset pointers, used to implement a
@@ -689,9 +685,7 @@ restart:
 
 rebuild:
 	/* Have scheduler rebuild sched domains */
-	get_online_cpus();
 	partition_sched_domains(ndoms, doms, dattr);
-	put_online_cpus();
 
 done:
 	if (q && !IS_ERR(q))
@@ -701,6 +695,21 @@ done:
 	/* Don't kfree(dattr) -- partition_sched_domains() does that. */
 }
 
+/*
+ * Due to the need to nest cgroup_mutex inside cpuhotplug.lock, most
+ * of our invocations of rebuild_sched_domains() are done
+ * asynchronously via the workqueue
+ */
+static void delayed_rebuild_sched_domains(struct work_struct *work)
+{
+	get_online_cpus();
+	cgroup_lock();
+	rebuild_sched_domains();
+	cgroup_unlock();
+	put_online_cpus();
+}
+static DECLARE_WORK(rebuild_sched_domains_work, delayed_rebuild_sched_domains);
+
 static inline int started_after_time(struct task_struct *t1,
 				     struct timespec *time,
 				     struct task_struct *t2)
@@ -853,7 +862,7 @@ static int update_cpumask(struct cpuset 
 		return retval;
 
 	if (is_load_balanced)
-		rebuild_sched_domains();
+		schedule_work(&rebuild_sched_domains_work);
 	return 0;
 }
 
@@ -1080,7 +1089,7 @@ static int update_relax_domain_level(str
 
 	if (val != cs->relax_domain_level) {
 		cs->relax_domain_level = val;
-		rebuild_sched_domains();
+		schedule_work(&rebuild_sched_domains_work);
 	}
 
 	return 0;
@@ -1121,7 +1130,7 @@ static int update_flag(cpuset_flagbits_t
 	mutex_unlock(&callback_mutex);
 
 	if (cpus_nonempty && balance_flag_changed)
-		rebuild_sched_domains();
+		schedule_work(&rebuild_sched_domains_work);
 
 	return 0;
 }
@@ -1929,6 +1938,7 @@ static void scan_for_empty_cpusets(const
 
 static void common_cpu_mem_hotplug_unplug(void)
 {
+	get_online_cpus();
 	cgroup_lock();
 
 	top_cpuset.cpus_allowed = cpu_online_map;
@@ -1942,6 +1952,7 @@ static void common_cpu_mem_hotplug_unplu
 	rebuild_sched_domains();
 
 	cgroup_unlock();
+	put_online_cpus();
 }
 
 /*


Thread overview: 14+ messages
2008-06-26  7:56 [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue Paul Menage
2008-06-26  9:34 ` Vegard Nossum
2008-06-26  9:50   ` Paul Menage
2008-06-26 18:49     ` Max Krasnyansky
2008-06-26 19:19       ` Peter Zijlstra
2008-06-26 20:34       ` Paul Menage
2008-06-26 21:17         ` Paul Menage
2008-06-27  5:10           ` Max Krasnyansky
2008-06-27  5:51             ` Paul Menage
2008-06-27 17:31               ` Max Krasnyansky
2008-06-27  3:22 ` Gautham R Shenoy
2008-06-27  3:23   ` Gautham R Shenoy
2008-06-27  4:53     ` Max Krasnyansky
2008-06-27 16:42     ` Oleg Nesterov
