From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Paul Jackson <pj@sgi.com>
Cc: Andrew Morton <akpm@osdl.org>, Martin Bligh <mbligh@google.com>,
Paul Menage <menage@google.com>,
Simon.Derr@bull.net, linux-kernel@vger.kernel.org,
Dinakar Guniguntala <dino@in.ibm.com>,
Rohit Seth <rohitseth@google.com>, Robin Holt <holt@sgi.com>,
dipankar@in.ibm.com, "Siddha,
Suresh B" <suresh.b.siddha@intel.com>
Subject: Re: [RFC] cpuset: remove sched domain hooks from cpusets
Date: Thu, 19 Oct 2006 20:24:59 +1000 [thread overview]
Message-ID: <4537527B.5050401@yahoo.com.au> (raw)
In-Reply-To: <20061019092358.17547.51425.sendpatchset@sam.engr.sgi.com>
[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]
Paul Jackson wrote:
> From: Paul Jackson <pj@sgi.com>
>
> Remove the cpuset hooks that defined sched domains depending on the
> setting of the 'cpu_exclusive' flag.
Before we chuck the baby out...
> The cpu_exclusive flag can only be set on a child if it is set on
> the parent.
>
> This made that flag painfully unsuitable for use as a flag defining
> a partitioning of a system.
Sigh, it isn't. That's simply how cpusets tried to use it.
> It was entirely unobvious to a cpuset user what partitioning of sched
> domains they would be causing when they set that one cpu_exclusive bit
> on one cpuset, because it depended on what CPUs were in the remainder
> of that cpusets siblings and child cpusets, after subtracting out
> other cpu_exclusive cpusets.
As far as a user is concerned, the cpusets is the interface. domain
partitioning is an implementation detail that just happens to make
it work better in some cases.
> Furthermore, there was no way on production systems to query the
> result.
You shouldn't need to, assuming cpusets doesn't mess it up.
Here is an untested patch. Apparently takes care of CPU hotplug too.
--
SUSE Labs, Novell Inc.
[-- Attachment #2: sched-domains-cpusets-fixes.patch --]
[-- Type: text/plain, Size: 8467 bytes --]
Fix sched-domains partitioning by cpusets. Walk the whole cpusets tree after
something interesting changes, and recreate all partitions.
Index: linux-2.6/kernel/cpuset.c
===================================================================
--- linux-2.6.orig/kernel/cpuset.c 2006-10-19 19:26:54.000000000 +1000
+++ linux-2.6/kernel/cpuset.c 2006-10-19 20:21:29.000000000 +1000
@@ -751,6 +751,24 @@ static int validate_change(const struct
return 0;
}
+static void update_cpu_domains_children(struct cpuset *par,
+ cpumask_t *non_partitioned)
+{
+ struct cpuset *c;
+
+ list_for_each_entry(c, &par->children, sibling) {
+ if (cpus_empty(c->cpus_allowed))
+ continue;
+ if (is_cpu_exclusive(c)) {
+ if (!partition_sched_domains(&c->cpus_allowed)) {
+ cpus_andnot(*non_partitioned,
+ *non_partitioned, c->cpus_allowed);
+ }
+ } else
+ update_cpu_domains_children(c, non_partitioned);
+ }
+}
+
/*
* For a given cpuset cur, partition the system as follows
* a. All cpus in the parent cpuset's cpus_allowed that are not part of any
@@ -760,53 +778,38 @@ static int validate_change(const struct
* Build these two partitions by calling partition_sched_domains
*
* Call with manage_mutex held. May nest a call to the
- * lock_cpu_hotplug()/unlock_cpu_hotplug() pair.
- * Must not be called holding callback_mutex, because we must
- * not call lock_cpu_hotplug() while holding callback_mutex.
+ * lock_cpu_hotplug()/unlock_cpu_hotplug() pair. Must not be called holding
+ * callback_mutex, because we must not call lock_cpu_hotplug() while holding
+ * callback_mutex.
*/
-static void update_cpu_domains(struct cpuset *cur)
+static void update_cpu_domains(void)
{
- struct cpuset *c, *par = cur->parent;
- cpumask_t pspan, cspan;
+ cpumask_t non_partitioned;
- if (par == NULL || cpus_empty(cur->cpus_allowed))
- return;
-
- /*
- * Get all cpus from parent's cpus_allowed not part of exclusive
- * children
- */
- pspan = par->cpus_allowed;
- list_for_each_entry(c, &par->children, sibling) {
- if (is_cpu_exclusive(c))
- cpus_andnot(pspan, pspan, c->cpus_allowed);
- }
- if (!is_cpu_exclusive(cur)) {
- cpus_or(pspan, pspan, cur->cpus_allowed);
- if (cpus_equal(pspan, cur->cpus_allowed))
- return;
- cspan = CPU_MASK_NONE;
- } else {
- if (cpus_empty(pspan))
- return;
- cspan = cur->cpus_allowed;
- /*
- * Get all cpus from current cpuset's cpus_allowed not part
- * of exclusive children
- */
- list_for_each_entry(c, &cur->children, sibling) {
- if (is_cpu_exclusive(c))
- cpus_andnot(cspan, cspan, c->cpus_allowed);
- }
- }
+ BUG_ON(!mutex_is_locked(&manage_mutex));
lock_cpu_hotplug();
- partition_sched_domains(&pspan, &cspan);
+ non_partitioned = top_cpuset.cpus_allowed;
+ update_cpu_domains_children(&top_cpuset, &non_partitioned);
+ partition_sched_domains(&non_partitioned);
unlock_cpu_hotplug();
}
/*
+ * Same as above except called with lock_cpu_hotplug and without manage_mutex.
+ */
+
+int cpuset_hotplug_update_sched_domains(void)
+{
+ cpumask_t non_partitioned;
+
+ non_partitioned = top_cpuset.cpus_allowed;
+ update_cpu_domains_children(&top_cpuset, &non_partitioned);
+ return partition_sched_domains(&non_partitioned);
+}
+
+/*
* Call with manage_mutex held. May take callback_mutex during call.
*/
@@ -833,8 +836,8 @@ static int update_cpumask(struct cpuset
mutex_lock(&callback_mutex);
cs->cpus_allowed = trialcs.cpus_allowed;
mutex_unlock(&callback_mutex);
- if (is_cpu_exclusive(cs) && !cpus_unchanged)
- update_cpu_domains(cs);
+ if (!cpus_unchanged)
+ update_cpu_domains();
return 0;
}
@@ -1067,7 +1070,7 @@ static int update_flag(cpuset_flagbits_t
mutex_unlock(&callback_mutex);
if (cpu_exclusive_changed)
- update_cpu_domains(cs);
+ update_cpu_domains();
return 0;
}
@@ -1931,19 +1934,9 @@ static int cpuset_mkdir(struct inode *di
return cpuset_create(c_parent, dentry->d_name.name, mode | S_IFDIR);
}
-/*
- * Locking note on the strange update_flag() call below:
- *
- * If the cpuset being removed is marked cpu_exclusive, then simulate
- * turning cpu_exclusive off, which will call update_cpu_domains().
- * The lock_cpu_hotplug() call in update_cpu_domains() must not be
- * made while holding callback_mutex. Elsewhere the kernel nests
- * callback_mutex inside lock_cpu_hotplug() calls. So the reverse
- * nesting would risk an ABBA deadlock.
- */
-
static int cpuset_rmdir(struct inode *unused_dir, struct dentry *dentry)
{
+ int is_exclusive;
struct cpuset *cs = dentry->d_fsdata;
struct dentry *d;
struct cpuset *parent;
@@ -1961,13 +1954,8 @@ static int cpuset_rmdir(struct inode *un
mutex_unlock(&manage_mutex);
return -EBUSY;
}
- if (is_cpu_exclusive(cs)) {
- int retval = update_flag(CS_CPU_EXCLUSIVE, cs, "0");
- if (retval < 0) {
- mutex_unlock(&manage_mutex);
- return retval;
- }
- }
+ is_exclusive = is_cpu_exclusive(cs);
+
parent = cs->parent;
mutex_lock(&callback_mutex);
set_bit(CS_REMOVED, &cs->flags);
@@ -1982,8 +1970,13 @@ static int cpuset_rmdir(struct inode *un
mutex_unlock(&callback_mutex);
if (list_empty(&parent->children))
check_for_release(parent, &pathbuf);
+
+ if (is_exclusive)
+ update_cpu_domains();
+
mutex_unlock(&manage_mutex);
cpuset_release_agent(pathbuf);
+
return 0;
}
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c 2006-10-19 19:24:48.000000000 +1000
+++ linux-2.6/kernel/sched.c 2006-10-19 20:21:50.000000000 +1000
@@ -6586,6 +6586,9 @@ error:
*/
static int arch_init_sched_domains(const cpumask_t *cpu_map)
{
+#ifdef CONFIG_CPUSETS
+ return cpuset_hotplug_update_sched_domains();
+#else
cpumask_t cpu_default_map;
int err;
@@ -6599,6 +6602,7 @@ static int arch_init_sched_domains(const
err = build_sched_domains(&cpu_default_map);
return err;
+#endif
}
static void arch_destroy_sched_domains(const cpumask_t *cpu_map)
@@ -6622,29 +6626,26 @@ static void detach_destroy_domains(const
/*
* Partition sched domains as specified by the cpumasks below.
- * This attaches all cpus from the cpumasks to the NULL domain,
+ * This attaches all cpus from the partition to the NULL domain,
* waits for a RCU quiescent period, recalculates sched
- * domain information and then attaches them back to the
- * correct sched domains
- * Call with hotplug lock held
+ * domain information and then attaches them back to their own
+ * isolated partition.
+ *
+ * Called with hotplug lock held
+ *
+ * Returns 0 on success.
*/
-int partition_sched_domains(cpumask_t *partition1, cpumask_t *partition2)
+int partition_sched_domains(cpumask_t *partition)
{
+ cpumask_t non_isolated_cpus;
cpumask_t change_map;
- int err = 0;
- cpus_and(*partition1, *partition1, cpu_online_map);
- cpus_and(*partition2, *partition2, cpu_online_map);
- cpus_or(change_map, *partition1, *partition2);
+ cpus_andnot(non_isolated_cpus, cpu_online_map, cpu_isolated_map);
+ cpus_and(change_map, *partition, non_isolated_cpus);
/* Detach sched domains from all of the affected cpus */
detach_destroy_domains(&change_map);
- if (!cpus_empty(*partition1))
- err = build_sched_domains(partition1);
- if (!err && !cpus_empty(*partition2))
- err = build_sched_domains(partition2);
-
- return err;
+ return build_sched_domains(&change_map);
}
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h 2006-10-19 20:02:24.000000000 +1000
+++ linux-2.6/include/linux/sched.h 2006-10-19 20:02:30.000000000 +1000
@@ -707,8 +707,7 @@ struct sched_domain {
#endif
};
-extern int partition_sched_domains(cpumask_t *partition1,
- cpumask_t *partition2);
+extern int partition_sched_domains(cpumask_t *partition);
/*
* Maximum cache size the migration-costs auto-tuning code will
Index: linux-2.6/include/linux/cpuset.h
===================================================================
--- linux-2.6.orig/include/linux/cpuset.h 2006-10-19 20:07:24.000000000 +1000
+++ linux-2.6/include/linux/cpuset.h 2006-10-19 20:21:08.000000000 +1000
@@ -14,6 +14,8 @@
#ifdef CONFIG_CPUSETS
+extern int cpuset_hotplug_update_sched_domains(void);
+
extern int number_of_cpusets; /* How many cpusets are defined in system? */
extern int cpuset_init_early(void);
next prev parent reply other threads:[~2006-10-19 10:25 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-10-19 9:23 [RFC] cpuset: remove sched domain hooks from cpusets Paul Jackson
2006-10-19 10:24 ` Nick Piggin [this message]
2006-10-19 19:03 ` Paul Jackson
2006-10-19 19:21 ` Nick Piggin
2006-10-19 19:50 ` Martin Bligh
2006-10-20 0:14 ` Paul Jackson
2006-10-20 16:03 ` Nick Piggin
2006-10-20 17:29 ` Siddha, Suresh B
2006-10-20 19:19 ` Paul Jackson
2006-10-20 19:00 ` Paul Jackson
2006-10-20 20:30 ` Dinakar Guniguntala
2006-10-20 21:41 ` Paul Jackson
2006-10-20 22:35 ` Dinakar Guniguntala
2006-10-20 23:14 ` Siddha, Suresh B
2006-10-21 5:37 ` Paul Jackson
2006-10-23 4:31 ` Siddha, Suresh B
2006-10-23 5:59 ` Paul Jackson
2006-10-21 23:05 ` Paul Jackson
2006-10-22 12:02 ` Paul Jackson
2006-10-23 3:09 ` Paul Jackson
2006-10-20 21:46 ` Paul Jackson
2006-10-21 18:23 ` Paul Menage
2006-10-21 20:55 ` Paul Jackson
2006-10-21 20:59 ` Paul Menage
2006-10-22 10:51 ` Paul Jackson
2006-10-23 5:26 ` Siddha, Suresh B
2006-10-23 5:54 ` Paul Jackson
2006-10-23 5:43 ` Siddha, Suresh B
2006-10-23 6:02 ` Nick Piggin
2006-10-23 6:16 ` Paul Jackson
2006-10-23 16:03 ` Christoph Lameter
2006-11-09 10:59 ` Paul Jackson
2006-10-23 16:01 ` Christoph Lameter
-- strict thread matches above, loose matches on Subject: below --
2006-10-30 21:26 [RFC] cpuset: Remove " Dinakar Guniguntala
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4537527B.5050401@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=Simon.Derr@bull.net \
--cc=akpm@osdl.org \
--cc=dino@in.ibm.com \
--cc=dipankar@in.ibm.com \
--cc=holt@sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mbligh@google.com \
--cc=menage@google.com \
--cc=pj@sgi.com \
--cc=rohitseth@google.com \
--cc=suresh.b.siddha@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox