public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/3] sched: trivial changes
@ 2004-09-08 12:50 Nick Piggin
  2004-09-08 12:52 ` [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2004-09-08 12:50 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Rusty Russell, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 51 bytes --]

A couple of trivial changes before the hotplug stuff.

[-- Attachment #2: sched-trivial.patch --]
[-- Type: text/x-patch, Size: 2070 bytes --]



Make a definition static and slightly sanitize ifdefs.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>


---

 linux-2.6-npiggin/kernel/sched.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff -puN kernel/sched.c~sched-trivial kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-trivial	2004-09-08 21:01:53.000000000 +1000
+++ linux-2.6-npiggin/kernel/sched.c	2004-09-08 22:39:38.000000000 +1000
@@ -247,9 +247,7 @@ struct sched_group {
 
 	/*
 	 * CPU power of this group, SCHED_LOAD_SCALE being max power for a
-	 * single CPU. This should be read only (except for setup). Although
-	 * it will need to be written to at cpu hot(un)plug time, perhaps the
-	 * cpucontrol semaphore will provide enough exclusion?
+	 * single CPU. This is read only (except for setup, hotplug CPU).
 	 */
 	unsigned long cpu_power;
 };
@@ -4053,7 +4051,8 @@ static void cpu_attach_domain(struct sch
  * in arch code. That defines the number of nearby nodes in a node's top
  * level scheduling domain.
  */
-#if defined(CONFIG_NUMA) && defined(SD_NODES_PER_DOMAIN)
+#ifdef CONFIG_NUMA
+#ifdef SD_NODES_PER_DOMAIN
 /**
  * find_next_best_node - find the next node to include in a sched_domain
  * @node: node whose sched_domain we're building
@@ -4100,7 +4099,7 @@ static int __init find_next_best_node(in
  * should be one that prevents unnecessary balancing, but also spreads tasks
  * out optimally.
  */
-cpumask_t __init sched_domain_node_span(int node)
+static cpumask_t __init sched_domain_node_span(int node)
 {
 	int i;
 	cpumask_t span;
@@ -4119,12 +4118,13 @@ cpumask_t __init sched_domain_node_span(
 
 	return span;
 }
-#else /* CONFIG_NUMA && SD_NODES_PER_DOMAIN */
-cpumask_t __init sched_domain_node_span(int node)
+#else /* SD_NODES_PER_DOMAIN */
+static cpumask_t __init sched_domain_node_span(int node)
 {
 	return cpu_possible_map;
 }
-#endif /* CONFIG_NUMA && SD_NODES_PER_DOMAIN */
+#endif /* SD_NODES_PER_DOMAIN */
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_SCHED_SMT
 static DEFINE_PER_CPU(struct sched_domain, cpu_domains);

_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier
  2004-09-08 12:50 [PATCH 1/3] sched: trivial changes Nick Piggin
@ 2004-09-08 12:52 ` Nick Piggin
  2004-09-08 12:57   ` [PATCH 3/3] sched: cpu hotplug notifier for updating sched domains Nick Piggin
  2004-09-09 10:23   ` [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier Rusty Russell
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Piggin @ 2004-09-08 12:52 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Rusty Russell, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 27 bytes --]

2/3

Rusty, can I do this?

[-- Attachment #2: hotplug-cpu_down_prepare-notifier.patch --]
[-- Type: text/x-patch, Size: 2162 bytes --]



Add a CPU_DOWN_PREPARE hotplug CPU notifier. This is needed so we can
detach all sched-domains before a CPU goes down, thus we can build
domains from online cpumasks, and not have to check for the possibility
of a CPU coming up or going down.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>


---

 linux-2.6-npiggin/include/linux/notifier.h |    9 +++++----
 linux-2.6-npiggin/kernel/cpu.c             |    9 +++++++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff -puN kernel/cpu.c~hotplug-cpu_down_prepare-notifier kernel/cpu.c
--- linux-2.6/kernel/cpu.c~hotplug-cpu_down_prepare-notifier	2004-09-08 22:39:28.000000000 +1000
+++ linux-2.6-npiggin/kernel/cpu.c	2004-09-08 22:39:28.000000000 +1000
@@ -119,6 +119,15 @@ int cpu_down(unsigned int cpu)
 		goto out;
 	}
 
+	err = notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
+						(void *)(long)cpu);
+	if (err == NOTIFY_BAD) {
+		printk("%s: attempt to take down CPU %u failed\n",
+				__FUNCTION__, cpu);
+		err = -EINVAL;
+		goto out;
+	}
+
 	/* Ensure that we are not runnable on dying cpu */
 	old_allowed = current->cpus_allowed;
 	tmp = CPU_MASK_ALL;
diff -puN include/linux/notifier.h~hotplug-cpu_down_prepare-notifier include/linux/notifier.h
--- linux-2.6/include/linux/notifier.h~hotplug-cpu_down_prepare-notifier	2004-09-08 22:39:28.000000000 +1000
+++ linux-2.6-npiggin/include/linux/notifier.h	2004-09-08 22:39:28.000000000 +1000
@@ -60,10 +60,11 @@ extern int notifier_call_chain(struct no
 
 #define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
 
-#define CPU_ONLINE	0x0002 /* CPU (unsigned)v is up */
-#define CPU_UP_PREPARE	0x0003 /* CPU (unsigned)v coming up */
-#define CPU_UP_CANCELED	0x0004 /* CPU (unsigned)v NOT coming up */
-#define CPU_DEAD	0x0006 /* CPU (unsigned)v dead */
+#define CPU_ONLINE		0x0002 /* CPU (unsigned)v is up */
+#define CPU_UP_PREPARE		0x0003 /* CPU (unsigned)v coming up */
+#define CPU_UP_CANCELED		0x0004 /* CPU (unsigned)v NOT coming up */
+#define CPU_DOWN_PREPARE	0x0005 /* CPU (unsigned)v going down */
+#define CPU_DEAD		0x0006 /* CPU (unsigned)v dead */
 
 #endif /* __KERNEL__ */
 #endif /* _LINUX_NOTIFIER_H */

_


* [PATCH 3/3] sched: cpu hotplug notifier for updating sched domains
  2004-09-08 12:52 ` [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier Nick Piggin
@ 2004-09-08 12:57   ` Nick Piggin
  2004-09-08 15:43     ` Nathan Lynch
  2004-09-08 15:55     ` [PATCH] fix schedstats null deref in sched_exec Nathan Lynch
  2004-09-09 10:23   ` [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier Rusty Russell
  1 sibling, 2 replies; 9+ messages in thread
From: Nick Piggin @ 2004-09-08 12:57 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Rusty Russell, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 216 bytes --]

3/3

This builds on (and is mostly) Nathan's work. I have messed with a few
things though. It survives some cpu hotplugging on i386. Nathan, can you
give this a look? Do you agree with the general direction?

Thanks

[-- Attachment #2: sched-domains-cpu-hotplug-notifier.patch --]
[-- Type: text/x-patch, Size: 13895 bytes --]



Register a cpu hotplug notifier which reinitializes the scheduler
domains hierarchy.  The notifier temporarily attaches all running cpus
to a "dummy" domain (like we currently do during boot) to avoid
balancing.  It then calls arch_init_sched_domains which rebuilds the
"real" domains and reattaches the cpus to them.

Also change __init attributes to __devinit where necessary.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>


Alterations from Nick Piggin:
* Detach all domains in CPU_UP|DOWN_PREPARE notifiers. Reinitialise and
  reattach in CPU_ONLINE|DEAD|UP_CANCELED. This ensures the domains as
  seen from the scheduler won't become out of synch with the cpu_online_map.

* This allows us to remove runtime cpu_online verifications. Do that.

* Dummy domains are __devinitdata.

* Remove the hackery in arch_init_sched_domains to work around the fact that
  the domains used to work with cpu_possible maps, but node_to_cpumask returned
  a cpu_online map.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>


---

 linux-2.6-npiggin/kernel/sched.c |  167 ++++++++++++++++++++++-----------------
 1 files changed, 97 insertions(+), 70 deletions(-)

diff -puN kernel/sched.c~sched-domains-cpu-hotplug-notifier kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-domains-cpu-hotplug-notifier	2004-09-08 22:39:30.000000000 +1000
+++ linux-2.6-npiggin/kernel/sched.c	2004-09-08 22:39:35.000000000 +1000
@@ -1015,8 +1015,7 @@ static int wake_idle(int cpu, task_t *p)
 	if (!(sd->flags & SD_WAKE_IDLE))
 		return cpu;
 
-	cpus_and(tmp, sd->span, cpu_online_map);
-	cpus_and(tmp, tmp, p->cpus_allowed);
+	cpus_and(tmp, sd->span, p->cpus_allowed);
 
 	for_each_cpu_mask(i, tmp) {
 		if (idle_cpu(i))
@@ -1530,8 +1529,7 @@ static int find_idlest_cpu(struct task_s
 	min_cpu = UINT_MAX;
 	min_load = ULONG_MAX;
 
-	cpus_and(mask, sd->span, cpu_online_map);
-	cpus_and(mask, mask, p->cpus_allowed);
+	cpus_and(mask, sd->span, p->cpus_allowed);
 
 	for_each_cpu_mask(i, mask) {
 		load = target_load(i);
@@ -1787,7 +1785,6 @@ find_busiest_group(struct sched_domain *
 	max_load = this_load = total_load = total_pwr = 0;
 
 	do {
-		cpumask_t tmp;
 		unsigned long load;
 		int local_group;
 		int i, nr_cpus = 0;
@@ -1796,11 +1793,8 @@ find_busiest_group(struct sched_domain *
 
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
-		cpus_and(tmp, group->cpumask, cpu_online_map);
-		if (unlikely(cpus_empty(tmp)))
-			goto nextgroup;
 
-		for_each_cpu_mask(i, tmp) {
+		for_each_cpu_mask(i, group->cpumask) {
 			/* Bias balancing toward cpus of our domain */
 			if (local_group)
 				load = target_load(i);
@@ -1919,13 +1913,11 @@ out_balanced:
  */
 static runqueue_t *find_busiest_queue(struct sched_group *group)
 {
-	cpumask_t tmp;
 	unsigned long load, max_load = 0;
 	runqueue_t *busiest = NULL;
 	int i;
 
-	cpus_and(tmp, group->cpumask, cpu_online_map);
-	for_each_cpu_mask(i, tmp) {
+	for_each_cpu_mask(i, group->cpumask) {
 		load = source_load(i);
 
 		if (load > max_load) {
@@ -2122,18 +2114,13 @@ static void active_load_balance(runqueue
 
 	group = sd->groups;
 	do {
-		cpumask_t tmp;
 		runqueue_t *rq;
 		int push_cpu = 0;
 
 		if (group == busy_group)
 			goto next_group;
 
-		cpus_and(tmp, group->cpumask, cpu_online_map);
-		if (!cpus_weight(tmp))
-			goto next_group;
-
-		for_each_cpu_mask(i, tmp) {
+		for_each_cpu_mask(i, group->cpumask) {
 			if (!idle_cpu(i))
 				goto next_group;
 			push_cpu = i;
@@ -2332,7 +2319,7 @@ static inline void wake_sleeping_depende
 	 */
 	spin_unlock(&this_rq->lock);
 
-	cpus_and(sibling_map, sd->span, cpu_online_map);
+	sibling_map = sd->span;
 
 	for_each_cpu_mask(i, sibling_map)
 		spin_lock(&cpu_rq(i)->lock);
@@ -2377,7 +2364,7 @@ static inline int dependent_sleeper(int 
 	 * wake_sleeping_dependent():
 	 */
 	spin_unlock(&this_rq->lock);
-	cpus_and(sibling_map, sd->span, cpu_online_map);
+	sibling_map = sd->span;
 	for_each_cpu_mask(i, sibling_map)
 		spin_lock(&cpu_rq(i)->lock);
 	cpu_clear(this_cpu, sibling_map);
@@ -4014,7 +4001,10 @@ spinlock_t kernel_flag __cacheline_align
 EXPORT_SYMBOL(kernel_flag);
 
 #ifdef CONFIG_SMP
-/* Attach the domain 'sd' to 'cpu' as its base domain */
+/*
+ * Attach the domain 'sd' to 'cpu' as its base domain.  Callers must
+ * hold the hotplug lock.
+ */
 static void cpu_attach_domain(struct sched_domain *sd, int cpu)
 {
 	migration_req_t req;
@@ -4022,8 +4012,6 @@ static void cpu_attach_domain(struct sch
 	runqueue_t *rq = cpu_rq(cpu);
 	int local = 1;
 
-	lock_cpu_hotplug();
-
 	spin_lock_irqsave(&rq->lock, flags);
 
 	if (cpu == smp_processor_id() || !cpu_online(cpu)) {
@@ -4042,8 +4030,6 @@ static void cpu_attach_domain(struct sch
 		wake_up_process(rq->migration_thread);
 		wait_for_completion(&req.done);
 	}
-
-	unlock_cpu_hotplug();
 }
 
 /*
@@ -4063,7 +4049,7 @@ static void cpu_attach_domain(struct sch
  *
  * Should use nodemask_t.
  */
-static int __init find_next_best_node(int node, unsigned long *used_nodes)
+static int __devinit find_next_best_node(int node, unsigned long *used_nodes)
 {
 	int i, n, val, min_val, best_node = 0;
 
@@ -4099,7 +4085,7 @@ static int __init find_next_best_node(in
  * should be one that prevents unnecessary balancing, but also spreads tasks
  * out optimally.
  */
-static cpumask_t __init sched_domain_node_span(int node)
+static cpumask_t __devinit sched_domain_node_span(int node)
 {
 	int i;
 	cpumask_t span;
@@ -4119,7 +4105,7 @@ static cpumask_t __init sched_domain_nod
 	return span;
 }
 #else /* SD_NODES_PER_DOMAIN */
-static cpumask_t __init sched_domain_node_span(int node)
+static cpumask_t __devinit sched_domain_node_span(int node)
 {
 	return cpu_possible_map;
 }
@@ -4129,7 +4115,7 @@ static cpumask_t __init sched_domain_nod
 #ifdef CONFIG_SCHED_SMT
 static DEFINE_PER_CPU(struct sched_domain, cpu_domains);
 static struct sched_group sched_group_cpus[NR_CPUS];
-__init static int cpu_to_cpu_group(int cpu)
+static int __devinit cpu_to_cpu_group(int cpu)
 {
 	return cpu;
 }
@@ -4137,7 +4123,7 @@ __init static int cpu_to_cpu_group(int c
 
 static DEFINE_PER_CPU(struct sched_domain, phys_domains);
 static struct sched_group sched_group_phys[NR_CPUS];
-__init static int cpu_to_phys_group(int cpu)
+static int __devinit cpu_to_phys_group(int cpu)
 {
 #ifdef CONFIG_SCHED_SMT
 	return first_cpu(cpu_sibling_map[cpu]);
@@ -4150,7 +4136,7 @@ __init static int cpu_to_phys_group(int 
 
 static DEFINE_PER_CPU(struct sched_domain, node_domains);
 static struct sched_group sched_group_nodes[MAX_NUMNODES];
-__init static int cpu_to_node_group(int cpu)
+static int __devinit cpu_to_node_group(int cpu)
 {
 	return cpu_to_node(cpu);
 }
@@ -4160,9 +4146,9 @@ __init static int cpu_to_node_group(int 
 static struct sched_group sched_group_isolated[NR_CPUS];
 
 /* cpus with isolated domains */
-cpumask_t __initdata cpu_isolated_map = CPU_MASK_NONE;
+cpumask_t __devinitdata cpu_isolated_map = CPU_MASK_NONE;
 
-__init static int cpu_to_isolated_group(int cpu)
+static int __devinit cpu_to_isolated_group(int cpu)
 {
 	return cpu;
 }
@@ -4192,7 +4178,7 @@ __setup ("isolcpus=", isolated_cpu_setup
  * covered by the given span, and will set each group's ->cpumask correctly,
  * and ->cpu_power to 0.
  */
-__init static void init_sched_build_groups(struct sched_group groups[],
+static void __devinit init_sched_build_groups(struct sched_group groups[],
 			cpumask_t span, int (*group_fn)(int cpu))
 {
 	struct sched_group *first = NULL, *last = NULL;
@@ -4226,10 +4212,16 @@ __init static void init_sched_build_grou
 	last->next = first;
 }
 
-__init static void arch_init_sched_domains(void)
+/*
+ * Set up scheduler domains and groups.  Callers must hold the hotplug lock.
+ */
+static void __devinit arch_init_sched_domains(void)
 {
 	int i;
 	cpumask_t cpu_default_map;
+	cpumask_t cpu_isolated_online_map;
+
+	cpus_and(cpu_isolated_online_map, cpu_isolated_map, cpu_online_map);
 
 	/*
 	 * Setup mask for cpus without special case scheduling requirements.
@@ -4237,10 +4229,10 @@ __init static void arch_init_sched_domai
 	 * exclude other special cases in the future.
 	 */
 	cpus_complement(cpu_default_map, cpu_isolated_map);
-	cpus_and(cpu_default_map, cpu_default_map, cpu_possible_map);
+	cpus_and(cpu_default_map, cpu_default_map, cpu_online_map);
 
 	/* Set up domains */
-	for_each_cpu(i) {
+	for_each_online_cpu(i) {
 		int group;
 		struct sched_domain *sd = NULL, *p;
 		cpumask_t nodemask = node_to_cpumask(cpu_to_node(i));
@@ -4252,7 +4244,7 @@ __init static void arch_init_sched_domai
 		 * Unlike those of other cpus, the domains and groups are
 		 * single level, and span a single cpu.
 		 */
-		if (cpu_isset(i, cpu_isolated_map)) {
+		if (cpu_isset(i, cpu_isolated_online_map)) {
 #ifdef CONFIG_SCHED_SMT
 			sd = &per_cpu(cpu_domains, i);
 #else
@@ -4283,11 +4275,7 @@ __init static void arch_init_sched_domai
 		sd = &per_cpu(phys_domains, i);
 		group = cpu_to_phys_group(i);
 		*sd = SD_CPU_INIT;
-#ifdef CONFIG_NUMA
 		sd->span = nodemask;
-#else
-		sd->span = cpu_possible_map;
-#endif
 		sd->parent = p;
 		sd->groups = &sched_group_phys[group];
 
@@ -4305,7 +4293,7 @@ __init static void arch_init_sched_domai
 
 #ifdef CONFIG_SCHED_SMT
 	/* Set up CPU (sibling) groups */
-	for_each_cpu(i) {
+	for_each_online_cpu(i) {
 		cpumask_t this_sibling_map = cpu_sibling_map[i];
 		cpus_and(this_sibling_map, this_sibling_map, cpu_default_map);
 		if (i != first_cpu(this_sibling_map))
@@ -4317,15 +4305,12 @@ __init static void arch_init_sched_domai
 #endif
 
 	/* Set up isolated groups */
-	for_each_cpu_mask(i, cpu_isolated_map) {
-		cpumask_t mask;
-		cpus_clear(mask);
-		cpu_set(i, mask);
+	for_each_cpu_mask(i, cpu_isolated_online_map) {
+		cpumask_t mask = cpumask_of_cpu(i);
 		init_sched_build_groups(sched_group_isolated, mask,
 						&cpu_to_isolated_group);
 	}
 
-#ifdef CONFIG_NUMA
 	/* Set up physical groups */
 	for (i = 0; i < MAX_NUMNODES; i++) {
 		cpumask_t nodemask = node_to_cpumask(i);
@@ -4337,10 +4322,6 @@ __init static void arch_init_sched_domai
 		init_sched_build_groups(sched_group_phys, nodemask,
 						&cpu_to_phys_group);
 	}
-#else
-	init_sched_build_groups(sched_group_phys, cpu_possible_map,
-							&cpu_to_phys_group);
-#endif
 
 #ifdef CONFIG_NUMA
 	/* Set up node groups */
@@ -4373,7 +4354,7 @@ __init static void arch_init_sched_domai
 	}
 
 	/* Attach the domains */
-	for_each_cpu(i) {
+	for_each_online_cpu(i) {
 		struct sched_domain *sd;
 #ifdef CONFIG_SCHED_SMT
 		sd = &per_cpu(cpu_domains, i);
@@ -4390,15 +4371,14 @@ void sched_domain_debug(void)
 {
 	int i;
 
-	for_each_cpu(i) {
+	for_each_online_cpu(i) {
 		runqueue_t *rq = cpu_rq(i);
 		struct sched_domain *sd;
 		int level = 0;
 
 		sd = rq->sd;
 
-		printk(KERN_DEBUG "CPU%d: %s\n",
-				i, (cpu_online(i) ? " online" : "offline"));
+		printk(KERN_DEBUG "CPU%d:\n", i);
 
 		do {
 			int j;
@@ -4464,10 +4444,59 @@ void sched_domain_debug(void)
 #define sched_domain_debug() {}
 #endif
 
+#ifdef CONFIG_SMP
+/* Initial dummy domain for early boot and for hotplug cpu */
+static __devinitdata struct sched_domain sched_domain_dummy;
+static __devinitdata struct sched_group sched_group_dummy;
+#endif
+
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Force a reinitialization of the sched domains hierarchy.  The domains
+ * and groups cannot be updated in place without racing with the balancing
+ * code, so we temporarily attach all running cpus to a "dummy" domain
+ * which will prevent rebalancing while the sched domains are recalculated.
+ */
+static int update_sched_domains(struct notifier_block *nfb,
+				unsigned long action, void *hcpu)
+{
+	int i;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_DOWN_PREPARE:
+		for_each_online_cpu(i)
+			cpu_attach_domain(&sched_domain_dummy, i);
+		return NOTIFY_OK;
+
+	case CPU_UP_CANCELED:
+	case CPU_ONLINE:
+	case CPU_DEAD:
+		/*
+		 * Fall through and re-initialise the domains.
+		 */
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	/* The hotplug lock is already held by cpu_up/cpu_down */
+	arch_init_sched_domains();
+
+	sched_domain_debug();
+
+	return NOTIFY_OK;
+}
+#endif
+
 void __init sched_init_smp(void)
 {
+	lock_cpu_hotplug();
 	arch_init_sched_domains();
 	sched_domain_debug();
+	unlock_cpu_hotplug();
+	/* XXX: Theoretical race here - CPU may be hotplugged now */
+	hotcpu_notifier(update_sched_domains, 0);
 }
 #else
 void __init sched_init_smp(void)
@@ -4490,20 +4519,18 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SMP
 	/* Set up an initial dummy domain for early boot */
-	static struct sched_domain sched_domain_init;
-	static struct sched_group sched_group_init;
 
-	memset(&sched_domain_init, 0, sizeof(struct sched_domain));
-	sched_domain_init.span = CPU_MASK_ALL;
-	sched_domain_init.groups = &sched_group_init;
-	sched_domain_init.last_balance = jiffies;
-	sched_domain_init.balance_interval = INT_MAX; /* Don't balance */
-	sched_domain_init.busy_factor = 1;
-
-	memset(&sched_group_init, 0, sizeof(struct sched_group));
-	sched_group_init.cpumask = CPU_MASK_ALL;
-	sched_group_init.next = &sched_group_init;
-	sched_group_init.cpu_power = SCHED_LOAD_SCALE;
+	memset(&sched_domain_dummy, 0, sizeof(struct sched_domain));
+	sched_domain_dummy.span = CPU_MASK_ALL;
+	sched_domain_dummy.groups = &sched_group_dummy;
+	sched_domain_dummy.last_balance = jiffies;
+	sched_domain_dummy.balance_interval = INT_MAX; /* Don't balance */
+	sched_domain_dummy.busy_factor = 1;
+
+	memset(&sched_group_dummy, 0, sizeof(struct sched_group));
+	sched_group_dummy.cpumask = CPU_MASK_ALL;
+	sched_group_dummy.next = &sched_group_dummy;
+	sched_group_dummy.cpu_power = SCHED_LOAD_SCALE;
 #endif
 
 	for (i = 0; i < NR_CPUS; i++) {
@@ -4515,7 +4542,7 @@ void __init sched_init(void)
 		rq->expired = rq->arrays + 1;
 
 #ifdef CONFIG_SMP
-		rq->sd = &sched_domain_init;
+		rq->sd = &sched_domain_dummy;
 		rq->cpu_load = 0;
 		rq->active_balance = 0;
 		rq->push_cpu = 0;

_


* Re: [PATCH 3/3] sched: cpu hotplug notifier for updating sched domains
  2004-09-08 12:57   ` [PATCH 3/3] sched: cpu hotplug notifier for updating sched domains Nick Piggin
@ 2004-09-08 15:43     ` Nathan Lynch
  2004-09-08 15:55     ` [PATCH] fix schedstats null deref in sched_exec Nathan Lynch
  1 sibling, 0 replies; 9+ messages in thread
From: Nathan Lynch @ 2004-09-08 15:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rusty Russell, linux-kernel

On Wed, 2004-09-08 at 07:57, Nick Piggin wrote:
> 3/3
> 
> This builds on (and is mostly) Nathan's work. I have messed with a few
> things though. It survives some cpu hotplugging on i386. Nathan, can you
> give this a look? Do you agree with the general direction?

Yep, looks good to me, and I gave it a sniff test on ppc64.  Thanks.

Nathan




* [PATCH] fix schedstats null deref in sched_exec
  2004-09-08 12:57   ` [PATCH 3/3] sched: cpu hotplug notifier for updating sched domains Nick Piggin
  2004-09-08 15:43     ` Nathan Lynch
@ 2004-09-08 15:55     ` Nathan Lynch
  2004-09-08 23:45       ` Nick Piggin
  1 sibling, 1 reply; 9+ messages in thread
From: Nathan Lynch @ 2004-09-08 15:55 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rusty Russell, linux-kernel, Andrew Morton

Oops, I meant to send this one with my original patchset.

In sched_exec, schedstat_inc will dereference a null pointer if no
domain is found with the SD_BALANCE_EXEC flag set.  This was exposed
during testing of the previous patches where cpus are temporarily
attached to a dummy domain without SD_BALANCE_EXEC set.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>


---


diff -puN kernel/sched.c~sched_exec-schedstats-fix kernel/sched.c
--- 2.6.9-rc1-bk12/kernel/sched.c~sched_exec-schedstats-fix	2004-09-06 02:14:05.000000000 -0500
+++ 2.6.9-rc1-bk12-nathanl/kernel/sched.c	2004-09-06 19:00:12.000000000 -0500
@@ -1727,8 +1727,8 @@ void sched_exec(void)
 		if (tmp->flags & SD_BALANCE_EXEC)
 			sd = tmp;
 
-	schedstat_inc(sd, sbe_attempts);
 	if (sd) {
+		schedstat_inc(sd, sbe_attempts);
 		new_cpu = find_idlest_cpu(current, this_cpu, sd);
 		if (new_cpu != this_cpu) {
 			schedstat_inc(sd, sbe_pushed);

_




* Re: [PATCH] fix schedstats null deref in sched_exec
  2004-09-08 15:55     ` [PATCH] fix schedstats null deref in sched_exec Nathan Lynch
@ 2004-09-08 23:45       ` Nick Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2004-09-08 23:45 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Rusty Russell, linux-kernel, Andrew Morton

Nathan Lynch wrote:
> Oops, I meant to send this one with my original patchset.
> 
> In sched_exec, schedstat_inc will dereference a null pointer if no
> domain is found with the SD_BALANCE_EXEC flag set.  This was exposed
> during testing of the previous patches where cpus are temporarily
> attached to a dummy domain without SD_BALANCE_EXEC set.
> 
> Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
> 

Thanks. I'll get everything together and send it to Andrew.


* Re: [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier
  2004-09-08 12:52 ` [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier Nick Piggin
  2004-09-08 12:57   ` [PATCH 3/3] sched: cpu hotplug notifier for updating sched domains Nick Piggin
@ 2004-09-09 10:23   ` Rusty Russell
  2004-09-09 10:24     ` Nick Piggin
  1 sibling, 1 reply; 9+ messages in thread
From: Rusty Russell @ 2004-09-09 10:23 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Nathan Lynch, linux-kernel

On Wed, 2004-09-08 at 22:52, Nick Piggin wrote:
> 2/3
> 
> Rusty, can I do this?
> 
> ______________________________________________________________________
> Add a CPU_DOWN_PREPARE hotplug CPU notifier. This is needed so we can
> detach all sched-domains before a CPU goes down, thus we can build
> domains from online cpumasks, and not have to check for the possibility
> of a CPU coming up or going down.

And if taking the CPU down fails?  If you need this, you need the
CPU_DOWN_FAILED as well, unfortunately.  Hence I prefer the "do the
domain thing while machine is frozen" and sidestep it entirely.

Rusty.
-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell



* Re: [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier
  2004-09-09 10:23   ` [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier Rusty Russell
@ 2004-09-09 10:24     ` Nick Piggin
  2004-09-09 11:00       ` Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2004-09-09 10:24 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Nathan Lynch, linux-kernel

Rusty Russell wrote:
> On Wed, 2004-09-08 at 22:52, Nick Piggin wrote:
> 
>>2/3
>>
>>Rusty, can I do this?
>>
>>______________________________________________________________________
>>Add a CPU_DOWN_PREPARE hotplug CPU notifier. This is needed so we can
>>detach all sched-domains before a CPU goes down, thus we can build
>>domains from online cpumasks, and not have to check for the possibility
>>of a CPU coming up or going down.
> 
> 
> And if taking the CPU down fails?  If you need this, you need the
> CPU_DOWN_FAILED as well, unfortunately.  Hence I prefer the "do the
> domain thing while machine is frozen" and sidestep it entirely.
> 

Really? It doesn't need to be run from the stop_machine_run
context at all - it can happily be done while the system is
running.

That said, if you really object to CPU_DOWN_PREPARE and CPU_DOWN_FAILED,
it probably shouldn't be too much work. Should it make the call from
take_cpu_down?


* Re: [PATCH 2/3] cpu: add a CPU_DOWN_PREPARE notifier
  2004-09-09 10:24     ` Nick Piggin
@ 2004-09-09 11:00       ` Nick Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2004-09-09 11:00 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Nathan Lynch, linux-kernel

Nick Piggin wrote:
> Rusty Russell wrote:
> 
>> On Wed, 2004-09-08 at 22:52, Nick Piggin wrote:
>>
>>> 2/3
>>>
>>> Rusty, can I do this?
>>>
>>> ______________________________________________________________________
>>> Add a CPU_DOWN_PREPARE hotplug CPU notifier. This is needed so we can
>>> detach all sched-domains before a CPU goes down, thus we can build
>>> domains from online cpumasks, and not have to check for the possibility
>>> of a CPU coming up or going down.
>>
>>
>>
>> And if taking the CPU down fails?  If you need this, you need the
>> CPU_DOWN_FAILED as well, unfortunately.  Hence I prefer the "do the
>> domain thing while machine is frozen" and sidestep it entirely.
>>
> 
> Really? It doesn't need to be run from the stop_machine_run
> context at all - it can happily be done while the system is
> running.
> 
> That said, if you really object to CPU_DOWN_PREPARE and CPU_DOWN_FAILED,
> it probably shouldn't be too much work. Should it make the call from
> take_cpu_down?
> 

The other thing is, it is actually a lot nicer to have this done while
the machine is running, because then cpu_attach_domain guarantees a
quiescent state, so the attach can be done completely atomically from
the point of view of the rest of the scheduler.

So you can always rely on domains and cpu_online_map staying in synch.
The alternative is more fastpath checking of the cpu_online_map, and
possibly more hotplug locks.

In short, I'd really like to have it done this way.

