public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-08-24 22:18 Cliff Wickman
  2007-08-24 22:54 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Cliff Wickman @ 2007-08-24 22:18 UTC (permalink / raw)
  To: akpm, ego, mingo, vatsa, oleg, pj; +Cc: linux-kernel


When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system
to have its tasks confined to a cpuset so as to cluster them near the
memory that they share. Furthermore, it is typical to explicitly place such
a task on a specific cpu in that cpuset.  And in that case the task's
cpus_allowed includes only a single cpu.

This patch would insert a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed_lock(), a new variant of cpuset_cpus_allowed() that
will not block.
Calls are made to cpuset_lock() and cpuset_unlock() in migration_call()
to set the cpuset mutex during the whole migrate_live_tasks() and
migrate_dead_tasks() procedure.

This patch depends on 2 patches from Oleg Nesterov:
  [PATCH 1/2] do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
  [PATCH 2/2] migration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()

Diffed against 2.6.23-rc3

Signed-off-by: Cliff Wickman <cpw@sgi.com>

---
 include/linux/cpuset.h |    5 +++++
 kernel/cpuset.c        |   19 +++++++++++++++++++
 kernel/sched.c         |   14 ++++++++++++++
 3 files changed, 38 insertions(+)

Index: linus.070821/kernel/sched.c
===================================================================
--- linus.070821.orig/kernel/sched.c
+++ linus.070821/kernel/sched.c
@@ -61,6 +61,7 @@
 #include <linux/delayacct.h>
 #include <linux/reciprocal_div.h>
 #include <linux/unistd.h>
+#include <linux/cpuset.h>
 
 #include <asm/tlb.h>
 
@@ -5091,6 +5092,17 @@ restart:
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(p->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		/*
+		 * The cpuset_cpus_allowed_lock() variant of
+		 * cpuset_cpus_allowed() will not block
+		 * It must be called within calls to cpuset_lock/cpuset_unlock.
+		 */
+		p->cpus_allowed = cpuset_cpus_allowed_lock(p);
+		dest_cpu = any_online_cpu(p->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		rq = task_rq_lock(p, &flags);
@@ -5412,6 +5424,7 @@ migration_call(struct notifier_block *nf
 
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
+		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
 		kthread_stop(rq->migration_thread);
@@ -5425,6 +5438,7 @@ migration_call(struct notifier_block *nf
 		rq->idle->sched_class = &idle_sched_class;
 		migrate_dead_tasks(cpu);
 		spin_unlock_irq(&rq->lock);
+		cpuset_unlock();
 		migrate_nr_uninterruptible(rq);
 		BUG_ON(rq->nr_running != 0);
 
Index: linus.070821/include/linux/cpuset.h
===================================================================
--- linus.070821.orig/include/linux/cpuset.h
+++ linus.070821/include/linux/cpuset.h
@@ -22,6 +22,7 @@ extern void cpuset_init_smp(void);
 extern void cpuset_fork(struct task_struct *p);
 extern void cpuset_exit(struct task_struct *p);
 extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
+extern cpumask_t cpuset_cpus_allowed_lock(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
@@ -87,6 +88,10 @@ static inline cpumask_t cpuset_cpus_allo
 {
 	return cpu_possible_map;
 }
+static inline cpumask_t cpuset_cpus_allowed_lock(struct task_struct *p)
+{
+	return cpu_possible_map;
+}
 
 static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
 {
Index: linus.070821/kernel/cpuset.c
===================================================================
--- linus.070821.orig/kernel/cpuset.c
+++ linus.070821/kernel/cpuset.c
@@ -2333,6 +2333,25 @@ cpumask_t cpuset_cpus_allowed(struct tas
 	return mask;
 }
 
+/**
+ * cpuset_cpus_allowed_lock - return cpus_allowed mask from a tasks cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
+ *
+ * Description: Same as cpuset_cpus_allowed, but called with callback_mutex
+ * already held.
+ **/
+
+cpumask_t cpuset_cpus_allowed_lock(struct task_struct *tsk)
+{
+	cpumask_t mask;
+
+	task_lock(tsk);
+	guarantee_online_cpus(tsk->cpuset, &mask);
+	task_unlock(tsk);
+
+	return mask;
+}
+
 void cpuset_init_current_mems_allowed(void)
 {
 	current->mems_allowed = NODE_MASK_ALL;
-- 
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com
(651) 683-3824

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-05-23 21:29 Oleg Nesterov
  2007-05-23 22:56 ` Cliff Wickman
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-05-23 21:29 UTC (permalink / raw)
  To: cpw; +Cc: Andrew Morton, Ingo Molnar, linux-kernel

Cliff Wickman wrote:
>
> In order to do this, move_task_off_dead_cpu() must make a call to
> cpuset_cpus_allowed(), which may block.
>
> move_task_off_dead_cpu() has been within a critical region when called
> from migrate_live_tasks().  So this patch also changes migrate_live_tasks()
> to enable interrupts before calling move_task_off_dead_cpu().
> Since the tasklist_lock is dropped, the list scan must be restarted from
> the top.
>
> [... snip ...]
>
> - * NOTE: interrupts should be disabled by the caller
> + * NOTE: interrupts are not disabled by the caller
>   */
>  static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
>  {
> @@ -5008,6 +5008,17 @@ restart:
>  	if (dest_cpu == NR_CPUS)
>  		dest_cpu = any_online_cpu(p->cpus_allowed);
>
> +	/* try to stay on the same cpuset */
> +	if (dest_cpu == NR_CPUS) {
> +		/*
> +		 * Call to cpuset_cpus_allowed may sleep, so we depend
> +		 * on move_task_off_dead_cpu() being called in a non-critical
> +		 * region.
> +		 */
> +		p->cpus_allowed = cpuset_cpus_allowed(p);
> +		dest_cpu = any_online_cpu(p->cpus_allowed);
> +	}

I know nothing about cpuset.c, a _very_ naive question.

Do we really need task_lock() (used by cpuset_cpus_allowed) here ?

If not, probably we can make this simpler. CPU_DEAD takes cpuset_lock(),
move_task_off_dead_cpu() uses guarantee_online_cpus() which doesn't sleep,
so we don't need other changes.

Possible?

If not, this patch should also change migrate_dead(), it still calls
move_task_off_dead_cpu() with irqs disabled, no?

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-05-21 20:08 Cliff Wickman
  2007-05-23 17:43 ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Cliff Wickman @ 2007-05-21 20:08 UTC (permalink / raw)
  To: linux-kernel



(this is a third submission -- corrects a locking/blocking issue pointed
 out by Nathan Lynch)

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system
to have its tasks confined to a cpuset so as to cluster them near the
memory that they share. Furthermore, it is typical to explicitly place such
a task on a specific cpu in that cpuset.  And in that case the task's
cpus_allowed includes only a single cpu.

This patch inserts a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed(), which may block.

move_task_off_dead_cpu() has been within a critical region when called
from migrate_live_tasks().  So this patch also changes migrate_live_tasks()
to enable interrupts before calling move_task_off_dead_cpu().
Since the tasklist_lock is dropped, the list scan must be restarted from
the top.
It locks the migrating task by bumping its usage count.
It disables interrupts in move_task_off_dead_cpu() before the
 call to __migrate_task().

This is the outline of the locking surrounding calls to
move_task_off_dead_cpu(), after applying this patch:

  migration_call()
  | case CPU_DEAD
  |   migrate_live_tasks(cpu)
  |   | recheck:
  |   | write_lock_irq(&tasklist_lock)
  |   | do_each_thread(t, p) {
  |   |         if (task_cpu(p) == src_cpu)
  |   |                 get_task_struct(p)
  |   |                 write_unlock_irq(&tasklist_lock)
  |   |                 move_task_off_dead_cpu(src_cpu, p) <<<< noncritical
  |   |                 put_task_struct(p);
  |   |                 goto recheck
  |   | } while_each_thread(t, p)
  |   | write_unlock_irq(&tasklist_lock)
  |
  |   rq = task_rq_lock(rq->idle, &flags)
  |
  |   migrate_dead_tasks(cpu)
  |   | for (arr = 0; arr < 2; arr++) {
  |   |   for (i = 0; i < MAX_PRIO; i++) {
  |   |     while (!list_empty(list))
  |   |       migrate_dead(dead_cpu
  |   |         get_task_struct(p)
  |   |         spin_unlock_irq(&rq->lock)
  |   |         move_task_off_dead_cpu(dead_cpu, p)        <<<< noncritcal
  |   |         spin_lock_irq(&rq->lock)
  |   |         put_task_struct(p)
  |
  |   task_rq_unlock(rq, &flags)

[Side note: a task may be migrated off of its cpuset, but is still attached to
 that cpuset (by pointer and reference count).  The cpuset will not be
 released.  This patch does not change that.]

Diffed against 2.6.21

Signed-off-by: Cliff Wickman <cpw@sgi.com>

 kernel/sched.c |   31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

Index: linus.070504/kernel/sched.c
===================================================================
--- linus.070504.orig/kernel/sched.c
+++ linus.070504/kernel/sched.c
@@ -4989,7 +4989,7 @@ wait_to_die:
 #ifdef CONFIG_HOTPLUG_CPU
 /*
  * Figure out where task on dead CPU should go, use force if neccessary.
- * NOTE: interrupts should be disabled by the caller
+ * NOTE: interrupts are not disabled by the caller
  */
 static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
 {
@@ -5008,6 +5008,17 @@ restart:
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(p->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		/*
+		 * Call to cpuset_cpus_allowed may sleep, so we depend
+		 * on move_task_off_dead_cpu() being called in a non-critical
+		 * region.
+		 */
+		p->cpus_allowed = cpuset_cpus_allowed(p);
+		dest_cpu = any_online_cpu(p->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		rq = task_rq_lock(p, &flags);
@@ -5025,8 +5036,16 @@ restart:
 			       "longer affine to cpu%d\n",
 			       p->pid, p->comm, dead_cpu);
 	}
-	if (!__migrate_task(p, dead_cpu, dest_cpu))
+	/*
+	 * __migrate_task() requires interrupts to be disabled
+	 */
+	local_irq_disable();
+	if (!__migrate_task(p, dead_cpu, dest_cpu)) {
+		local_irq_enable();
 		goto restart;
+	}
+	local_irq_enable();
+	return;
 }
 
 /*
@@ -5054,14 +5073,20 @@ static void migrate_live_tasks(int src_c
 {
 	struct task_struct *p, *t;
 
+restartlist:
 	write_lock_irq(&tasklist_lock);
 
 	do_each_thread(t, p) {
 		if (p == current)
 			continue;
 
-		if (task_cpu(p) == src_cpu)
+		if (task_cpu(p) == src_cpu) {
+			get_task_struct(p);
+			write_unlock_irq(&tasklist_lock);
 			move_task_off_dead_cpu(src_cpu, p);
+			put_task_struct(p);
+			goto restartlist;
+		}
 	} while_each_thread(t, p);
 
 	write_unlock_irq(&tasklist_lock);

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-03-09 19:39 Cliff Wickman
  2007-03-09 23:58 ` Nathan Lynch
  2007-03-10  9:19 ` Ingo Molnar
  0 siblings, 2 replies; 25+ messages in thread
From: Cliff Wickman @ 2007-03-09 19:39 UTC (permalink / raw)
  To: linux-kernel


From: Cliff Wickman <cpw@sgi.com>

(this is a second submission -- the first was from a work area back
 porting to an older release)

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

But the task's cpus_allowed may have been a single cpu.

This patch would insert a preference to migrate such a task to a cpu within
its cpuset (and set its cpus_allowed to its cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


Diffed against 2.6.21-rc3 (Andrew's current top of tree)

Signed-off-by: Cliff Wickman <cpw@sgi.com>

---
 kernel/sched.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: morton.070123/kernel/sched.c
===================================================================
--- morton.070123.orig/kernel/sched.c
+++ morton.070123/kernel/sched.c
@@ -5170,6 +5170,12 @@ restart:
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(p->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		p->cpus_allowed = cpuset_cpus_allowed(p);
+		dest_cpu = any_online_cpu(p->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		rq = task_rq_lock(p, &flags);

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-03-07 14:24 Cliff Wickman
  2007-03-07 18:25 ` Randy Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Cliff Wickman @ 2007-03-07 14:24 UTC (permalink / raw)
  To: linux-kernel


From: Cliff Wickman <cpw@sgi.com>

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

But the task's cpus_allowed may have been a single cpu.

This patch would insert a preference to migrate such a task to a cpu within
its cpuset (and set its cpus_allowed to its cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


Diffed against 2.6.16.37

Signed-off-by: Cliff Wickman <cpw@sgi.com>

---
 kernel/sched.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4602,6 +4602,12 @@ static void move_task_off_dead_cpu(int d
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(tsk->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
+		dest_cpu = any_online_cpu(tsk->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		cpus_setall(tsk->cpus_allowed);

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2007-08-28 17:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-24 22:18 [PATCH 1/1] hotplug cpu: migrate a task within its cpuset Cliff Wickman
2007-08-24 22:54 ` Andrew Morton
2007-08-25  9:47   ` Oleg Nesterov
2007-08-26  0:16     ` Gautham R Shenoy
2007-08-26  0:47       ` Rusty Russell
2007-08-26  8:09         ` Andrew Morton
2007-08-27  7:01           ` Rusty Russell
2007-08-25  9:58 ` Oleg Nesterov
2007-08-25 11:18 ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2007-05-23 21:29 Oleg Nesterov
2007-05-23 22:56 ` Cliff Wickman
2007-05-23 23:32   ` Oleg Nesterov
2007-05-21 20:08 Cliff Wickman
2007-05-23 17:43 ` Andrew Morton
2007-05-24  7:56   ` Ingo Molnar
2007-05-29 19:32     ` Andrew Morton
2007-03-09 19:39 Cliff Wickman
2007-03-09 23:58 ` Nathan Lynch
2007-03-15  0:36   ` Robin Holt
2007-03-15  7:32     ` Nathan Lynch
2007-03-10  9:19 ` Ingo Molnar
2007-03-10 15:51   ` Nathan Lynch
2007-03-10 17:08     ` Ingo Molnar
2007-03-07 14:24 Cliff Wickman
2007-03-07 18:25 ` Randy Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox