public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-03-07 14:24 Cliff Wickman
  2007-03-07 18:25 ` Randy Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Cliff Wickman @ 2007-03-07 14:24 UTC (permalink / raw)
  To: linux-kernel


From: Cliff Wickman <cpw@sgi.com>

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

But the task's cpus_allowed may have been a single cpu.

This patch would insert a preference to migrate such a task to a cpu within
its cpuset (and set its cpus_allowed to its cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


Diffed against 2.6.16.37

Signed-off-by: Cliff Wickman <cpw@sgi.com>

---
 kernel/sched.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4602,6 +4602,12 @@ static void move_task_off_dead_cpu(int d
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(tsk->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
+		dest_cpu = any_online_cpu(tsk->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		cpus_setall(tsk->cpus_allowed);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-07 14:24 Cliff Wickman
@ 2007-03-07 18:25 ` Randy Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: Randy Dunlap @ 2007-03-07 18:25 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: linux-kernel

On Wed, 07 Mar 2007 08:24:44 -0600 Cliff Wickman wrote:

> 
> From: Cliff Wickman <cpw@sgi.com>
> 
> When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
> that have been running on that cpu.
> 
> Currently, such a task is migrated:
>  1) to any cpu on the same node as the disabled cpu, which is both online
>     and among that task's cpus_allowed
>  2) to any cpu which is both online and among that task's cpus_allowed
> 
> But the task's cpus_allowed may have been a single cpu.
> 
> This patch would insert a preference to migrate such a task to a cpu within
> its cpuset (and set its cpus_allowed to its cpuset).
> 
> With this patch, migrate the task to:
>  1) to any cpu on the same node as the disabled cpu, which is both online
>     and among that task's cpus_allowed
>  2) to any online cpu within the task's cpuset
>  3) to any cpu which is both online and among that task's cpus_allowed
> 
> 
> Diffed against 2.6.16.37

Wow.  Why?
or did you want this applied only to the 2.6.16.* stable kernel?


|--- linux.orig/kernel/sched.c
|+++ linux/kernel/sched.c
--------------------------
Patching file kernel/sched.c using Plan A...
Hunk #1 succeeded at 5032 with fuzz 2 (offset 430 lines).
Hmm...  Ignoring the trailing garbage.
done


<me checks>

I'm surprised that none of Documentation/SubmittingPatches or
The Perfect Patch (http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt)
or Jeff Garzik's patch-format page (http://linux.yyz.us/patch-format.html)
(AFAICT) mention that patches should be made against "current" or
"recent" or "head of tree" or some such words.

I'll "fix" Doc/SubmittingPatches... and/or Doc/SubmitChecklist.



> Signed-off-by: Cliff Wickman <cpw@sgi.com>
> 
> ---
>  kernel/sched.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -4602,6 +4602,12 @@ static void move_task_off_dead_cpu(int d
>  	if (dest_cpu == NR_CPUS)
>  		dest_cpu = any_online_cpu(tsk->cpus_allowed);
>  
> +	/* try to stay on the same cpuset */
> +	if (dest_cpu == NR_CPUS) {
> +		tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
> +		dest_cpu = any_online_cpu(tsk->cpus_allowed);
> +	}
> +
>  	/* No more Mr. Nice Guy. */
>  	if (dest_cpu == NR_CPUS) {
>  		cpus_setall(tsk->cpus_allowed);
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-03-09 19:39 Cliff Wickman
  2007-03-09 23:58 ` Nathan Lynch
  2007-03-10  9:19 ` Ingo Molnar
  0 siblings, 2 replies; 25+ messages in thread
From: Cliff Wickman @ 2007-03-09 19:39 UTC (permalink / raw)
  To: linux-kernel


From: Cliff Wickman <cpw@sgi.com>

(this is a second submission -- the first was from a work area back
 porting to an older release)

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

But the task's cpus_allowed may have been a single cpu.

This patch would insert a preference to migrate such a task to a cpu within
its cpuset (and set its cpus_allowed to its cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


Diffed against 2.6.21-rc3 (Andrew's current top of tree)

Signed-off-by: Cliff Wickman <cpw@sgi.com>

---
 kernel/sched.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: morton.070123/kernel/sched.c
===================================================================
--- morton.070123.orig/kernel/sched.c
+++ morton.070123/kernel/sched.c
@@ -5170,6 +5170,12 @@ restart:
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(p->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		p->cpus_allowed = cpuset_cpus_allowed(p);
+		dest_cpu = any_online_cpu(p->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		rq = task_rq_lock(p, &flags);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-09 19:39 Cliff Wickman
@ 2007-03-09 23:58 ` Nathan Lynch
  2007-03-15  0:36   ` Robin Holt
  2007-03-10  9:19 ` Ingo Molnar
  1 sibling, 1 reply; 25+ messages in thread
From: Nathan Lynch @ 2007-03-09 23:58 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: linux-kernel

Hello-

Cliff Wickman wrote:
> This patch would insert a preference to migrate such a task to a cpu within
> its cpuset (and set its cpus_allowed to its cpuset).
> 
> With this patch, migrate the task to:
>  1) to any cpu on the same node as the disabled cpu, which is both online
>     and among that task's cpus_allowed
>  2) to any online cpu within the task's cpuset
>  3) to any cpu which is both online and among that task's cpus_allowed

I think I disagree with this change.

The kernel shouldn't have to be any smarter than it already is about
moving tasks off an offlined cpu.  The only way case 2) can be reached
is if the user has changed a task's cpu affinity.  If the user is
sophisticated enough to manipulate tasks' cpu affinity then they can
arrange to migrate tasks as they see fit before offlining a cpu.

Furthermore:

> --- morton.070123.orig/kernel/sched.c
> +++ morton.070123/kernel/sched.c
> @@ -5170,6 +5170,12 @@ restart:
>  	if (dest_cpu == NR_CPUS)
>  		dest_cpu = any_online_cpu(p->cpus_allowed);
>  
> +	/* try to stay on the same cpuset */
> +	if (dest_cpu == NR_CPUS) {
> +		p->cpus_allowed = cpuset_cpus_allowed(p);
> +		dest_cpu = any_online_cpu(p->cpus_allowed);
> +	}

It's not okay to call cpuset_cpus_allowed in this context -- local
irqs are supposed to have been disabled by the caller of
move_task_off_dead_cpu and cpuset_cpus_allowed acquires a mutex.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-09 19:39 Cliff Wickman
  2007-03-09 23:58 ` Nathan Lynch
@ 2007-03-10  9:19 ` Ingo Molnar
  2007-03-10 15:51   ` Nathan Lynch
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-03-10  9:19 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: linux-kernel, Andrew Morton


* Cliff Wickman <cpw@sgi.com> wrote:

> With this patch, migrate the task to:
>  1) to any cpu on the same node as the disabled cpu, which is both online
>     and among that task's cpus_allowed
>  2) to any online cpu within the task's cpuset
>  3) to any cpu which is both online and among that task's cpus_allowed
> 
> Diffed against 2.6.21-rc3 (Andrew's current top of tree)

looks good to me.

Acked-by: Ingo Molnar <mingo@elte.hu>

> +	/* try to stay on the same cpuset */
> +	if (dest_cpu == NR_CPUS) {
> +		p->cpus_allowed = cpuset_cpus_allowed(p);
> +		dest_cpu = any_online_cpu(p->cpus_allowed);
> +	}

what's the practical effect of this - when moving the last CPU offline 
from a node you got jobs migrated to really alien nodes? Thus i think we 
should queue this up for v2.6.21 too, correct? It's a NOP on systems 
that do not set up cpusets, so it's low-risk.

btw., unrelated to your patch, there's this bit right after the code 
above:

        /* No more Mr. Nice Guy. */
        if (dest_cpu == NR_CPUS) {
                rq = task_rq_lock(p, &flags);
                cpus_setall(p->cpus_allowed);
                dest_cpu = any_online_cpu(p->cpus_allowed);

out of consistency, shouldnt the cpus_setall() rather be:

		p->cpus_allowed = cpu_possible_map;

? It shouldnt make any real difference but it looks more consistent.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-10  9:19 ` Ingo Molnar
@ 2007-03-10 15:51   ` Nathan Lynch
  2007-03-10 17:08     ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Nathan Lynch @ 2007-03-10 15:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Cliff Wickman, linux-kernel, Andrew Morton

Hi-

Ingo Molnar wrote:
> 
> * Cliff Wickman <cpw@sgi.com> wrote:
> 
> > With this patch, migrate the task to:
> >  1) to any cpu on the same node as the disabled cpu, which is both online
> >     and among that task's cpus_allowed
> >  2) to any online cpu within the task's cpuset
> >  3) to any cpu which is both online and among that task's cpus_allowed
> > 
> > Diffed against 2.6.21-rc3 (Andrew's current top of tree)
> 
> looks good to me.
> 
> Acked-by: Ingo Molnar <mingo@elte.hu>
> 
> > +	/* try to stay on the same cpuset */
> > +	if (dest_cpu == NR_CPUS) {
> > +		p->cpus_allowed = cpuset_cpus_allowed(p);
> > +		dest_cpu = any_online_cpu(p->cpus_allowed);
> > +	}
> 
> what's the practical effect of this - when moving the last CPU offline 
> from a node you got jobs migrated to really alien nodes? Thus i think we 
> should queue this up for v2.6.21 too, correct? It's a NOP on systems 
> that do not set up cpusets, so it's low-risk.

See my earlier reply to this patch.  Calling cpuset_cpus_allowed
(which takes a mutex) here is a bug, since move_task_off_dead_cpu must
be called with interrupts disabled.


> btw., unrelated to your patch, there's this bit right after the code 
> above:
> 
>         /* No more Mr. Nice Guy. */
>         if (dest_cpu == NR_CPUS) {
>                 rq = task_rq_lock(p, &flags);
>                 cpus_setall(p->cpus_allowed);
>                 dest_cpu = any_online_cpu(p->cpus_allowed);
> 
> out of consistency, shouldnt the cpus_setall() rather be:
> 
> 		p->cpus_allowed = cpu_possible_map;
> 
> ? It shouldnt make any real difference but it looks more consistent.

The default value of cpus_allowed is CPU_MASK_ALL, I thought -- at
least that's what we set init's to early on.  Though, as you say, it
shouldn't make any difference.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-10 15:51   ` Nathan Lynch
@ 2007-03-10 17:08     ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2007-03-10 17:08 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Cliff Wickman, linux-kernel, Andrew Morton


* Nathan Lynch <ntl@pobox.com> wrote:

> > > +	/* try to stay on the same cpuset */
> > > +	if (dest_cpu == NR_CPUS) {
> > > +		p->cpus_allowed = cpuset_cpus_allowed(p);
> > > +		dest_cpu = any_online_cpu(p->cpus_allowed);
> > > +	}
> > 
> > what's the practical effect of this - when moving the last CPU offline 
> > from a node you got jobs migrated to really alien nodes? Thus i think we 
> > should queue this up for v2.6.21 too, correct? It's a NOP on systems 
> > that do not set up cpusets, so it's low-risk.
> 
> See my earlier reply to this patch.  Calling cpuset_cpus_allowed 
> (which takes a mutex) here is a bug, since move_task_off_dead_cpu must 
> be called with interrupts disabled.

ouch. i only checked the !CONFIG_CPUSET case :-/ It's a really bad idea 
to have any locking there indeed. The name itself suggests some atomic 
action.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-09 23:58 ` Nathan Lynch
@ 2007-03-15  0:36   ` Robin Holt
  2007-03-15  7:32     ` Nathan Lynch
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Holt @ 2007-03-15  0:36 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Cliff Wickman, linux-kernel

On Fri, Mar 09, 2007 at 05:58:59PM -0600, Nathan Lynch wrote:
> Hello-
> 
> Cliff Wickman wrote:
> > This patch would insert a preference to migrate such a task to a cpu within
> > its cpuset (and set its cpus_allowed to its cpuset).
> > 
> > With this patch, migrate the task to:
> >  1) to any cpu on the same node as the disabled cpu, which is both online
> >     and among that task's cpus_allowed
> >  2) to any online cpu within the task's cpuset
> >  3) to any cpu which is both online and among that task's cpus_allowed
> 
> I think I disagree with this change.
> 
> The kernel shouldn't have to be any smarter than it already is about
> moving tasks off an offlined cpu.  The only way case 2) can be reached
> is if the user has changed a task's cpu affinity.  If the user is
> sophisticated enough to manipulate tasks' cpu affinity then they can
> arrange to migrate tasks as they see fit before offlining a cpu.

You are assuming some sort of interlock between the admin and the user.
While this may be true on your own personal desktop, I don't think you
can expect this to be true on a development machine shared by hundreds
of users and admin'd by a group of people.

Additionally, ia64 is gaining support for offlining a cpu which is giving
cache errors.

Thanks,
Robin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-03-15  0:36   ` Robin Holt
@ 2007-03-15  7:32     ` Nathan Lynch
  0 siblings, 0 replies; 25+ messages in thread
From: Nathan Lynch @ 2007-03-15  7:32 UTC (permalink / raw)
  To: Robin Holt; +Cc: Cliff Wickman, linux-kernel

Robin Holt wrote:
> On Fri, Mar 09, 2007 at 05:58:59PM -0600, Nathan Lynch wrote:
> > Hello-
> > 
> > Cliff Wickman wrote:
> > > This patch would insert a preference to migrate such a task to a cpu within
> > > its cpuset (and set its cpus_allowed to its cpuset).
> > > 
> > > With this patch, migrate the task to:
> > >  1) to any cpu on the same node as the disabled cpu, which is both online
> > >     and among that task's cpus_allowed
> > >  2) to any online cpu within the task's cpuset
> > >  3) to any cpu which is both online and among that task's cpus_allowed
> > 
> > I think I disagree with this change.
> > 
> > The kernel shouldn't have to be any smarter than it already is about
> > moving tasks off an offlined cpu.  The only way case 2) can be reached
> > is if the user has changed a task's cpu affinity.  If the user is
> > sophisticated enough to manipulate tasks' cpu affinity then they can
> > arrange to migrate tasks as they see fit before offlining a cpu.
> 
> You are assuming some sort of interlock between the admin and the user.

Oh, the horror!  ;-)

> While this may be true on your own personal desktop, I don't think you
> can expect this to be true on a development machine shared by hundreds
> of users and admin'd by a group of people.

Actually, I would hope that a large development environment where
users are given fine-grained control over their resource usage would
1) allow interested tasks to register for notification before a
resource is taken away, and 2) that this system of registration and
notification is implemented in userspace.  Use d-bus or something.

> Additionally, ia64 is gaining support for offlining a cpu which is giving
> cache errors.

Okay.  Consider the case where a task's cpuset consists only of threads
or cores that share cache.  Keeping the task in its cpuset when
offlining due to cache errors is exactly the wrong thing to do.

Anyway, it looks to me like task->cpus_allowed should already reflect
task's cpuset (cpuset.c:attach_task calls sched.c:set_cpus_allowed),
so now I'm missing the point of the patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-05-21 20:08 Cliff Wickman
  2007-05-23 17:43 ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Cliff Wickman @ 2007-05-21 20:08 UTC (permalink / raw)
  To: linux-kernel



(this is a third submission -- corrects a locking/blocking issue pointed
 out by Nathan Lynch)

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system
to have its tasks confined to a cpuset so as to cluster them near the
memory that they share. Furthermore, it is typical to explicitly place such
a task on a specific cpu in that cpuset.  And in that case the task's
cpus_allowed includes only a single cpu.

This patch inserts a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed(), which may block.

move_task_off_dead_cpu() has been within a critical region when called
from migrate_live_tasks().  So this patch also changes migrate_live_tasks()
to enable interrupts before calling move_task_off_dead_cpu().
Since the tasklist_lock is dropped, the list scan must be restarted from
the top.
It locks the migrating task by bumping its usage count.
It disables interrupts in move_task_off_dead_cpu() before the
 call to __migrate_task().

This is the outline of the locking surrounding calls to
move_task_off_dead_cpu(), after applying this patch:

  migration_call()
  | case CPU_DEAD
  |   migrate_live_tasks(cpu)
  |   | recheck:
  |   | write_lock_irq(&tasklist_lock)
  |   | do_each_thread(t, p) {
  |   |         if (task_cpu(p) == src_cpu)
  |   |                 get_task_struct(p)
  |   |                 write_unlock_irq(&tasklist_lock)
  |   |                 move_task_off_dead_cpu(src_cpu, p) <<<< noncritical
  |   |                 put_task_struct(p);
  |   |                 goto recheck
  |   | } while_each_thread(t, p)
  |   | write_unlock_irq(&tasklist_lock)
  |
  |   rq = task_rq_lock(rq->idle, &flags)
  |
  |   migrate_dead_tasks(cpu)
  |   | for (arr = 0; arr < 2; arr++) {
  |   |   for (i = 0; i < MAX_PRIO; i++) {
  |   |     while (!list_empty(list))
  |   |       migrate_dead(dead_cpu
  |   |         get_task_struct(p)
  |   |         spin_unlock_irq(&rq->lock)
  |   |         move_task_off_dead_cpu(dead_cpu, p)        <<<< noncritcal
  |   |         spin_lock_irq(&rq->lock)
  |   |         put_task_struct(p)
  |
  |   task_rq_unlock(rq, &flags)

[Side note: a task may be migrated off of its cpuset, but is still attached to
 that cpuset (by pointer and reference count).  The cpuset will not be
 released.  This patch does not change that.]

Diffed against 2.6.21

Signed-off-by: Cliff Wickman <cpw@sgi.com>

 kernel/sched.c |   31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

Index: linus.070504/kernel/sched.c
===================================================================
--- linus.070504.orig/kernel/sched.c
+++ linus.070504/kernel/sched.c
@@ -4989,7 +4989,7 @@ wait_to_die:
 #ifdef CONFIG_HOTPLUG_CPU
 /*
  * Figure out where task on dead CPU should go, use force if neccessary.
- * NOTE: interrupts should be disabled by the caller
+ * NOTE: interrupts are not disabled by the caller
  */
 static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
 {
@@ -5008,6 +5008,17 @@ restart:
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(p->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		/*
+		 * Call to cpuset_cpus_allowed may sleep, so we depend
+		 * on move_task_off_dead_cpu() being called in a non-critical
+		 * region.
+		 */
+		p->cpus_allowed = cpuset_cpus_allowed(p);
+		dest_cpu = any_online_cpu(p->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		rq = task_rq_lock(p, &flags);
@@ -5025,8 +5036,16 @@ restart:
 			       "longer affine to cpu%d\n",
 			       p->pid, p->comm, dead_cpu);
 	}
-	if (!__migrate_task(p, dead_cpu, dest_cpu))
+	/*
+	 * __migrate_task() requires interrupts to be disabled
+	 */
+	local_irq_disable();
+	if (!__migrate_task(p, dead_cpu, dest_cpu)) {
+		local_irq_enable();
 		goto restart;
+	}
+	local_irq_enable();
+	return;
 }
 
 /*
@@ -5054,14 +5073,20 @@ static void migrate_live_tasks(int src_c
 {
 	struct task_struct *p, *t;
 
+restartlist:
 	write_lock_irq(&tasklist_lock);
 
 	do_each_thread(t, p) {
 		if (p == current)
 			continue;
 
-		if (task_cpu(p) == src_cpu)
+		if (task_cpu(p) == src_cpu) {
+			get_task_struct(p);
+			write_unlock_irq(&tasklist_lock);
 			move_task_off_dead_cpu(src_cpu, p);
+			put_task_struct(p);
+			goto restartlist;
+		}
 	} while_each_thread(t, p);
 
 	write_unlock_irq(&tasklist_lock);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-05-21 20:08 Cliff Wickman
@ 2007-05-23 17:43 ` Andrew Morton
  2007-05-24  7:56   ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-05-23 17:43 UTC (permalink / raw)
  To: Cliff Wickman
  Cc: linux-kernel, Paul Jackson, Christoph Lameter, Paul Menage,
	Nick Piggin, Ingo Molnar

On Mon, 21 May 2007 15:08:56 -0500 cpw@sgi.com (Cliff Wickman) wrote:

> (this is a third submission -- corrects a locking/blocking issue pointed
>  out by Nathan Lynch)
> 
> When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
> that have been running on that cpu.

So I still have these three patches in the pending queue but I was rather
hoping that the scheduler, sched-domains and cpuset people could take a
look at them, please.

They hit sched.c and cpuset.c mainly, and they might trash Ingo's CFS patch
(I haven't checked).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-05-23 21:29 Oleg Nesterov
  2007-05-23 22:56 ` Cliff Wickman
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-05-23 21:29 UTC (permalink / raw)
  To: cpw; +Cc: Andrew Morton, Ingo Molnar, linux-kernel

Cliff Wickman wrote:
>
> In order to do this, move_task_off_dead_cpu() must make a call to
> cpuset_cpus_allowed(), which may block.
>
> move_task_off_dead_cpu() has been within a critical region when called
> from migrate_live_tasks().  So this patch also changes migrate_live_tasks()
> to enable interrupts before calling move_task_off_dead_cpu().
> Since the tasklist_lock is dropped, the list scan must be restarted from
> the top.
>
> [... snip ...]
>
> - * NOTE: interrupts should be disabled by the caller
> + * NOTE: interrupts are not disabled by the caller
>   */
>  static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
>  {
> @@ -5008,6 +5008,17 @@ restart:
>  	if (dest_cpu == NR_CPUS)
>  		dest_cpu = any_online_cpu(p->cpus_allowed);
>
> +	/* try to stay on the same cpuset */
> +	if (dest_cpu == NR_CPUS) {
> +		/*
> +		 * Call to cpuset_cpus_allowed may sleep, so we depend
> +		 * on move_task_off_dead_cpu() being called in a non-critical
> +		 * region.
> +		 */
> +		p->cpus_allowed = cpuset_cpus_allowed(p);
> +		dest_cpu = any_online_cpu(p->cpus_allowed);
> +	}

I know nothing about cpuset.c, a _very_ naive question.

Do we really need task_lock() (used by cpuset_cpus_allowed) here ?

If not, probably we can make this simpler. CPU_DEAD takes cpuset_lock(),
move_task_off_dead_cpu() uses guarantee_online_cpus() which doesn't sleep,
so we don't need other changes.

Possible?

If not, this patch should also change migrate_dead(), it still calls
move_task_off_dead_cpu() with irqs disabled, no?

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-05-23 21:29 [PATCH 1/1] hotplug cpu: migrate a task within its cpuset Oleg Nesterov
@ 2007-05-23 22:56 ` Cliff Wickman
  2007-05-23 23:32   ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Cliff Wickman @ 2007-05-23 22:56 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, pj


On Thu, May 24, 2007 at 01:29:02AM +0400, Oleg Nesterov wrote:
> Cliff Wickman wrote:
> >
> > In order to do this, move_task_off_dead_cpu() must make a call to
> > cpuset_cpus_allowed(), which may block.
> >
> > move_task_off_dead_cpu() has been within a critical region when called
> > from migrate_live_tasks().  So this patch also changes migrate_live_tasks()
> > to enable interrupts before calling move_task_off_dead_cpu().
> > Since the tasklist_lock is dropped, the list scan must be restarted from
> > the top.
> >
> > [... snip ...]
> >
> > - * NOTE: interrupts should be disabled by the caller
> > + * NOTE: interrupts are not disabled by the caller
> >   */
> >  static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> >  {
> > @@ -5008,6 +5008,17 @@ restart:
> >  	if (dest_cpu == NR_CPUS)
> >  		dest_cpu = any_online_cpu(p->cpus_allowed);
> >
> > +	/* try to stay on the same cpuset */
> > +	if (dest_cpu == NR_CPUS) {
> > +		/*
> > +		 * Call to cpuset_cpus_allowed may sleep, so we depend
> > +		 * on move_task_off_dead_cpu() being called in a non-critical
> > +		 * region.
> > +		 */
> > +		p->cpus_allowed = cpuset_cpus_allowed(p);
> > +		dest_cpu = any_online_cpu(p->cpus_allowed);
> > +	}
> 
> I know nothing about cpuset.c, a _very_ naive question.

Paul Jackson is the cpuset guru.
 
> Do we really need task_lock() (used by cpuset_cpus_allowed) here ?

According to Paul's comment in kernel/cpuset.c
 * It is ok to first take manage_sem, then nest callback_sem.  We also
 * require taking task_lock() when dereferencing a tasks cpuset pointer.
So I'm afraid it is not safe to call guarantee_online_cpus(tsk->cpuset, &mask);
without it.  Could the task not be exiting?

> If not, probably we can make this simpler. CPU_DEAD takes cpuset_lock(),
> move_task_off_dead_cpu() uses guarantee_online_cpus() which doesn't sleep,
> so we don't need other changes.
> 
> Possible?
> 
> If not, this patch should also change migrate_dead(), it still calls
> move_task_off_dead_cpu() with irqs disabled, no?

Right, the lock is released but I indeed didn't reenable irqs.
How would you suggest doing that?  The irq state was saved in local
variable "flags" back in migration_call().

> 
> Oleg.

-- 
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com
(651) 683-3824

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-05-23 22:56 ` Cliff Wickman
@ 2007-05-23 23:32   ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2007-05-23 23:32 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: linux-kernel, pj

On 05/23, Cliff Wickman wrote:
> 
> On Thu, May 24, 2007 at 01:29:02AM +0400, Oleg Nesterov wrote:
> > Cliff Wickman wrote:
> > >
> > > - * NOTE: interrupts should be disabled by the caller
> > > + * NOTE: interrupts are not disabled by the caller
> > >   */
> > >  static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > >  {
> > > @@ -5008,6 +5008,17 @@ restart:
> > >  	if (dest_cpu == NR_CPUS)
> > >  		dest_cpu = any_online_cpu(p->cpus_allowed);
> > >
> > > +	/* try to stay on the same cpuset */
> > > +	if (dest_cpu == NR_CPUS) {
> > > +		/*
> > > +		 * Call to cpuset_cpus_allowed may sleep, so we depend
> > > +		 * on move_task_off_dead_cpu() being called in a non-critical
> > > +		 * region.
> > > +		 */
> > > +		p->cpus_allowed = cpuset_cpus_allowed(p);
> > > +		dest_cpu = any_online_cpu(p->cpus_allowed);
> > > +	}
> > 
> > I know nothing about cpuset.c, a _very_ naive question.
> 
> Paul Jackson is the cpuset guru.

Hopefully Paul can help us...

> > Do we really need task_lock() (used by cpuset_cpus_allowed) here ?
> 
> According to Paul's comment in kernel/cpuset.c
>  * It is ok to first take manage_sem, then nest callback_sem.  We also
>  * require taking task_lock() when dereferencing a tasks cpuset pointer.
> So I'm afraid it is not safe to call guarantee_online_cpus(tsk->cpuset, &mask);
> without it.  Could the task not be exiting?

But how task_lock() can help? cpuset_exit() doesn't take it, it changes ->cpuset
lockless. However, it sets ->cpuset = &top_cpuset, and the comment says:

	* Don't even think about derefencing 'cs' after the cpuset use count
	* goes to zero, except inside a critical section guarded by manage_mutex
	* or callback_mutex.
	     ^^^^^^^^^^^^^^
So, perhaps cpuset_lock() should be enough.

(That said, looking at cpuset_exit() I can't understand why callback_mutex is
 enough. If it is not, cpuset_cpus_allowed() is not safe either).

> > If not, probably we can make this simpler. CPU_DEAD takes cpuset_lock(),
> > move_task_off_dead_cpu() uses guarantee_online_cpus() which doesn't sleep,
> > so we don't need other changes.
> > 
> > Possible?
> > 
> > If not, this patch should also change migrate_dead(), it still calls
> > move_task_off_dead_cpu() with irqs disabled, no?
> 
> Right, the lock is released but I indeed didn't reenable irqs.
> How would you suggest doing that?  The irq state was saved in local
> variable "flags" back in migration_call().

migration_call(CPU_DEAD) is called with irqs enabled.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-05-23 17:43 ` Andrew Morton
@ 2007-05-24  7:56   ` Ingo Molnar
  2007-05-29 19:32     ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2007-05-24  7:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cliff Wickman, linux-kernel, Paul Jackson, Christoph Lameter,
	Paul Menage, Nick Piggin


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > When a cpu is disabled, move_task_off_dead_cpu() is called for tasks 
> > that have been running on that cpu.
> 
> So I still have these three patches in the pending queue but I was 
> rather hoping that the scheduler, sched-domains and cpuset people 
> could take a look at them, please.
> 
> They hit sched.c and cpuset.c mainly, and they might trash Ingo's CFS 
> patch (I haven't checked).

The patch looks good to me. It applies cleanly ontop of CFS and it 
builds and boots fine with and without CONFIG_CPU_HOTPLUG (although i 
havent tried to explicitly stress the codepath in question). We are a 
bit paranoid in this codepath but it's not performance-critical 
normally.

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-05-24  7:56   ` Ingo Molnar
@ 2007-05-29 19:32     ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2007-05-29 19:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Cliff Wickman, linux-kernel, Paul Jackson, Christoph Lameter,
	Paul Menage, Nick Piggin

On Thu, 24 May 2007 09:56:01 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > When a cpu is disabled, move_task_off_dead_cpu() is called for tasks 
> > > that have been running on that cpu.
> > 
> > So I still have these three patches in the pending queue but I was 
> > rather hoping that the scheduler, sched-domains and cpuset people 
> > could take a look at them, please.
> > 
> > They hit sched.c and cpuset.c mainly, and they might trash Ingo's CFS 
> > patch (I haven't checked).
> 
> The patch looks good to me. It applies cleanly ontop of CFS and it 
> builds and boots fine with and without CONFIG_CPU_HOTPLUG (although i 
> havent tried to explicitly stress the codepath in question). We are a 
> bit paranoid in this codepath but it's not performance-critical 
> normally.
> 
> Acked-by: Ingo Molnar <mingo@elte.hu>
> 

I applied this to -mm, thanks.

The other two patches don't apply well to the current pending patch queue
and nobody seems keen on doing a serious review (or any review, iirc), so I
ducked them.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
@ 2007-08-24 22:18 Cliff Wickman
  2007-08-24 22:54 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Cliff Wickman @ 2007-08-24 22:18 UTC (permalink / raw)
  To: akpm, ego, mingo, vatsa, oleg, pj; +Cc: linux-kernel


When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

It is typical of a multithreaded application running on a large NUMA system
to have its tasks confined to a cpuset so as to cluster them near the
memory that they share. Furthermore, it is typical to explicitly place such
a task on a specific cpu in that cpuset.  And in that case the task's
cpus_allowed includes only a single cpu.

This patch would insert a preference to migrate such a task to some cpu within
its cpuset (and set its cpus_allowed to its entire cpuset).

With this patch, migrate the task to:
 1) to any cpu on the same node as the disabled cpu, which is both online
    and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


In order to do this, move_task_off_dead_cpu() must make a call to
cpuset_cpus_allowed_lock(), a new variant of cpuset_cpus_allowed() that
will not block.
Calls are made to cpuset_lock() and cpuset_unlock() in migration_call()
to set the cpuset mutex during the whole migrate_live_tasks() and
migrate_dead_tasks() procedure.

This patch depends on 2 patches from Oleg Nesterov:
  [PATCH 1/2] do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
  [PATCH 2/2] migration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()

Diffed against 2.6.23-rc3

Signed-off-by: Cliff Wickman <cpw@sgi.com>

---
 include/linux/cpuset.h |    5 +++++
 kernel/cpuset.c        |   19 +++++++++++++++++++
 kernel/sched.c         |   14 ++++++++++++++
 3 files changed, 38 insertions(+)

Index: linus.070821/kernel/sched.c
===================================================================
--- linus.070821.orig/kernel/sched.c
+++ linus.070821/kernel/sched.c
@@ -61,6 +61,7 @@
 #include <linux/delayacct.h>
 #include <linux/reciprocal_div.h>
 #include <linux/unistd.h>
+#include <linux/cpuset.h>
 
 #include <asm/tlb.h>
 
@@ -5091,6 +5092,17 @@ restart:
 	if (dest_cpu == NR_CPUS)
 		dest_cpu = any_online_cpu(p->cpus_allowed);
 
+	/* try to stay on the same cpuset */
+	if (dest_cpu == NR_CPUS) {
+		/*
+		 * The cpuset_cpus_allowed_lock() variant of
+		 * cpuset_cpus_allowed() will not block
+		 * It must be called within calls to cpuset_lock/cpuset_unlock.
+		 */
+		p->cpus_allowed = cpuset_cpus_allowed_lock(p);
+		dest_cpu = any_online_cpu(p->cpus_allowed);
+	}
+
 	/* No more Mr. Nice Guy. */
 	if (dest_cpu == NR_CPUS) {
 		rq = task_rq_lock(p, &flags);
@@ -5412,6 +5424,7 @@ migration_call(struct notifier_block *nf
 
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
+		cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
 		migrate_live_tasks(cpu);
 		rq = cpu_rq(cpu);
 		kthread_stop(rq->migration_thread);
@@ -5425,6 +5438,7 @@ migration_call(struct notifier_block *nf
 		rq->idle->sched_class = &idle_sched_class;
 		migrate_dead_tasks(cpu);
 		spin_unlock_irq(&rq->lock);
+		cpuset_unlock();
 		migrate_nr_uninterruptible(rq);
 		BUG_ON(rq->nr_running != 0);
 
Index: linus.070821/include/linux/cpuset.h
===================================================================
--- linus.070821.orig/include/linux/cpuset.h
+++ linus.070821/include/linux/cpuset.h
@@ -22,6 +22,7 @@ extern void cpuset_init_smp(void);
 extern void cpuset_fork(struct task_struct *p);
 extern void cpuset_exit(struct task_struct *p);
 extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
+extern cpumask_t cpuset_cpus_allowed_lock(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
@@ -87,6 +88,10 @@ static inline cpumask_t cpuset_cpus_allo
 {
 	return cpu_possible_map;
 }
+static inline cpumask_t cpuset_cpus_allowed_lock(struct task_struct *p)
+{
+	return cpu_possible_map;
+}
 
 static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
 {
Index: linus.070821/kernel/cpuset.c
===================================================================
--- linus.070821.orig/kernel/cpuset.c
+++ linus.070821/kernel/cpuset.c
@@ -2333,6 +2333,25 @@ cpumask_t cpuset_cpus_allowed(struct tas
 	return mask;
 }
 
+/**
+ * cpuset_cpus_allowed_lock - return cpus_allowed mask from a tasks cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
+ *
+ * Description: Same as cpuset_cpus_allowed, but called with callback_mutex
+ * already held.
+ **/
+
+cpumask_t cpuset_cpus_allowed_lock(struct task_struct *tsk)
+{
+	cpumask_t mask;
+
+	task_lock(tsk);
+	guarantee_online_cpus(tsk->cpuset, &mask);
+	task_unlock(tsk);
+
+	return mask;
+}
+
 void cpuset_init_current_mems_allowed(void)
 {
 	current->mems_allowed = NODE_MASK_ALL;
-- 
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com
(651) 683-3824

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-24 22:18 Cliff Wickman
@ 2007-08-24 22:54 ` Andrew Morton
  2007-08-25  9:47   ` Oleg Nesterov
  2007-08-25  9:58 ` Oleg Nesterov
  2007-08-25 11:18 ` Oleg Nesterov
  2 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-08-24 22:54 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: ego, mingo, vatsa, oleg, pj, linux-kernel

On Fri, 24 Aug 2007 17:18:06 -0500
Cliff Wickman <cpw@sgi.com> wrote:

> When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
> that have been running on that cpu.
> 
> Currently, such a task is migrated:
>  1) to any cpu on the same node as the disabled cpu, which is both online
>     and among that task's cpus_allowed
>  2) to any cpu which is both online and among that task's cpus_allowed
> 
> It is typical of a multithreaded application running on a large NUMA system
> to have its tasks confined to a cpuset so as to cluster them near the
> memory that they share. Furthermore, it is typical to explicitly place such
> a task on a specific cpu in that cpuset.  And in that case the task's
> cpus_allowed includes only a single cpu.

operator error..

> This patch would insert a preference to migrate such a task to some cpu within
> its cpuset (and set its cpus_allowed to its entire cpuset).
> 
> With this patch, migrate the task to:
>  1) to any cpu on the same node as the disabled cpu, which is both online
>     and among that task's cpus_allowed
>  2) to any online cpu within the task's cpuset
>  3) to any cpu which is both online and among that task's cpus_allowed

Wouldn't it be saner to refuse the offlining request if the CPU has tasks
which cannot be migrated to any other CPU?  I mean, the operator has gone
and asked the machine to perform two inconsistent/incompatible things at
the same time.

Look at it this way.  If we were to merge this patch then it would be
logical to also merge a patch which has the following description:

  "if an process attempts to pin itself onto an presently-offlined CPU,
   the kernel will choose a different CPU according to <heuristics> and
   will pin the process to that CPU instead".

Which is the same thing as your patch, only it handles the two events when
they occur in the other order.

No?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-24 22:54 ` Andrew Morton
@ 2007-08-25  9:47   ` Oleg Nesterov
  2007-08-26  0:16     ` Gautham R Shenoy
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2007-08-25  9:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Cliff Wickman, ego, mingo, vatsa, pj, linux-kernel

On 08/24, Andrew Morton wrote:
>
> On Fri, 24 Aug 2007 17:18:06 -0500
> Cliff Wickman <cpw@sgi.com> wrote:
> 
> > When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
> > that have been running on that cpu.
> > 
> > Currently, such a task is migrated:
> >  1) to any cpu on the same node as the disabled cpu, which is both online
> >     and among that task's cpus_allowed
> >  2) to any cpu which is both online and among that task's cpus_allowed
> > 
> > It is typical of a multithreaded application running on a large NUMA system
> > to have its tasks confined to a cpuset so as to cluster them near the
> > memory that they share. Furthermore, it is typical to explicitly place such
> > a task on a specific cpu in that cpuset.  And in that case the task's
> > cpus_allowed includes only a single cpu.
> 
> operator error..
> 
> > This patch would insert a preference to migrate such a task to some cpu within
> > its cpuset (and set its cpus_allowed to its entire cpuset).
> > 
> > With this patch, migrate the task to:
> >  1) to any cpu on the same node as the disabled cpu, which is both online
> >     and among that task's cpus_allowed
> >  2) to any online cpu within the task's cpuset
> >  3) to any cpu which is both online and among that task's cpus_allowed
> 
> Wouldn't it be saner to refuse the offlining request if the CPU has tasks
> which cannot be migrated to any other CPU?  I mean, the operator has gone
> and asked the machine to perform two inconsistent/incompatible things at
> the same time.

I don't think so (regardless of this patch and CONFIG_CPUSETS). Any user
can bind its process to (say) CPU 4. This shouldn't block cpu-unplug.

Now, let's suppose that this process is a member of some cpuset which
contains CPUs 3 and 4, and CPU 4 goes down.

Before this patch, process leaves its ->cpuset and migrates to some "random"
any_online_cpu(). With this patch it stays within ->cpuset and migrates to
CPU 3.

> Look at it this way.  If we were to merge this patch then it would be
> logical to also merge a patch which has the following description:
>
>   "if an process attempts to pin itself onto an presently-offlined CPU,
>    the kernel will choose a different CPU according to <heuristics> and
>    will pin the process to that CPU instead".

set_cpus_allowed() just returns -EINVAL in that case, this looks a bit
more logical.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-24 22:18 Cliff Wickman
  2007-08-24 22:54 ` Andrew Morton
@ 2007-08-25  9:58 ` Oleg Nesterov
  2007-08-25 11:18 ` Oleg Nesterov
  2 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2007-08-25  9:58 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: akpm, ego, mingo, vatsa, pj, linux-kernel

On 08/24, Cliff Wickman wrote:
> --- linus.070821.orig/kernel/cpuset.c
> +++ linus.070821/kernel/cpuset.c
> @@ -2333,6 +2333,25 @@ cpumask_t cpuset_cpus_allowed(struct tas
>  	return mask;
>  }
>  
> +/**
> + * cpuset_cpus_allowed_lock - return cpus_allowed mask from a tasks cpuset.
> + * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
> + *
> + * Description: Same as cpuset_cpus_allowed, but called with callback_mutex
> + * already held.
> + **/
> +
> +cpumask_t cpuset_cpus_allowed_lock(struct task_struct *tsk)
> +{
> +	cpumask_t mask;
> +
> +	task_lock(tsk);
> +	guarantee_online_cpus(tsk->cpuset, &mask);
> +	task_unlock(tsk);
> +
> +	return mask;
> +}

Very minor nit, perhaps it makes sense to use this function in cpuset_cpus_allowed()
to avoid the code duplication. Its name is a bit confusing, imho. How about
cpuset_cpus_allowed_locked() or __cpuset_cpus_allowed() ?

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-24 22:18 Cliff Wickman
  2007-08-24 22:54 ` Andrew Morton
  2007-08-25  9:58 ` Oleg Nesterov
@ 2007-08-25 11:18 ` Oleg Nesterov
  2 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2007-08-25 11:18 UTC (permalink / raw)
  To: Cliff Wickman; +Cc: akpm, ego, mingo, vatsa, pj, linux-kernel

On 08/24, Cliff Wickman wrote:
> 
> @@ -5091,6 +5092,17 @@ restart:
>  	if (dest_cpu == NR_CPUS)
>  		dest_cpu = any_online_cpu(p->cpus_allowed);
>  
> +	/* try to stay on the same cpuset */
> +	if (dest_cpu == NR_CPUS) {
> +		/*
> +		 * The cpuset_cpus_allowed_lock() variant of
> +		 * cpuset_cpus_allowed() will not block
> +		 * It must be called within calls to cpuset_lock/cpuset_unlock.
> +		 */
> +		p->cpus_allowed = cpuset_cpus_allowed_lock(p);
> +		dest_cpu = any_online_cpu(p->cpus_allowed);

Ugh, didn't notice while reading this patch.

We shouldn't change ->cpus_allowed without rq->lock. Please look what
the current code does in "/* No more Mr. Nice Guy. */" case.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-25  9:47   ` Oleg Nesterov
@ 2007-08-26  0:16     ` Gautham R Shenoy
  2007-08-26  0:47       ` Rusty Russell
  0 siblings, 1 reply; 25+ messages in thread
From: Gautham R Shenoy @ 2007-08-26  0:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Cliff Wickman, mingo, vatsa, pj, linux-kernel, ntl,
	Rusty Russel, Dipankar Sarma

On Sat, Aug 25, 2007 at 01:47:40PM +0400, Oleg Nesterov wrote:
> On 08/24, Andrew Morton wrote:
> >
> > On Fri, 24 Aug 2007 17:18:06 -0500
> > Cliff Wickman <cpw@sgi.com> wrote:
> > 
> > > When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
> > > that have been running on that cpu.
> > > 
> > > Currently, such a task is migrated:
> > >  1) to any cpu on the same node as the disabled cpu, which is both online
> > >     and among that task's cpus_allowed
> > >  2) to any cpu which is both online and among that task's cpus_allowed
> > > 
> > > It is typical of a multithreaded application running on a large NUMA system
> > > to have its tasks confined to a cpuset so as to cluster them near the
> > > memory that they share. Furthermore, it is typical to explicitly place such
> > > a task on a specific cpu in that cpuset.  And in that case the task's
> > > cpus_allowed includes only a single cpu.
> > 
> > operator error..
> > 
> > > This patch would insert a preference to migrate such a task to some cpu within
> > > its cpuset (and set its cpus_allowed to its entire cpuset).
> > > 
> > > With this patch, migrate the task to:
> > >  1) to any cpu on the same node as the disabled cpu, which is both online
> > >     and among that task's cpus_allowed
> > >  2) to any online cpu within the task's cpuset
> > >  3) to any cpu which is both online and among that task's cpus_allowed
> > 
> > Wouldn't it be saner to refuse the offlining request if the CPU has tasks
> > which cannot be migrated to any other CPU?  I mean, the operator has gone
> > and asked the machine to perform two inconsistent/incompatible things at
> > the same time.
> 
> I don't think so (regardless of this patch and CONFIG_CPUSETS). Any user
> can bind its process to (say) CPU 4. This shouldn't block cpu-unplug.
> 
> Now, let's suppose that this process is a member of some cpuset which
> contains CPUs 3 and 4, and CPU 4 goes down.
> 
> Before this patch, process leaves its ->cpuset and migrates to some "random"
> any_online_cpu(). With this patch it stays within ->cpuset and migrates to
> CPU 3.

The decision to bind a task to a specific cpu, was taken by the userspace
for a reason, which is _unknown_ to the kernel.
So logically, shouldn't the userspace decide what should be 
the fate of those exclusive-affined tasks, whose cpu is about to go
offline? After all, the reason to offline the cpu is, again, unknown to
the kernel.

Though we have been breaking such a task's affinity in  
/* No more Mr. Nice Guy. */ part, IMO a nicer way to do it would be to: 
- Fail the cpu-offline operation with -EBUSY, if there exist userspace tasks 
  exclusively affined to that cpu.
- Provide some kind of infrastructure like a sysfs file
  /sys/devices/system/cpu/cpuX/exclusive_tasks which will help
  the administrator take proactive steps before requesting a
  cpu offline, instead of the kernel taking the reactive step once the
  offline is done.

The side-effect, ofcourse would be that it would break some of the
existing applications, which are not used to cpu-offline failing unless
the cpu was already offline or there was only one online cpu. But is the
side effect so critical, that we continue with this funny contradiction in 
the kernel?! Or is there something important, I'm missing here?

> 
> > Look at it this way.  If we were to merge this patch then it would be
> > logical to also merge a patch which has the following description:
> >
> >   "if an process attempts to pin itself onto an presently-offlined CPU,
> >    the kernel will choose a different CPU according to <heuristics> and
> >    will pin the process to that CPU instead".
> 
> set_cpus_allowed() just returns -EINVAL in that case, this looks a bit
> more logical.
> 

Yup, it sure does!

> Oleg.
> 

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-26  0:16     ` Gautham R Shenoy
@ 2007-08-26  0:47       ` Rusty Russell
  2007-08-26  8:09         ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Rusty Russell @ 2007-08-26  0:47 UTC (permalink / raw)
  To: ego
  Cc: Oleg Nesterov, Andrew Morton, Cliff Wickman, mingo, vatsa, pj,
	linux-kernel, ntl, Dipankar Sarma

On Sun, 2007-08-26 at 05:46 +0530, Gautham R Shenoy wrote:
> On Sat, Aug 25, 2007 at 01:47:40PM +0400, Oleg Nesterov wrote:
> > Before this patch, process leaves its ->cpuset and migrates to some "random"
> > any_online_cpu(). With this patch it stays within ->cpuset and migrates to
> > CPU 3.
> 
> The decision to bind a task to a specific cpu, was taken by the userspace
> for a reason, which is _unknown_ to the kernel.
> So logically, shouldn't the userspace decide what should be 
> the fate of those exclusive-affined tasks, whose cpu is about to go
> offline? After all, the reason to offline the cpu is, again, unknown to
> the kernel.

Userspace is not monolithic.  If you refuse to take a CPU offline
because a task is affine, then any user can prevent a CPU from going
offline.

You could, perhaps, introduce a "gentle" offline which fails if process
affinity can no longer be met.

Rusty.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-26  0:47       ` Rusty Russell
@ 2007-08-26  8:09         ` Andrew Morton
  2007-08-27  7:01           ` Rusty Russell
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-08-26  8:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: ego, Oleg Nesterov, Cliff Wickman, mingo, vatsa, pj, linux-kernel,
	ntl, Dipankar Sarma

On Sun, 26 Aug 2007 10:47:24 +1000 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Sun, 2007-08-26 at 05:46 +0530, Gautham R Shenoy wrote:
> > On Sat, Aug 25, 2007 at 01:47:40PM +0400, Oleg Nesterov wrote:
> > > Before this patch, process leaves its ->cpuset and migrates to some "random"
> > > any_online_cpu(). With this patch it stays within ->cpuset and migrates to
> > > CPU 3.
> > 
> > The decision to bind a task to a specific cpu, was taken by the userspace
> > for a reason, which is _unknown_ to the kernel.
> > So logically, shouldn't the userspace decide what should be 
> > the fate of those exclusive-affined tasks, whose cpu is about to go
> > offline? After all, the reason to offline the cpu is, again, unknown to
> > the kernel.
> 
> Userspace is not monolithic.  If you refuse to take a CPU offline
> because a task is affine, then any user can prevent a CPU from going
> offline.

That's a kernel bug.

> You could, perhaps, introduce a "gentle" offline which fails if process
> affinity can no longer be met.

Suitably privileged userspace should be able to

1) prevent tasks from binding to CPU N then
2) migrate all tasks which can use CPU N over to other CPU(s) then
3) offline CPU N.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset
  2007-08-26  8:09         ` Andrew Morton
@ 2007-08-27  7:01           ` Rusty Russell
  0 siblings, 0 replies; 25+ messages in thread
From: Rusty Russell @ 2007-08-27  7:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ego, Oleg Nesterov, Cliff Wickman, mingo, vatsa, pj, linux-kernel,
	ntl, Dipankar Sarma

On Sun, 2007-08-26 at 01:09 -0700, Andrew Morton wrote:
> On Sun, 26 Aug 2007 10:47:24 +1000 Rusty Russell <rusty@rustcorp.com.au> wrote:
> > Userspace is not monolithic.  If you refuse to take a CPU offline
> > because a task is affine, then any user can prevent a CPU from going
> > offline.
> 
> That's a kernel bug.

You mean "would be if it were implemented"?  Although consider the
equivalent forkbomb or thrashing userspace problems, where we just say
"use quotas".

Just to clarify: that is not how we work, we migrate tasks off a dying
CPU, breaking affinity and printing a warning if necessary.  

It was simple and has not proven problematic in practice.  (Userspace
cpu affinity has been a question of optimality not correctness)

> > You could, perhaps, introduce a "gentle" offline which fails if process
> > affinity can no longer be met.
> 
> Suitably privileged userspace should be able to
> 
> 1) prevent tasks from binding to CPU N then
> 2) migrate all tasks which can use CPU N over to other CPU(s) then
> 3) offline CPU N.

Indeed, (1) is missing.  I would hesitate to introduce more
infrastructure in an under-worn and over-buggy part of the kernel
though.

Rusty.



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2007-08-28 17:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-23 21:29 [PATCH 1/1] hotplug cpu: migrate a task within its cpuset Oleg Nesterov
2007-05-23 22:56 ` Cliff Wickman
2007-05-23 23:32   ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2007-08-24 22:18 Cliff Wickman
2007-08-24 22:54 ` Andrew Morton
2007-08-25  9:47   ` Oleg Nesterov
2007-08-26  0:16     ` Gautham R Shenoy
2007-08-26  0:47       ` Rusty Russell
2007-08-26  8:09         ` Andrew Morton
2007-08-27  7:01           ` Rusty Russell
2007-08-25  9:58 ` Oleg Nesterov
2007-08-25 11:18 ` Oleg Nesterov
2007-05-21 20:08 Cliff Wickman
2007-05-23 17:43 ` Andrew Morton
2007-05-24  7:56   ` Ingo Molnar
2007-05-29 19:32     ` Andrew Morton
2007-03-09 19:39 Cliff Wickman
2007-03-09 23:58 ` Nathan Lynch
2007-03-15  0:36   ` Robin Holt
2007-03-15  7:32     ` Nathan Lynch
2007-03-10  9:19 ` Ingo Molnar
2007-03-10 15:51   ` Nathan Lynch
2007-03-10 17:08     ` Ingo Molnar
2007-03-07 14:24 Cliff Wickman
2007-03-07 18:25 ` Randy Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox