All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <piggin@cyberone.com.au>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Anton Blanchard <anton@samba.org>, Ingo Molnar <mingo@redhat.com>,
	"Martin J. Bligh" <mbligh@aracnet.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	Mark Wong <markw@osdl.org>, John Hawkes <hawkes@sgi.com>
Subject: Re: [CFT][RFC] HT scheduler
Date: Fri, 19 Dec 2003 15:57:13 +1100	[thread overview]
Message-ID: <3FE28529.1010003@cyberone.com.au> (raw)
In-Reply-To: <3FDE3EF7.7000001@cyberone.com.au>

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]



Nick Piggin wrote:

>
>
> Rusty Russell wrote:
>
>>
>> A few things need work:
>>
>> 1) There's a race between sys_sched_setaffinity() and
>>   sched_migrate_task() (this is nothing to do with your patch).
>>
>
> Yep. They should both take the task's runqueue lock. 


Easier said than done... anyway, how does this patch look?
It should also cure a possible and not entirely unlikely use
after free of the task_struct in sched_migrate_task on NUMA
AFAIKS.

Patch is on top of a few other changes so might not apply, just
for review. I'll release a new rollup with this included soon.

>
>
>>
>> 2) Please change those #defines into an enum for idle (patch follows,
>>   untested but trivial)
>>
>
> Thanks, I'll take the patch.


done

>
>>
>> 3) conditional locking in load_balance is v. bad idea.
>>
>
> Yeah... I'm thinking about this. I don't think it should be too hard
> to break out the shared portion. 


done

>
>
>>
>> 4) load_balance returns "(!failed && !balanced)", but callers stop
>>   calling it when it returns true.  Why not simply return "balanced",
>>   or at least "balanced && !failed"?
>>
>>
>
> No, the idle balancer stops calling it when it returns true, the periodic
> balancer sets idle to 0 when it returns true.
>
> !balanced && !failed means it has moved a task.
>
> I'll either comment that, or return it in a more direct way.
>

done



[-- Attachment #2: sched-migrate-affinity-race.patch --]
[-- Type: text/plain, Size: 4423 bytes --]


Prevents a race where sys_sched_setaffinity can race with sched_migrate_task
and cause sched_migrate_task to restore an invalid cpu mask.


 linux-2.6-npiggin/kernel/sched.c |   89 +++++++++++++++++++++++++++++----------
 1 files changed, 68 insertions(+), 21 deletions(-)

diff -puN kernel/sched.c~sched-migrate-affinity-race kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-migrate-affinity-race	2003-12-19 14:45:58.000000000 +1100
+++ linux-2.6-npiggin/kernel/sched.c	2003-12-19 15:19:30.000000000 +1100
@@ -947,6 +947,9 @@ static inline void double_rq_unlock(runq
 }
 
 #ifdef CONFIG_NUMA
+
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags);
+
 /*
  * If dest_cpu is allowed for this process, migrate the task to it.
  * This is accomplished by forcing the cpu_allowed mask to only
@@ -955,16 +958,43 @@ static inline void double_rq_unlock(runq
  */
 static void sched_migrate_task(task_t *p, int dest_cpu)
 {
-	cpumask_t old_mask;
+	runqueue_t *rq;
+	unsigned long flags;
+	cpumask_t old_mask, new_mask = cpumask_of_cpu(dest_cpu);
 
+	rq = task_rq_lock(p, &flags);
 	old_mask = p->cpus_allowed;
-	if (!cpu_isset(dest_cpu, old_mask))
+	if (!cpu_isset(dest_cpu, old_mask)) {
+		task_rq_unlock(rq, &flags);
 		return;
+	}
+
+	get_task_struct(p);
+
 	/* force the process onto the specified CPU */
-	set_cpus_allowed(p, cpumask_of_cpu(dest_cpu));
+	if (__set_cpus_allowed(p, new_mask, &flags) < 0)
+		goto out;
+
+	/* __set_cpus_allowed unlocks the runqueue */
+	rq = task_rq_lock(p, &flags);
+	if (unlikely(p->cpus_allowed != new_mask)) {
+		/*
+		 * We have raced with another set_cpus_allowed.
+		 * old_mask is invalid and we needn't move the
+		 * task back.
+		 */
+		task_rq_unlock(rq, &flags);
+		goto out;
+	}
+
+	/*
+	 * restore the cpus allowed mask. old_mask must be valid because
+	 * p->cpus_allowed is a subset of old_mask.
+	 */
+	__set_cpus_allowed(p, old_mask, &flags);
 
-	/* restore the cpus allowed mask */
-	set_cpus_allowed(p, old_mask);
+out:
+	put_task_struct(p);
 }
 
 /*
@@ -2603,31 +2633,27 @@ typedef struct {
 } migration_req_t;
 
 /*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely.  The
- * call is not atomic; no spinlocks may be held.
+ * See comment for set_cpus_allowed. calling rules are different:
+ * the task's runqueue lock must be held, and __set_cpus_allowed
+ * will return with the runqueue unlocked.
  */
-int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags)
 {
-	unsigned long flags;
 	migration_req_t req;
-	runqueue_t *rq;
+	runqueue_t *rq = task_rq(p);
 
-	if (any_online_cpu(new_mask) == NR_CPUS)
+	if (any_online_cpu(new_mask) == NR_CPUS) {
+		task_rq_unlock(rq, flags);
 		return -EINVAL;
+	}
 
-	rq = task_rq_lock(p, &flags);
 	p->cpus_allowed = new_mask;
 	/*
 	 * Can the task run on the task's current CPU? If not then
 	 * migrate the thread off to a proper CPU.
 	 */
 	if (cpu_isset(task_cpu(p), new_mask)) {
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, flags);
 		return 0;
 	}
 	/*
@@ -2636,18 +2662,39 @@ int set_cpus_allowed(task_t *p, cpumask_
 	 */
 	if (!p->array && !task_running(rq, p)) {
 		set_task_cpu(p, any_online_cpu(p->cpus_allowed));
-		task_rq_unlock(rq, &flags);
+		task_rq_unlock(rq, flags);
 		return 0;
 	}
+
 	init_completion(&req.done);
 	req.task = p;
 	list_add(&req.list, &rq->migration_queue);
-	task_rq_unlock(rq, &flags);
+	task_rq_unlock(rq, flags);
 
 	wake_up_process(rq->migration_thread);
-
 	wait_for_completion(&req.done);
+
 	return 0;
+
+}
+
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely.  The
+ * call is not atomic; no spinlocks may be held.
+ */
+int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+{
+	unsigned long flags;
+	runqueue_t *rq;
+
+	rq = task_rq_lock(p, &flags);
+
+	return __set_cpus_allowed(p, new_mask, &flags);
 }
 
 EXPORT_SYMBOL_GPL(set_cpus_allowed);

_

  reply	other threads:[~2003-12-19  4:57 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-12-08  4:25 [PATCH][RFC] make cpu_sibling_map a cpumask_t Nick Piggin
2003-12-08 15:59 ` Anton Blanchard
2003-12-08 23:08   ` Nick Piggin
2003-12-09  0:14     ` Anton Blanchard
2003-12-11  4:25       ` [CFT][RFC] HT scheduler Nick Piggin
2003-12-11  7:24         ` Nick Piggin
2003-12-11  8:57           ` Nick Piggin
2003-12-11 11:52             ` William Lee Irwin III
2003-12-11 13:09               ` Nick Piggin
2003-12-11 13:23                 ` William Lee Irwin III
2003-12-11 13:30                   ` Nick Piggin
2003-12-11 13:32                     ` William Lee Irwin III
2003-12-11 15:30                       ` Nick Piggin
2003-12-11 15:38                         ` William Lee Irwin III
2003-12-11 15:51                           ` Nick Piggin
2003-12-11 15:56                             ` William Lee Irwin III
2003-12-11 16:37                               ` Nick Piggin
2003-12-11 16:40                                 ` William Lee Irwin III
2003-12-12  1:52                         ` [PATCH] improve rwsem scalability (was Re: [CFT][RFC] HT scheduler) Nick Piggin
2003-12-12  2:02                           ` Nick Piggin
2003-12-12  9:41                           ` Ingo Molnar
2003-12-13  0:07                             ` Nick Piggin
2003-12-14  0:44                               ` Nick Piggin
2003-12-17  5:27                                 ` Nick Piggin
2003-12-19 11:52                                   ` Nick Piggin
2003-12-19 15:06                                     ` Martin J. Bligh
2003-12-20  0:08                                       ` Nick Piggin
2003-12-12  0:58             ` [CFT][RFC] HT scheduler Rusty Russell
2003-12-11 10:01           ` Rhino
2003-12-11  8:14             ` Nick Piggin
2003-12-11 16:49               ` Rhino
2003-12-11 15:16                 ` Nick Piggin
2003-12-11 11:40             ` William Lee Irwin III
2003-12-11 17:05               ` Rhino
2003-12-11 15:17                 ` William Lee Irwin III
2003-12-11 16:28         ` Kevin P. Fleming
2003-12-11 16:41           ` Nick Piggin
2003-12-12  2:24         ` Rusty Russell
2003-12-12  7:00           ` Nick Piggin
2003-12-12  7:23             ` Rusty Russell
2003-12-13  6:43               ` Nick Piggin
2003-12-14  1:35                 ` bill davidsen
2003-12-14  2:18                   ` Nick Piggin
2003-12-14  4:32                     ` Jamie Lokier
2003-12-14  9:40                       ` Nick Piggin
2003-12-14 10:46                         ` Arjan van de Ven
2003-12-16 17:46                         ` Bill Davidsen
2003-12-16 18:22                       ` Linus Torvalds
2003-12-17  0:24                         ` Davide Libenzi
2003-12-17  0:41                           ` Linus Torvalds
2003-12-17  0:54                             ` Davide Libenzi
2003-12-16 17:34                     ` Bill Davidsen
2003-12-15  5:53                 ` Rusty Russell
2003-12-15 23:08                   ` Nick Piggin
2003-12-19  4:57                     ` Nick Piggin [this message]
2003-12-19  5:13                       ` Nick Piggin
2003-12-20  2:43                       ` Rusty Russell
2003-12-21  2:56                         ` Nick Piggin
2004-01-03 18:57                   ` Bill Davidsen
2003-12-15 20:21                 ` Zwane Mwaikambo
2003-12-15 23:20                   ` Nick Piggin
2003-12-16  0:11                     ` Zwane Mwaikambo
2003-12-12  8:59             ` Nick Piggin
2003-12-12 15:14               ` Martin J. Bligh
2003-12-08 19:44 ` [PATCH][RFC] make cpu_sibling_map a cpumask_t James Cleverdon
2003-12-08 20:38 ` Ingo Molnar
2003-12-08 20:51 ` Zwane Mwaikambo
2003-12-08 20:55   ` Ingo Molnar
2003-12-08 23:17     ` Nick Piggin
2003-12-08 23:36       ` Ingo Molnar
2003-12-08 23:58         ` Nick Piggin
2003-12-08 23:46 ` Rusty Russell
2003-12-09 13:36   ` Nick Piggin
2003-12-11 21:41     ` bill davidsen
     [not found] <20031213022038.300B22C2C1@lists.samba.org.suse.lists.linux.kernel>
     [not found] ` <3FDAB517.4000309@cyberone.com.au.suse.lists.linux.kernel>
     [not found]   ` <brgeo7$huv$1@gatekeeper.tmr.com.suse.lists.linux.kernel>
     [not found]     ` <3FDBC876.3020603@cyberone.com.au.suse.lists.linux.kernel>
     [not found]       ` <20031214043245.GC21241@mail.shareable.org.suse.lists.linux.kernel>
     [not found]         ` <3FDC3023.9030708@cyberone.com.au.suse.lists.linux.kernel>
     [not found]           ` <1071398761.5233.1.camel@laptop.fenrus.com.suse.lists.linux.kernel>
2003-12-14 16:26             ` [CFT][RFC] HT scheduler Andi Kleen
2003-12-14 16:54               ` Arjan van de Ven
     [not found] <200312161127.13691.habanero@us.ibm.com>
2003-12-16 17:37 ` Andrew Theurer
2003-12-17  2:41   ` Nick Piggin
  -- strict thread matches above, loose matches on Subject: below --
2003-12-16 19:03 Nakajima, Jun
2003-12-17  0:38 Nakajima, Jun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3FE28529.1010003@cyberone.com.au \
    --to=piggin@cyberone.com.au \
    --cc=anton@samba.org \
    --cc=hawkes@sgi.com \
    --cc=jun.nakajima@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=markw@osdl.org \
    --cc=mbligh@aracnet.com \
    --cc=mingo@redhat.com \
    --cc=rusty@rustcorp.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.