From: Rusty Russell
To: Nick Piggin
Cc: Andrew Morton, mingo@elte.hu, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Directed migration: Don't Change cpumask in sched_balance_exec()
In-reply-to: Your message of "Thu, 22 Jan 2004 16:19:45 +1100." <400F5D71.7010702@cyberone.com.au>
Date: Thu, 22 Jan 2004 18:30:10 +1100
Message-Id: <20040122082303.333432C08A@lists.samba.org>

In message <400F5D71.7010702@cyberone.com.au> you write:
> Hi Rusty,
> Yes I do like it. It has bothered me that the visible cpus_allowed
> mask is changed in order to do the balancing.
>
> Thanks Rusty.

Ingo?

Note that I should have used for_each_online_cpu, not for_each_cpu,
since we explicitly don't want to look at offline cpus.

New patch with one-line change below...
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Directed Migration: Don't Change cpumask in sched_balance_exec()
Author: Rusty Russell
Status: Experimental

D: The current sched_balance_exec() sets the task's cpus_allowed mask
D: temporarily to move it to a different CPU.  This has several
D: issues, including the fact that a task will see its affinity at a
D: bogus value.
D:
D: So we change the migration_req_t to explicitly specify a
D: destination CPU, rather than the migration thread deriving it from
D: cpus_allowed.  If the requested CPU is no longer valid (racing with
D: another set_cpus_allowed, say), it can be ignored: if the task is
D: not allowed on this CPU, there will be another migration request
D: pending.
D:
D: This change allows sched_balance_exec() to tell the migration
D: thread what to do without changing the cpus_allowed mask.
D:
D: So we rename __set_cpus_allowed() to move_task(), as the
D: cpus_allowed mask is now set by the caller.  And move_task_away(),
D: which the migration thread uses to actually perform the move, is
D: renamed __move_task().
D:
D: I also ignore offline CPUs in sched_best_cpu(), so sched_migrate_task()
D: doesn't need to check for offline CPUs.
D:
D: Ulterior motive: this approach also plays well with CPU Hotplug.
D: Previously that patch might have seen a task with cpus_allowed only
D: containing the dying CPU (temporarily due to sched_balance_exec) and
D: forcibly reset it to all cpus, which might be wrong.  The other approach
D: is to hold the cpucontrol sem around sched_balance_exec(), which is
D: too much of a bottleneck.
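In short, the request now carries its destination, and the migration
thread just honours it (condensed from the diff below; illustrative
only, not a compilable excerpt):

	typedef struct {
		struct list_head list;
		task_t *task;
		unsigned int dest_cpu;	/* new: caller names the target CPU */
		struct completion done;
	} migration_req_t;

	/* Old: target re-derived from the (temporarily narrowed) mask
	 * at service time. */
	move_task_away(req->task, any_online_cpu(req->task->cpus_allowed));

	/* New: use the recorded destination; __move_task() rechecks
	 * cpus_allowed and silently drops the request if it went stale. */
	__move_task(req->task, req->dest_cpu);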
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .16376-linux-2.6.2-rc1/kernel/sched.c .16376-linux-2.6.2-rc1.updated/kernel/sched.c
--- .16376-linux-2.6.2-rc1/kernel/sched.c	2004-01-21 16:19:08.000000000 +1100
+++ .16376-linux-2.6.2-rc1.updated/kernel/sched.c	2004-01-22 13:34:50.000000000 +1100
@@ -543,26 +543,18 @@ inline int task_curr(task_t *p)
 typedef struct {
 	struct list_head list;
 	task_t *task;
+	unsigned int dest_cpu;
 	struct completion done;
 } migration_req_t;
 
 /*
- * The task's runqueue lock must be held, and the new mask must be valid.
+ * The task's runqueue lock must be held.
  * Returns true if you have to wait for migration thread.
  */
-static int __set_cpus_allowed(task_t *p, cpumask_t new_mask,
-			      migration_req_t *req)
+static int move_task(task_t *p, unsigned int dest_cpu, migration_req_t *req)
 {
 	runqueue_t *rq = task_rq(p);
 
-	p->cpus_allowed = new_mask;
-	/*
-	 * Can the task run on the task's current CPU? If not then
-	 * migrate the thread off to a proper CPU.
-	 */
-	if (cpu_isset(task_cpu(p), new_mask))
-		return 0;
-
 	/*
 	 * If the task is not on a runqueue (and not running), then
 	 * it is sufficient to simply update the task's cpu field.
@@ -1011,28 +1003,18 @@ static void sched_migrate_task(task_t *p
 	runqueue_t *rq;
 	migration_req_t req;
 	unsigned long flags;
-	cpumask_t old_mask, new_mask = cpumask_of_cpu(dest_cpu);
 
 	rq = task_rq_lock(p, &flags);
-	old_mask = p->cpus_allowed;
-	if (!cpu_isset(dest_cpu, old_mask) || !cpu_online(dest_cpu))
+	if (!cpu_isset(dest_cpu, p->cpus_allowed))
 		goto out;
 
 	/* force the process onto the specified CPU */
-	if (__set_cpus_allowed(p, new_mask, &req)) {
+	if (move_task(p, dest_cpu, &req)) {
 		/* Need to wait for migration thread. */
 		task_rq_unlock(rq, &flags);
 		wake_up_process(rq->migration_thread);
 		wait_for_completion(&req.done);
-
-		/* If we raced with sys_sched_setaffinity, don't
-		 * restore mask. */
-		rq = task_rq_lock(p, &flags);
-		if (likely(cpus_equal(p->cpus_allowed, new_mask))) {
-			/* Restore old mask: won't need migration
-			 * thread, since current cpu is allowed. */
-			BUG_ON(__set_cpus_allowed(p, old_mask, NULL));
-		}
+		return;
 	}
 out:
 	task_rq_unlock(rq, &flags);
@@ -1068,7 +1050,7 @@ static int sched_best_cpu(struct task_st
 		minload = 10000000;
 		cpumask = node_to_cpumask(node);
-		for (i = 0; i < NR_CPUS; ++i) {
+		for_each_online_cpu(i) {
 			if (!cpu_isset(i, cpumask))
 				continue;
 			if (cpu_rq(i)->nr_running < minload) {
@@ -2705,7 +2687,12 @@ int set_cpus_allowed(task_t *p, cpumask_
 		goto out;
 	}
 
-	if (__set_cpus_allowed(p, new_mask, &req)) {
+	p->cpus_allowed = new_mask;
+	/* Can the task run on the task's current CPU?  If so, we're done */
+	if (cpu_isset(task_cpu(p), new_mask))
+		goto out;
+
+	if (move_task(p, any_online_cpu(new_mask), &req)) {
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, &flags);
 		wake_up_process(rq->migration_thread);
@@ -2719,8 +2706,15 @@ out:
 
 EXPORT_SYMBOL_GPL(set_cpus_allowed);
 
-/* Move (not current) task off this cpu, onto dest cpu. */
-static void move_task_away(struct task_struct *p, int dest_cpu)
+/* Move (not current) task off this cpu, onto dest cpu.  We're doing
+   this because either it can't run here any more (set_cpus_allowed()
+   away from this CPU, or CPU going down), or because we're
+   attempting to rebalance this task on exec (sched_balance_exec).
+
+   So we race with normal scheduler movements, but that's OK, as long
+   as the task is no longer on this CPU.
+ */
+static void __move_task(struct task_struct *p, int dest_cpu)
 {
 	runqueue_t *rq_dest;
 	unsigned long flags;
@@ -2729,8 +2723,12 @@ static void move_task_away(struct task_s
 	local_irq_save(flags);
 	double_rq_lock(this_rq(), rq_dest);
+	/* Already moved. */
 	if (task_cpu(p) != smp_processor_id())
-		goto out; /* Already moved */
+		goto out;
+	/* Affinity changed (again). */
+	if (!cpu_isset(dest_cpu, p->cpus_allowed))
+		goto out;
 
 	set_task_cpu(p, dest_cpu);
 	if (p->array) {
@@ -2800,8 +2798,7 @@ static int migration_thread(void * data)
 		list_del_init(head->next);
 		spin_unlock_irq(&rq->lock);
 
-		move_task_away(req->task,
-			       any_online_cpu(req->task->cpus_allowed));
+		__move_task(req->task, req->dest_cpu);
 		complete(&req->done);
 	}
 }
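
For context, the path that drives all this is the exec-time balancer,
which this patch leaves alone; in 2.6.2-rc kernels it looks roughly
like the sketch below (reconstructed from memory, so check the tree
rather than quoting this):

	void sched_balance_exec(void)
	{
		int new_cpu;

		/* Only NUMA boxes bother rebalancing on exec. */
		if (numnodes > 1) {
			new_cpu = sched_best_cpu(current);
			if (new_cpu != smp_processor_id())
				sched_migrate_task(current, new_cpu);
		}
	}

With the patch applied, that call chain never writes to
current->cpus_allowed: the mask a task (or anything reading its
affinity) observes is always the one actually set, and the CPU
hotplug code can trust it too.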