From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932159Ab3JIR2e (ORCPT ); Wed, 9 Oct 2013 13:28:34 -0400 Received: from terminus.zytor.com ([198.137.202.10]:56132 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756094Ab3JIR2b (ORCPT ); Wed, 9 Oct 2013 13:28:31 -0400 Date: Wed, 9 Oct 2013 10:27:52 -0700 From: tip-bot for Mel Gorman Message-ID: Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org, peterz@infradead.org, hannes@cmpxchg.org, riel@redhat.com, aarcange@redhat.com, srikar@linux.vnet.ibm.com, mgorman@suse.de, tglx@linutronix.de Reply-To: mingo@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org, peterz@infradead.org, hannes@cmpxchg.org, riel@redhat.com, aarcange@redhat.com, srikar@linux.vnet.ibm.com, mgorman@suse.de, tglx@linutronix.de In-Reply-To: <1381141781-10992-25-git-send-email-mgorman@suse.de> References: <1381141781-10992-25-git-send-email-mgorman@suse.de> To: linux-tip-commits@vger.kernel.org Subject: [tip:sched/core] sched/numa: Reschedule task on preferred NUMA node once selected Git-Commit-ID: e6628d5b0a2979f3e0ee6f7783ede5df50cb9ede X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (terminus.zytor.com [127.0.0.1]); Wed, 09 Oct 2013 10:27:58 -0700 (PDT) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: e6628d5b0a2979f3e0ee6f7783ede5df50cb9ede Gitweb: http://git.kernel.org/tip/e6628d5b0a2979f3e0ee6f7783ede5df50cb9ede Author: Mel Gorman AuthorDate: Mon, 7 Oct 2013 11:29:02 +0100 Committer: Ingo Molnar CommitDate: Wed, 9 Oct 2013 12:40:28 +0200 sched/numa: Reschedule task on preferred NUMA node once selected A preferred node is selected based on the node the most NUMA hinting faults was incurred on. There is no guarantee that the task is running on that node at the time so this patch rescheules the task to run on the most idle CPU of the selected node when selected. This avoids waiting for the balancer to make a decision. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Andrea Arcangeli Cc: Johannes Weiner Cc: Srikar Dronamraju Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/1381141781-10992-25-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar --- kernel/sched/core.c | 19 +++++++++++++++++++ kernel/sched/fair.c | 46 +++++++++++++++++++++++++++++++++++++++++++++- kernel/sched/sched.h | 1 + 3 files changed, 65 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7e6b6f..66b878e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4348,6 +4348,25 @@ fail: return ret; } +#ifdef CONFIG_NUMA_BALANCING +/* Migrate current task p to target_cpu */ +int migrate_task_to(struct task_struct *p, int target_cpu) +{ + struct migration_arg arg = { p, target_cpu }; + int curr_cpu = task_cpu(p); + + if (curr_cpu == target_cpu) + return 0; + + if (!cpumask_test_cpu(target_cpu, tsk_cpus_allowed(p))) + return -EINVAL; + + /* TODO: This is not properly updating schedstats */ + + return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg); +} +#endif + /* * migration_cpu_stop - this will be executed by a highprio stopper thread * and performs thread migration by bumping thread off CPU then diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8943124..8b15e9e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -886,6 +886,31 @@ static unsigned int task_scan_max(struct task_struct *p) */ unsigned int sysctl_numa_balancing_settle_count __read_mostly = 3; +static unsigned long weighted_cpuload(const int cpu); + + +static int +find_idlest_cpu_node(int this_cpu, int nid) +{ + unsigned long load, min_load = ULONG_MAX; + int i, idlest_cpu = this_cpu; + + BUG_ON(cpu_to_node(this_cpu) == nid); + + rcu_read_lock(); + for_each_cpu(i, cpumask_of_node(nid)) { + load = weighted_cpuload(i); + + if (load < min_load) { + min_load = load; + idlest_cpu = i; + } + } + rcu_read_unlock(); + + return idlest_cpu; +} + static void task_numa_placement(struct task_struct *p) { int seq, nid, max_nid = -1; @@ -916,10 +941,29 @@ static void task_numa_placement(struct task_struct *p) } } - /* Update the tasks preferred node if necessary */ + /* + * Record the preferred node as the node with the most faults, + * requeue the task to be running on the idlest CPU on the + * preferred node and reset the scanning rate to recheck + * the working set placement. + */ if (max_faults && max_nid != p->numa_preferred_nid) { + int preferred_cpu; + + /* + * If the task is not on the preferred node then find the most + * idle CPU to migrate to. + */ + preferred_cpu = task_cpu(p); + if (cpu_to_node(preferred_cpu) != max_nid) { + preferred_cpu = find_idlest_cpu_node(preferred_cpu, + max_nid); + } + + /* Update the preferred nid and migrate task if possible */ p->numa_preferred_nid = max_nid; p->numa_migrate_seq = 0; + migrate_task_to(p, preferred_cpu); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 199099c..66458c9 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -557,6 +557,7 @@ static inline u64 rq_clock_task(struct rq *rq) } #ifdef CONFIG_NUMA_BALANCING +extern int migrate_task_to(struct task_struct *p, int cpu); static inline void task_numa_free(struct task_struct *p) { kfree(p->numa_faults);