From: riel@redhat.com
To: linux-kernel@vger.kernel.org
Cc: chegu_vinod@hp.com, peterz@infradead.com, mgorman@suse.de,
mingo@kernel.org
Subject: [PATCH 5/7] sched,numa: examine a task move when examining a task swap
Date: Mon, 23 Jun 2014 11:41:33 -0400
Message-ID: <1403538095-31256-6-git-send-email-riel@redhat.com>
In-Reply-To: <1403538095-31256-1-git-send-email-riel@redhat.com>
From: Rik van Riel <riel@redhat.com>
Running "perf bench numa mem -0 -m -P 1000 -p 8 -t 20" on a 4
node system results in 160 runnable threads on a system with 80
CPU threads.
Once a process has nearly converged, with 39 threads on one node
and 1 thread on another node, the remaining thread will be unable
to migrate to its preferred node through a task swap.
However, a simple task move would make the workload converge,
without causing an imbalance.
Test for this unlikely occurrence, and attempt a task move to
the preferred nid when it happens.
# Running main, "perf bench numa mem -p 8 -t 20 -0 -m -P 1000"
###
# 160 tasks will execute (on 4 nodes, 80 CPUs):
# -1x 0MB global shared mem operations
# -1x 1000MB process shared mem operations
# -1x 0MB thread local mem operations
###
###
#
# 0.0% [0.2 mins] 0/0 1/1 36/2 0/0 [36/3 ] l: 0-0 ( 0) {0-2}
# 0.0% [0.3 mins] 43/3 37/2 39/2 41/3 [ 6/10] l: 0-1 ( 1) {1-2}
# 0.0% [0.4 mins] 42/3 38/2 40/2 40/2 [ 4/9 ] l: 1-2 ( 1) [50.0%] {1-2}
# 0.0% [0.6 mins] 41/3 39/2 40/2 40/2 [ 2/9 ] l: 2-4 ( 2) [50.0%] {1-2}
# 0.0% [0.7 mins] 40/2 40/2 40/2 40/2 [ 0/8 ] l: 3-5 ( 2) [40.0%] ( 41.8s converged)
Without this patch, this same perf bench numa mem run had to
rely on the scheduler load balancer to first balance out the
load (moving a random task), before a task swap could complete
the NUMA convergence.
The load balancer does not normally take action unless the load
difference exceeds 25%. Convergence times of over half an hour
have been observed without this patch.
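To illustrate why the load balancer leaves a 41/39 split alone, here is a
minimal sketch in C. It assumes that load can be approximated by the number
of runnable threads per node, and that the 25% threshold is expressed as a
percentage value of 125, in the style of the scheduler's imbalance_pct.
balancer_would_act() is a hypothetical helper for illustration, not a kernel
function; the real balancer considers much more state.

```c
#include <stdbool.h>

/*
 * Hypothetical simplification: "load" is just the runnable thread count
 * on a node. imbalance_pct = 125 stands for a 25% threshold (assumption).
 */
static bool balancer_would_act(unsigned long busiest_load,
			       unsigned long local_load,
			       unsigned int imbalance_pct)
{
	/* Act only if busiest exceeds local by more than (pct - 100)%. */
	return busiest_load * 100 > local_load * imbalance_pct;
}
```

Under these assumptions, balancer_would_act(41, 39, 125) is false, because
4100 is not greater than 4875, so a nearly converged 41/39 split can persist
for a long time.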
With this patch, the NUMA balancing code will simply migrate the
task, if that does not cause an imbalance.
Also skip examining a CPU in detail if the improvement on that CPU
is no more than the best we already have.
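The decision order described above can be sketched as a small standalone C
function. This is an illustration under stated assumptions, not the kernel
code: struct candidate, enum numa_action and pick_action() are hypothetical
names, move_keeps_balance stands in for !load_too_imbalanced(), and the
idle-CPU path, the one-task-per-CPU shortcut and the swap's own balance
check are left out.

```c
#include <stdbool.h>

enum numa_action { NUMA_SKIP, NUMA_SWAP, NUMA_MOVE };

/* Hypothetical summary of one destination CPU being examined. */
struct candidate {
	long swap_imp;		/* improvement if env->p and cur trade places */
	long move_imp;		/* improvement if env->p alone moves */
	bool move_keeps_balance;/* stand-in for !load_too_imbalanced() */
};

static enum numa_action pick_action(const struct candidate *c,
				    long best_imp, long *score)
{
	/* Neither option beats the best so far: skip the detailed checks. */
	if (c->swap_imp <= best_imp && c->move_imp <= best_imp)
		return NUMA_SKIP;

	/* A plain move is better than a swap and does not unbalance. */
	if (c->move_imp > c->swap_imp && c->move_imp > best_imp &&
	    c->move_keeps_balance) {
		/* Score slightly lower so a truly idle CPU still wins. */
		*score = c->move_imp - 1;
		return NUMA_MOVE;
	}

	/* The move was not possible; the swap must win on its own merit. */
	if (c->swap_imp <= best_imp)
		return NUMA_SKIP;

	*score = c->swap_imp;
	return NUMA_SWAP;
}
```

For example, with best_imp = 10, a candidate with swap_imp = 5 and
move_imp = 20 that keeps the load balanced is chosen as a move with a
score of 19.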
Signed-off-by: Rik van Riel <riel@redhat.com>
---
kernel/sched/fair.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2eb845c..d525451 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1155,6 +1155,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	long src_load, dst_load;
 	long load;
 	long imp = env->p->numa_group ? groupimp : taskimp;
+	long moveimp = imp;
 
 	rcu_read_lock();
 	cur = ACCESS_ONCE(dst_rq->curr);
@@ -1201,7 +1202,7 @@ static void task_numa_compare(struct task_numa_env *env,
 		}
 	}
 
-	if (imp < env->best_imp)
+	if (imp <= env->best_imp && moveimp <= env->best_imp)
 		goto unlock;
 
 	if (!cur) {
@@ -1214,7 +1215,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	}
 
 	/* Balance doesn't matter much if we're running a task per cpu */
-	if (src_rq->nr_running == 1 && dst_rq->nr_running == 1)
+	if (imp > env->best_imp && src_rq->nr_running == 1 &&
+			dst_rq->nr_running == 1)
 		goto assign;
 
 	/*
@@ -1230,6 +1232,23 @@ static void task_numa_compare(struct task_numa_env *env,
 	src_load += effective_load(tg, env->src_cpu, -load, -load);
 	dst_load += effective_load(tg, env->dst_cpu, load, load);
 
+	if (moveimp > imp && moveimp > env->best_imp) {
+		/*
+		 * If the improvement from just moving env->p direction is
+		 * better than swapping tasks around, check if a move is
+		 * possible. Store a slightly smaller score than moveimp,
+		 * so an actually idle CPU will win.
+		 */
+		if (!load_too_imbalanced(src_load, dst_load, env)) {
+			imp = moveimp - 1;
+			cur = NULL;
+			goto assign;
+		}
+	}
+
+	if (imp <= env->best_imp)
+		goto unlock;
+
 	if (cur) {
 		/* Cur moves in the opposite direction. */
 		load = task_h_load(cur);
--
1.8.5.3
Thread overview: 26+ messages
2014-06-23 15:41 [PATCH 0/7] sched,numa: improve NUMA convergence times riel
2014-06-23 15:41 ` [PATCH 1/7] sched,numa: use group's max nid as task's preferred nid riel
2014-06-25 10:31 ` Mel Gorman
2014-07-05 10:44 ` [tip:sched/core] sched/numa: Use group's max nid as task' s " tip-bot for Rik van Riel
2014-06-23 15:41 ` [PATCH 3/7] sched,numa: use effective_load to balance NUMA loads riel
2014-06-23 15:41 ` [PATCH 4/7] sched,numa: simplify task_numa_compare riel
2014-06-25 10:39 ` Mel Gorman
2014-06-23 15:41 ` riel [this message]
2014-06-23 15:41 ` [PATCH 6/7] sched,numa: rework best node setting in task_numa_migrate riel
2014-07-05 10:45 ` [tip:sched/core] sched/numa: Rework best node setting in task_numa_migrate() tip-bot for Rik van Riel
2014-06-23 15:41 ` [PATCH 7/7] sched,numa: change scan period code to match intent riel
2014-06-25 10:19 ` Mel Gorman
2014-07-05 10:45 ` [tip:sched/core] sched/numa: Change " tip-bot for Rik van Riel
2014-06-23 22:30 ` [PATCH 8/7] sched,numa: do not let a move increase the imbalance Rik van Riel
2014-06-24 14:38 ` Peter Zijlstra
2014-06-24 15:30 ` Rik van Riel
2014-06-25 1:57 ` Rik van Riel
2014-06-24 19:14 ` [PATCH 9/7] sched,numa: remove task_h_load from task_numa_compare Rik van Riel
2014-06-25 5:07 ` Peter Zijlstra
2014-06-25 5:09 ` Rik van Riel
2014-06-25 5:21 ` Peter Zijlstra
2014-06-25 5:25 ` Rik van Riel
2014-06-25 5:31 ` Peter Zijlstra
2014-06-25 5:39 ` Rik van Riel
2014-06-25 5:57 ` Peter Zijlstra
[not found] <1403538378-31571-1-git-send-email-riel@redhat.com>
2014-06-23 15:46 ` [PATCH 5/7] sched,numa: examine a task move when examining a task swap riel