* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
@ 2017-09-13 10:24 Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
0 siblings, 1 reply; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki (Sony)
Objective:
In an attempt to improve the criteria for selecting which tasks should be
migrated (SMP case) during load-balance operations, I have done some
performance evaluations.
Test environment:
- set performance governor
- echo 0 > /proc/sys/kernel/nmi_watchdog
- intel_pstate=disable
- i5-3320M CPU @ 2.60GHz
Test results:
A first test was to evaluate hackbench with different numbers of groups;
I used 10, 20 and 40. See the plots with the results below:
i=0; while [ $i -le 1000 ]; do ./hackbench 10 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_10_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 20 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_20_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 40 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_40_groups.png
A second test was to evaluate how "perf bench sched pipe" behaves in a
single-CPU scenario, as Peter Zijlstra suggested earlier, to check caches
and find any extra overhead caused by the list manipulation:
i=0; while [ $i -le 500 ]; do taskset 1 perf bench sched pipe | grep "Total" | awk '{print $3}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/taskset_1_perf_bench_sched_pipe.png
Added overhead:
First, I checked whether "cfs_tasks" and "group_node" are in a cache line
by annotating the pick_next_task_fair symbol and running the single-CPU test:
perf record -F 100000 -a -e L1-dcache-misses -- taskset 1 perf bench sched pipe -l 10000000
perf annotate pick_next_task_fair
Most of the time I see that cfs_tasks and group_node are in the L1 data cache:
│ __list_del(entry->prev, entry->next);
3.51 │ mov 0xb0(%rbp),%rdx
1.75 │ mov 0xa8(%rbp),%rcx
│ pick_next_task_fair():
│ list_move(&p->se.group_node, &rq->cfs_tasks);
│ lea 0xa8(%rbp),%rax
│ __list_del():
group_node: 3.51 corresponds to 2 samples, i.e. misses. The minimum is 0
and the maximum is 2 misses, among 10 runs.
│ list_add():
│ __list_add(new, head, head->next);
2.44 │ mov 0x940(%r15),%rdx
│ __list_add():
cfs_tasks: 2.44 corresponds to 1 sample, i.e. miss. The minimum is 0
and the maximum is 2 misses, among 10 runs.
When checking cache misses across all levels ("-e cache-misses"), I do not
see any samples or misses.
Conclusion:
According to the results above, and in my subjective opinion, it is worth
sorting the cfs_tasks list and pulling tasks from the tail of the list
during load-balance (including active balance) or idle-balance operations.
Any comments, proposals or ideas regarding this small investigation would
be appreciated.
Best Regards,
Uladzislau Rezki
Uladzislau Rezki (1):
sched/fair: search a task from the tail of the queue
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
@ 2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04 9:46 ` Peter Zijlstra
2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
0 siblings, 2 replies; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki
From: Uladzislau Rezki <urezki@gmail.com>
As a first step, this patch turns the cfs_tasks list into an MRU
list: when a task is picked to run on a physical CPU, it is moved
to the front of the list.

As a result, the cfs_tasks list is more or less sorted (except for
woken tasks), from tasks that most recently received CPU time toward
tasks with the maximum wait time in the run-queue, i.e. an MRU list.
Second, as part of load-balance operations, this approach starts
detach_tasks()/detach_one_task() from the tail of the queue
instead of the head, which gives some advantages:

- it tends to pick the task with the highest wait time;
- tasks located at the tail are less likely to be cache-hot,
so can_migrate_task() is more likely to approve the migration.
hackbench shows slightly better performance. For example, over
1000 samples with 40 groups on an i5-3320M CPU, it shows the
following figures:

default: 0.657 avg
patched: 0.646 avg
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..b598907e5236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6354,10 +6354,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
set_next_entity(cfs_rq, se);
}
- if (hrtick_enabled(rq))
- hrtick_start_fair(rq, p);
-
- return p;
+ goto done;
simple:
cfs_rq = &rq->cfs;
#endif
@@ -6375,6 +6372,16 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
p = task_of(se);
+done: __maybe_unused
+#ifdef CONFIG_SMP
+ /*
+ * Move the next running task to the front of
+ * the list, so our cfs_tasks list becomes MRU
+ * one.
+ */
+ list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
@@ -6806,11 +6813,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
*/
static struct task_struct *detach_one_task(struct lb_env *env)
{
- struct task_struct *p, *n;
+ struct task_struct *p;
lockdep_assert_held(&env->src_rq->lock);
- list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+ list_for_each_entry_reverse(p,
+ &env->src_rq->cfs_tasks, se.group_node) {
if (!can_migrate_task(p, env))
continue;
@@ -6856,7 +6864,7 @@ static int detach_tasks(struct lb_env *env)
if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
break;
- p = list_first_entry(tasks, struct task_struct, se.group_node);
+ p = list_last_entry(tasks, struct task_struct, se.group_node);
env->loop++;
/* We've more or less seen every task there is, call it quits */
@@ -6906,7 +6914,7 @@ static int detach_tasks(struct lb_env *env)
continue;
next:
- list_move_tail(&p->se.group_node, tasks);
+ list_move(&p->se.group_node, tasks);
}
/*
--
2.11.0
* Re: [RFC PATCH v2] sched/fair: search a task from the tail of the queue
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
@ 2017-10-04 9:46 ` Peter Zijlstra
2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
1 sibling, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2017-10-04 9:46 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre
On Wed, Sep 13, 2017 at 12:24:30PM +0200, Uladzislau Rezki (Sony) wrote:
> From: Uladzislau Rezki <urezki@gmail.com>
>
> As a first step, this patch turns the cfs_tasks list into an MRU
> list: when a task is picked to run on a physical CPU, it is moved
> to the front of the list.
>
> As a result, the cfs_tasks list is more or less sorted (except for
> woken tasks), from tasks that most recently received CPU time toward
> tasks with the maximum wait time in the run-queue, i.e. an MRU list.
>
> Second, as part of load-balance operations, this approach starts
> detach_tasks()/detach_one_task() from the tail of the queue
> instead of the head, which gives some advantages:
>
> - it tends to pick the task with the highest wait time;
> - tasks located at the tail are less likely to be cache-hot,
> so can_migrate_task() is more likely to approve the migration.
>
> hackbench shows slightly better performance. For example, over
> 1000 samples with 40 groups on an i5-3320M CPU, it shows the
> following figures:
>
> default: 0.657 avg
> patched: 0.646 avg
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Thanks!
* [tip:sched/core] sched/fair: Search a task from the tail of the queue
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04 9:46 ` Peter Zijlstra
@ 2017-10-10 10:58 ` tip-bot for Uladzislau Rezki
1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Uladzislau Rezki @ 2017-10-10 10:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, peterz, mingo, tkhai, umgwanakikbuti, nicolas.pitre,
oleksiy.avramchenko, tim.c.chen, tglx, pjt, torvalds, efault,
linux-kernel, urezki, rostedt, oleg
Commit-ID: 93824900a2e242766f5fe6ae7697e3d7171aa234
Gitweb: https://git.kernel.org/tip/93824900a2e242766f5fe6ae7697e3d7171aa234
Author: Uladzislau Rezki <urezki@gmail.com>
AuthorDate: Wed, 13 Sep 2017 12:24:30 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:30 +0200
sched/fair: Search a task from the tail of the queue
As a first step, this patch turns the cfs_tasks list into an MRU
list: when a task is picked to run on a physical CPU, it is moved
to the front of the list.

As a result, the cfs_tasks list is more or less sorted (except for
woken tasks), from tasks that most recently received CPU time toward
tasks with the maximum wait time in the run-queue, i.e. an MRU list.

Second, as part of load-balance operations, this approach starts
detach_tasks()/detach_one_task() from the tail of the queue
instead of the head, which gives some advantages:

- it tends to pick the task with the highest wait time;
- tasks located at the tail are less likely to be cache-hot,
so can_migrate_task() is more likely to approve the migration.

hackbench shows slightly better performance. For example, over
1000 samples with 40 groups on an i5-3320M CPU, it shows the
following figures:

default: 0.657 avg
patched: 0.646 avg
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20170913102430.8985-2-urezki@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac6602c..cc0bfb0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6628,10 +6628,7 @@ again:
set_next_entity(cfs_rq, se);
}
- if (hrtick_enabled(rq))
- hrtick_start_fair(rq, p);
-
- return p;
+ goto done;
simple:
#endif
@@ -6645,6 +6642,16 @@ simple:
p = task_of(se);
+done: __maybe_unused
+#ifdef CONFIG_SMP
+ /*
+ * Move the next running task to the front of
+ * the list, so our cfs_tasks list becomes MRU
+ * one.
+ */
+ list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
@@ -7080,11 +7087,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
*/
static struct task_struct *detach_one_task(struct lb_env *env)
{
- struct task_struct *p, *n;
+ struct task_struct *p;
lockdep_assert_held(&env->src_rq->lock);
- list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+ list_for_each_entry_reverse(p,
+ &env->src_rq->cfs_tasks, se.group_node) {
if (!can_migrate_task(p, env))
continue;
@@ -7130,7 +7138,7 @@ static int detach_tasks(struct lb_env *env)
if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
break;
- p = list_first_entry(tasks, struct task_struct, se.group_node);
+ p = list_last_entry(tasks, struct task_struct, se.group_node);
env->loop++;
/* We've more or less seen every task there is, call it quits */
@@ -7180,7 +7188,7 @@ static int detach_tasks(struct lb_env *env)
continue;
next:
- list_move_tail(&p->se.group_node, tasks);
+ list_move(&p->se.group_node, tasks);
}
/*
end of thread, other threads:[~2017-10-10 11:05 UTC | newest]
2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04 9:46 ` Peter Zijlstra
2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki