* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
@ 2017-09-13 10:24 Uladzislau Rezki (Sony)
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki (Sony)
Objective:
In an attempt to improve the criteria for which tasks we should consider
for migration (SMP case) during load-balance operations, I have done some
performance evaluations.
Test environment:
- set performance governor
- echo 0 > /proc/sys/kernel/nmi_watchdog
- intel_pstate=disable
- i5-3320M CPU @ 2.60GHz
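The first three settings can be applied as sketched below (illustrative commands, not part of the original setup notes; sysfs paths may vary by distribution, and intel_pstate=disable is a kernel boot parameter, not a runtime knob):

```shell
# Pin all CPUs to the performance governor (requires root).
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$c"
done
# Disable the NMI watchdog to avoid its perf-counter interference.
echo 0 > /proc/sys/kernel/nmi_watchdog
# intel_pstate=disable goes on the kernel command line (e.g. in GRUB config).
```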
Test results:
The first test evaluated hackbench with different numbers of groups;
I used 10, 20 and 40. See the plots with results below:
i=0; while [ $i -le 1000 ]; do ./hackbench 10 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_10_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 20 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_20_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 40 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_40_groups.png
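The per-run times printed by each loop can be reduced to an average with a short awk filter (an illustrative helper, not part of the original scripts):

```shell
# Average a column of hackbench "Time:" values read from stdin.
printf '0.6\n0.7\n0.8\n' | awk '{ sum += $1; n++ } END { if (n) printf "%.3f avg\n", sum / n }'
# -> 0.700 avg
```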
The second test evaluated how "perf bench sched pipe" behaves in a single-CPU
scenario. As Peter Zijlstra suggested earlier, this checks cache behaviour and
exposes any extra overhead caused by the list manipulation:
i=0; while [ $i -le 500 ]; do taskset 1 perf bench sched pipe | grep "Total" | awk '{print $3}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/taskset_1_perf_bench_sched_pipe.png
Added overhead:
First, I checked whether "cfs_tasks" and "group_node" are in a cache line
by annotating the pick_next_task_fair symbol and running the single-CPU test.
perf record -F 100000 -a -e L1-dcache-misses -- taskset 1 perf bench sched pipe -l 10000000
perf annotate pick_next_task_fair
Most of the time I see that cfs_tasks and group_node are in the L1 data cache:
│ __list_del(entry->prev, entry->next);
3.51 │ mov 0xb0(%rbp),%rdx
1.75 │ mov 0xa8(%rbp),%rcx
│ pick_next_task_fair():
│ list_move(&p->se.group_node, &rq->cfs_tasks);
│ lea 0xa8(%rbp),%rax
│ __list_del():
group_node: 3.51 corresponds to 2 samples, i.e. misses. Across 10 runs,
the minimum was 0 and the maximum 2 misses.
│ list_add():
│ __list_add(new, head, head->next);
2.44 │ mov 0x940(%r15),%rdx
│ __list_add():
cfs_tasks: 2.44 corresponds to 1 sample, i.e. miss. Across 10 runs,
the minimum was 0 and the maximum 2 misses.
When checking cache misses across all levels ("-e cache-misses"), I do not
see any samples or misses.
Conclusion:
According to the results above, and in my subjective opinion, it is worth
sorting the cfs_tasks list and starting to pull from the back of the list
during load-balance (including active balance) and idle-balance operations.
Any comments, proposals or ideas regarding this small investigation would
be appreciated.
Best Regards,
Uladzislau Rezki
Uladzislau Rezki (1):
sched/fair: search a task from the tail of the queue
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
--
2.11.0
* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
    Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
    Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki

From: Uladzislau Rezki <urezki@gmail.com>

As a first step this patch makes the cfs_tasks list an MRU one,
meaning that when the next task is picked to run on a physical
CPU it is moved to the front of the list.

Therefore, the cfs_tasks list is more or less sorted (except for
woken tasks), starting from tasks recently given CPU time toward
tasks with the maximum wait time in a run-queue, i.e. an MRU list.

Second, as part of the load-balance operation, this approach
starts detach_tasks()/detach_one_task() from the tail of the
queue instead of the head, giving some advantages:

 - it tends to pick the task with the highest wait time;
 - tasks located at the tail are less likely to be cache-hot,
   therefore the can_migrate_task() decision is more likely to
   be positive.

hackbench illustrates slightly better performance. For example,
doing 1000 samples and 40 groups on an i5-3320M CPU, it shows
the figures below:

default: 0.657 avg
patched: 0.646 avg

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..b598907e5236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6354,10 +6354,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		set_next_entity(cfs_rq, se);
 	}
 
-	if (hrtick_enabled(rq))
-		hrtick_start_fair(rq, p);
-
-	return p;
+	goto done;
 simple:
 	cfs_rq = &rq->cfs;
 #endif
@@ -6375,6 +6372,16 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 
 	p = task_of(se);
 
+done: __maybe_unused
+#ifdef CONFIG_SMP
+	/*
+	 * Move the next running task to the front of
+	 * the list, so our cfs_tasks list becomes MRU
+	 * one.
+	 */
+	list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
 	if (hrtick_enabled(rq))
 		hrtick_start_fair(rq, p);
 
@@ -6806,11 +6813,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
  */
 static struct task_struct *detach_one_task(struct lb_env *env)
 {
-	struct task_struct *p, *n;
 
+	struct task_struct *p;
 
 	lockdep_assert_held(&env->src_rq->lock);
 
-	list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+	list_for_each_entry_reverse(p,
+			&env->src_rq->cfs_tasks, se.group_node) {
 		if (!can_migrate_task(p, env))
 			continue;
 
@@ -6856,7 +6864,7 @@ static int detach_tasks(struct lb_env *env)
 		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
 			break;
 
-		p = list_first_entry(tasks, struct task_struct, se.group_node);
+		p = list_last_entry(tasks, struct task_struct, se.group_node);
 
 		env->loop++;
 		/* We've more or less seen every task there is, call it quits */
@@ -6906,7 +6914,7 @@ static int detach_tasks(struct lb_env *env)
 		continue;
 next:
-		list_move_tail(&p->se.group_node, tasks);
+		list_move(&p->se.group_node, tasks);
 	}
 
 	/*
--
2.11.0
* Re: [RFC PATCH v2] sched/fair: search a task from the tail of the queue
From: Peter Zijlstra @ 2017-10-04  9:46 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
    Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
    Kirill Tkhai, Tim Chen, Nicolas Pitre

On Wed, Sep 13, 2017 at 12:24:30PM +0200, Uladzislau Rezki (Sony) wrote:
> From: Uladzislau Rezki <urezki@gmail.com>
>
> As a first step this patch makes the cfs_tasks list an MRU one,
> meaning that when the next task is picked to run on a physical
> CPU it is moved to the front of the list.
>
> Therefore, the cfs_tasks list is more or less sorted (except for
> woken tasks), starting from tasks recently given CPU time toward
> tasks with the maximum wait time in a run-queue, i.e. an MRU list.
>
> Second, as part of the load-balance operation, this approach
> starts detach_tasks()/detach_one_task() from the tail of the
> queue instead of the head, giving some advantages:
>
>  - it tends to pick the task with the highest wait time;
>  - tasks located at the tail are less likely to be cache-hot,
>    therefore the can_migrate_task() decision is more likely to
>    be positive.
>
> hackbench illustrates slightly better performance. For example,
> doing 1000 samples and 40 groups on an i5-3320M CPU, it shows
> the figures below:
>
> default: 0.657 avg
> patched: 0.646 avg
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!
* [tip:sched/core] sched/fair: Search a task from the tail of the queue
From: tip-bot for Uladzislau Rezki @ 2017-10-10 10:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, peterz, mingo, tkhai, umgwanakikbuti, nicolas.pitre,
    oleksiy.avramchenko, tim.c.chen, tglx, pjt, torvalds, efault,
    linux-kernel, urezki, rostedt, oleg

Commit-ID:  93824900a2e242766f5fe6ae7697e3d7171aa234
Gitweb:     https://git.kernel.org/tip/93824900a2e242766f5fe6ae7697e3d7171aa234
Author:     Uladzislau Rezki <urezki@gmail.com>
AuthorDate: Wed, 13 Sep 2017 12:24:30 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:30 +0200

sched/fair: Search a task from the tail of the queue

As a first step this patch makes the cfs_tasks list an MRU one,
meaning that when the next task is picked to run on a physical
CPU it is moved to the front of the list.

Therefore, the cfs_tasks list is more or less sorted (except for
woken tasks), starting from tasks recently given CPU time toward
tasks with the maximum wait time in a run-queue, i.e. an MRU list.

Second, as part of the load-balance operation, this approach
starts detach_tasks()/detach_one_task() from the tail of the
queue instead of the head, giving some advantages:

 - it tends to pick the task with the highest wait time;
 - tasks located at the tail are less likely to be cache-hot,
   therefore the can_migrate_task() decision is more likely to
   be positive.

hackbench illustrates slightly better performance. For example,
doing 1000 samples and 40 groups on an i5-3320M CPU, it shows
the figures below:

default: 0.657 avg
patched: 0.646 avg

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20170913102430.8985-2-urezki@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac6602c..cc0bfb0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6628,10 +6628,7 @@ again:
 		set_next_entity(cfs_rq, se);
 	}
 
-	if (hrtick_enabled(rq))
-		hrtick_start_fair(rq, p);
-
-	return p;
+	goto done;
 simple:
 #endif
@@ -6645,6 +6642,16 @@ simple:
 
 	p = task_of(se);
 
+done: __maybe_unused
+#ifdef CONFIG_SMP
+	/*
+	 * Move the next running task to the front of
+	 * the list, so our cfs_tasks list becomes MRU
+	 * one.
+	 */
+	list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
 	if (hrtick_enabled(rq))
 		hrtick_start_fair(rq, p);
 
@@ -7080,11 +7087,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
  */
 static struct task_struct *detach_one_task(struct lb_env *env)
 {
-	struct task_struct *p, *n;
+	struct task_struct *p;
 
 	lockdep_assert_held(&env->src_rq->lock);
 
-	list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+	list_for_each_entry_reverse(p,
+			&env->src_rq->cfs_tasks, se.group_node) {
 		if (!can_migrate_task(p, env))
 			continue;
 
@@ -7130,7 +7138,7 @@ static int detach_tasks(struct lb_env *env)
 		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
 			break;
 
-		p = list_first_entry(tasks, struct task_struct, se.group_node);
+		p = list_last_entry(tasks, struct task_struct, se.group_node);
 
 		env->loop++;
 		/* We've more or less seen every task there is, call it quits */
@@ -7180,7 +7188,7 @@ static int detach_tasks(struct lb_env *env)
 		continue;
 next:
-		list_move_tail(&p->se.group_node, tasks);
+		list_move(&p->se.group_node, tasks);
 	}
 
 	/*