* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
@ 2017-09-13 10:24 Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
0 siblings, 1 reply; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki (Sony)
Objective:
In an attempt to improve the criteria of which tasks we should consider to
be migrated (SMP case) during load balance operations, i have done some
performance evaluations.
Test environment:
- set performance governor
- echo 0 > /proc/sys/kernel/nmi_watchdog
- intel_pstate=disable
- i5-3320M CPU @ 2.60GHz
Test results:
A first test was to evaluate hackbench with different number of groups,
i used 10, 20, 40. See below plots with results:
i=0; while [ $i -le 1000 ]; do ./hackbench 10 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_10_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 20 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_20_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 40 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_40_groups.png
A second test was to evaluate how "perf bench sched pipe" behaves in a single
CPU scenario. As Peter Zijlstra suggested before, to check caches and find out
extra overhead caused by list manipulation:
i=0; while [ $i -le 500 ]; do taskset 1 perf bench sched pipe | grep "Total" | awk '{print $3}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/taskset_1_perf_bench_sched_pipe.png
Added overhead:
First, i checked if "cfs_tasks" and "group_node" are in a cache line
by annotating pick_next_task_fair symbol and running single CPU test.
perf record -F 100000 -a -e L1-dcache-misses -- taskset 1 perf bench sched pipe -l 10000000
perf annotate pick_next_task_fair
Most of the time i see that cfs_tasks and group_node are in L1-dcache line:
│ __list_del(entry->prev, entry->next);
3.51 │ mov 0xb0(%rbp),%rdx
1.75 │ mov 0xa8(%rbp),%rcx
│ pick_next_task_fair():
│ list_move(&p->se.group_node, &rq->cfs_tasks);
│ lea 0xa8(%rbp),%rax
│ __list_del():
group_node: 3.51 corresponds to 2 samples or misses. Minimum value is 0
maximum is 2 misses, among 10 runs.
│ list_add():
│ __list_add(new, head, head->next);
2.44 │ mov 0x940(%r15),%rdx
│ __list_add():
cfs_tasks: 2.44 corresponds to 1 sample or misses. Minimum value is 0
maximum is 2 misses, among 10 runs.
In case of checking all level cache misses "-e cache-misses" i do not
see any samples or misses.
Conclusion:
according to provided results and my subjective opinion, it worth to
sort cfs_task list and start pulling from the back of the list during
load balance (+ active) or idle balance operations.
It would be appreciated if there are any comments, proposals or ideas
regarding this small investigation.
Best Regards,
Uladzislau Rezki
Uladzislau Rezki (1):
sched/fair: search a task from the tail of the queue
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 4+ messages in thread* [RFC PATCH v2] sched/fair: search a task from the tail of the queue 2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony) @ 2017-09-13 10:24 ` Uladzislau Rezki (Sony) 2017-10-04 9:46 ` Peter Zijlstra 2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki 0 siblings, 2 replies; 4+ messages in thread From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw) To: Peter Zijlstra Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko, Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith, Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki From: Uladzislau Rezki <urezki@gmail.com> As a first step this patch makes cfs_tasks list as MRU one. It means, that when a next task is picked to run on physical CPU it is moved to the front of the list. Therefore, the cfs_tasks list is more or less sorted (except woken tasks) starting from recently given CPU time tasks toward tasks with max wait time in a run-queue, i.e. MRU list. Second, as part of the load balance operation, this approach starts detach_tasks()/detach_one_task() from the tail of the queue instead of the head, giving some advantages: - tends to pick a task with highest wait time; - tasks located in the tail are less likely cache-hot, therefore the can_migrate_task() decision is higher. hackbench illustrates slightly better performance. For example doing 1000 samples and 40 groups on i5-3320M CPU, it shows below figures: default: 0.657 avg patched: 0.646 avg Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> --- kernel/sched/fair.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c77e4b1d51c0..b598907e5236 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6354,10 +6354,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf set_next_entity(cfs_rq, se); } - if (hrtick_enabled(rq)) - hrtick_start_fair(rq, p); - - return p; + goto done; simple: cfs_rq = &rq->cfs; #endif @@ -6375,6 +6372,16 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf p = task_of(se); +done: __maybe_unused +#ifdef CONFIG_SMP + /* + * Move the next running task to the front of + * the list, so our cfs_tasks list becomes MRU + * one. + */ + list_move(&p->se.group_node, &rq->cfs_tasks); +#endif + if (hrtick_enabled(rq)) hrtick_start_fair(rq, p); @@ -6806,11 +6813,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env) */ static struct task_struct *detach_one_task(struct lb_env *env) { - struct task_struct *p, *n; + struct task_struct *p; lockdep_assert_held(&env->src_rq->lock); - list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) { + list_for_each_entry_reverse(p, + &env->src_rq->cfs_tasks, se.group_node) { if (!can_migrate_task(p, env)) continue; @@ -6856,7 +6864,7 @@ static int detach_tasks(struct lb_env *env) if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) break; - p = list_first_entry(tasks, struct task_struct, se.group_node); + p = list_last_entry(tasks, struct task_struct, se.group_node); env->loop++; /* We've more or less seen every task there is, call it quits */ @@ -6906,7 +6914,7 @@ static int detach_tasks(struct lb_env *env) continue; next: - list_move_tail(&p->se.group_node, tasks); + list_move(&p->se.group_node, tasks); } /* -- 2.11.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH v2] sched/fair: search a task from the tail of the queue 2017-09-13 10:24 ` Uladzislau Rezki (Sony) @ 2017-10-04 9:46 ` Peter Zijlstra 2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki 1 sibling, 0 replies; 4+ messages in thread From: Peter Zijlstra @ 2017-10-04 9:46 UTC (permalink / raw) To: Uladzislau Rezki (Sony) Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko, Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith, Kirill Tkhai, Tim Chen, Nicolas Pitre On Wed, Sep 13, 2017 at 12:24:30PM +0200, Uladzislau Rezki (Sony) wrote: > From: Uladzislau Rezki <urezki@gmail.com> > > As a first step this patch makes cfs_tasks list as MRU one. > It means, that when a next task is picked to run on physical > CPU it is moved to the front of the list. > > Therefore, the cfs_tasks list is more or less sorted (except > woken tasks) starting from recently given CPU time tasks toward > tasks with max wait time in a run-queue, i.e. MRU list. > > Second, as part of the load balance operation, this approach > starts detach_tasks()/detach_one_task() from the tail of the > queue instead of the head, giving some advantages: > > - tends to pick a task with highest wait time; > - tasks located in the tail are less likely cache-hot, > therefore the can_migrate_task() decision is higher. > > hackbench illustrates slightly better performance. For example > doing 1000 samples and 40 groups on i5-3320M CPU, it shows below > figures: > > default: 0.657 avg > patched: 0.646 avg > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Thanks! ^ permalink raw reply [flat|nested] 4+ messages in thread
* [tip:sched/core] sched/fair: Search a task from the tail of the queue 2017-09-13 10:24 ` Uladzislau Rezki (Sony) 2017-10-04 9:46 ` Peter Zijlstra @ 2017-10-10 10:58 ` tip-bot for Uladzislau Rezki 1 sibling, 0 replies; 4+ messages in thread From: tip-bot for Uladzislau Rezki @ 2017-10-10 10:58 UTC (permalink / raw) To: linux-tip-commits Cc: hpa, peterz, mingo, tkhai, umgwanakikbuti, nicolas.pitre, oleksiy.avramchenko, tim.c.chen, tglx, pjt, torvalds, efault, linux-kernel, urezki, rostedt, oleg Commit-ID: 93824900a2e242766f5fe6ae7697e3d7171aa234 Gitweb: https://git.kernel.org/tip/93824900a2e242766f5fe6ae7697e3d7171aa234 Author: Uladzislau Rezki <urezki@gmail.com> AuthorDate: Wed, 13 Sep 2017 12:24:30 +0200 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 10 Oct 2017 11:45:30 +0200 sched/fair: Search a task from the tail of the queue As a first step this patch makes cfs_tasks list as MRU one. It means, that when a next task is picked to run on physical CPU it is moved to the front of the list. Therefore, the cfs_tasks list is more or less sorted (except woken tasks) starting from recently given CPU time tasks toward tasks with max wait time in a run-queue, i.e. MRU list. Second, as part of the load balance operation, this approach starts detach_tasks()/detach_one_task() from the tail of the queue instead of the head, giving some advantages: - tends to pick a task with highest wait time; - tasks located in the tail are less likely cache-hot, therefore the can_migrate_task() decision is higher. hackbench illustrates slightly better performance. For example doing 1000 samples and 40 groups on i5-3320M CPU, it shows below figures: default: 0.657 avg patched: 0.646 avg Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20170913102430.8985-2-urezki@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/sched/fair.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ac6602c..cc0bfb0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6628,10 +6628,7 @@ again: set_next_entity(cfs_rq, se); } - if (hrtick_enabled(rq)) - hrtick_start_fair(rq, p); - - return p; + goto done; simple: #endif @@ -6645,6 +6642,16 @@ simple: p = task_of(se); +done: __maybe_unused +#ifdef CONFIG_SMP + /* + * Move the next running task to the front of + * the list, so our cfs_tasks list becomes MRU + * one. + */ + list_move(&p->se.group_node, &rq->cfs_tasks); +#endif + if (hrtick_enabled(rq)) hrtick_start_fair(rq, p); @@ -7080,11 +7087,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env) */ static struct task_struct *detach_one_task(struct lb_env *env) { - struct task_struct *p, *n; + struct task_struct *p; lockdep_assert_held(&env->src_rq->lock); - list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) { + list_for_each_entry_reverse(p, + &env->src_rq->cfs_tasks, se.group_node) { if (!can_migrate_task(p, env)) continue; @@ -7130,7 +7138,7 @@ static int detach_tasks(struct lb_env *env) if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) break; - p = list_first_entry(tasks, struct task_struct, se.group_node); + p = list_last_entry(tasks, struct task_struct, se.group_node); env->loop++; /* We've more or less seen every task there is, call it quits */ @@ -7180,7 +7188,7 @@ static int detach_tasks(struct lb_env *env) continue; next: - list_move_tail(&p->se.group_node, tasks); + list_move(&p->se.group_node, tasks); } /* ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-10-10 11:05 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony) 2017-09-13 10:24 ` Uladzislau Rezki (Sony) 2017-10-04 9:46 ` Peter Zijlstra 2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.