* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
@ 2017-09-13 10:24 Uladzislau Rezki (Sony)
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
	Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
	Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki (Sony)

Objective:

In an attempt to improve the criteria for choosing which tasks to migrate
(SMP case) during load balance operations, I have done some performance
evaluations.

Test environment:

- cpufreq governor set to performance
- NMI watchdog disabled: echo 0 > /proc/sys/kernel/nmi_watchdog
- intel_pstate=disable (kernel command line)
- i5-3320M CPU @ 2.60GHz

Test results:

The first test ran hackbench with different numbers of groups: 10, 20 and
40. The plots with the results are linked below each command:

i=0; while [ $i -le 1000 ]; do ./hackbench 10 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_10_groups.png

i=0; while [ $i -le 1000 ]; do ./hackbench 20 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_20_groups.png

i=0; while [ $i -le 1000 ]; do ./hackbench 40 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_40_groups.png

The second test evaluated how "perf bench sched pipe" behaves in a single-CPU
scenario. As Peter Zijlstra suggested earlier, the aim was to check cache
behaviour and find any extra overhead caused by the list manipulation:

i=0; while [ $i -le 500 ]; do taskset 1 perf bench sched pipe | grep "Total" | awk '{print $3}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/taskset_1_perf_bench_sched_pipe.png

Added overhead:

First, I checked whether "cfs_tasks" and "group_node" are already in the cache
by annotating the pick_next_task_fair symbol while running the single-CPU test:

perf record -F 100000 -a -e L1-dcache-misses -- taskset 1 perf bench sched pipe -l 10000000
perf annotate pick_next_task_fair

Most of the time I see that cfs_tasks and group_node are already in the L1 data cache:

       │             __list_del(entry->prev, entry->next);
  3.51 │       mov    0xb0(%rbp),%rdx
  1.75 │       mov    0xa8(%rbp),%rcx
       │     pick_next_task_fair():
       │                     list_move(&p->se.group_node, &rq->cfs_tasks);
       │       lea    0xa8(%rbp),%rax
       │     __list_del():

group_node: 3.51% corresponds to 2 samples, i.e. 2 misses. Across 10 runs,
the minimum was 0 and the maximum 2 misses.

       │     list_add():
       │             __list_add(new, head, head->next);
  2.44 │       mov    0x940(%r15),%rdx
       │     __list_add():

cfs_tasks: 2.44% corresponds to 1 sample, i.e. 1 miss. Across 10 runs,
the minimum was 0 and the maximum 2 misses.
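As a rough consistency check, assuming the percentages are relative to all samples that perf annotate attributes to pick_next_task_fair (its default view), and assuming the 1.75 line is a single sample from the same run as the 3.51 line, the counts imply the following totals N per run (N is a symbol introduced here for illustration):

```latex
\[
N \approx \frac{\text{samples}}{p/100}, \qquad
\frac{2}{0.0351} \approx 57, \qquad
\frac{1}{0.0175} \approx 57, \qquad
\frac{1}{0.0244} \approx 41
\]
```

So each reported miss is one sample out of a few dozen taken in this function.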

When checking misses at all cache levels with "-e cache-misses", I do not
see any samples (misses) at all.

Conclusion:

Based on the provided results and my subjective opinion, it is worth keeping
the cfs_tasks list sorted and starting to pull tasks from the back of the list
during load balance (including active balance) and idle balance operations.

Any comments, proposals or ideas regarding this small investigation would
be appreciated.

Best Regards,
Uladzislau Rezki

Uladzislau Rezki (1):
  sched/fair: search a task from the tail of the queue

 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

-- 
2.11.0



Thread overview: 4+ messages
2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04  9:46   ` Peter Zijlstra
2017-10-10 10:58   ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
