[RFC PATCH v2] sched/fair: search a task from the tail of the queue

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
@ 2017-09-13 10:24 Uladzislau Rezki (Sony)
  2017-09-13 10:24 ` Uladzislau Rezki (Sony)
  0 siblings, 1 reply; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
	Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
	Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki (Sony)

Objective:

In an attempt to improve the criteria of which tasks we should consider to
be migrated (SMP case) during load balance operations, i have done some
performance evaluations.

Test environment:

- set performance governor
- echo 0 > /proc/sys/kernel/nmi_watchdog
- intel_pstate=disable
- i5-3320M CPU @ 2.60GHz

Test results:

A first test was to evaluate hackbench with different number of groups,
i used 10, 20, 40. See below plots with results:

i=0; while [ $i -le 1000 ]; do ./hackbench 10 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_10_groups.png

i=0; while [ $i -le 1000 ]; do ./hackbench 20 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_20_groups.png

i=0; while [ $i -le 1000 ]; do ./hackbench 40 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_40_groups.png

A second test was to evaluate how "perf bench sched pipe" behaves in a single
CPU scenario. As Peter Zijlstra suggested before, to check caches and find out
extra overhead caused by list manipulation:

i=0; while [ $i -le 500 ]; do taskset 1 perf bench sched pipe | grep "Total" | awk '{print $3}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/taskset_1_perf_bench_sched_pipe.png

Added overhead:

First, i checked if "cfs_tasks" and "group_node" are in a cache line
by annotating pick_next_task_fair symbol and running single CPU test.

perf record -F 100000 -a -e L1-dcache-misses -- taskset 1 perf bench sched pipe -l 10000000
perf annotate pick_next_task_fair

Most of the time i see that cfs_tasks and group_node are in L1-dcache line:

       │             __list_del(entry->prev, entry->next);
  3.51 │       mov    0xb0(%rbp),%rdx
  1.75 │       mov    0xa8(%rbp),%rcx
       │     pick_next_task_fair():
       │                     list_move(&p->se.group_node, &rq->cfs_tasks);
       │       lea    0xa8(%rbp),%rax
       │     __list_del():

group_node: 3.51 corresponds to 2 samples or misses. Minimum value is 0
maximum is 2 misses, among 10 runs.

       │     list_add():
       │             __list_add(new, head, head->next);
  2.44 │       mov    0x940(%r15),%rdx
       │     __list_add():

cfs_tasks: 2.44 corresponds to 1 sample or misses. Minimum value is 0
maximum is 2 misses, among 10 runs.

In case of checking all level cache misses "-e cache-misses" i do not
see any samples or misses.

Conclusion:

according to provided results and my subjective opinion, it worth to
sort cfs_task list and start pulling from the back of the list during
load balance (+ active) or idle balance operations.

It would be appreciated if there are any comments, proposals or ideas
regarding this small investigation.

Best Regards,
Uladzislau Rezki

Uladzislau Rezki (1):
  sched/fair: search a task from the tail of the queue

 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
  2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
@ 2017-09-13 10:24 ` Uladzislau Rezki (Sony)
  2017-10-04  9:46   ` Peter Zijlstra
  2017-10-10 10:58   ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
  0 siblings, 2 replies; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
	Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
	Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki

From: Uladzislau Rezki <urezki@gmail.com>

As a first step this patch makes cfs_tasks list as MRU one.
It means, that when a next task is picked to run on physical
CPU it is moved to the front of the list.

Therefore, the cfs_tasks list is more or less sorted (except
woken tasks) starting from recently given CPU time tasks toward
tasks with max wait time in a run-queue, i.e. MRU list.

Second, as part of the load balance operation, this approach
starts detach_tasks()/detach_one_task() from the tail of the
queue instead of the head, giving some advantages:

- tends to pick a task with highest wait time;
- tasks located in the tail are less likely cache-hot,
  therefore the can_migrate_task() decision is higher.

hackbench illustrates slightly better performance. For example
doing 1000 samples and 40 groups on i5-3320M CPU, it shows below
figures:

default: 0.657 avg
patched: 0.646 avg

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..b598907e5236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6354,10 +6354,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		set_next_entity(cfs_rq, se);
 	}
 
-	if (hrtick_enabled(rq))
-		hrtick_start_fair(rq, p);
-
-	return p;
+	goto done;
 simple:
 	cfs_rq = &rq->cfs;
 #endif
@@ -6375,6 +6372,16 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 
 	p = task_of(se);
 
+done: __maybe_unused
+#ifdef CONFIG_SMP
+	/*
+	 * Move the next running task to the front of
+	 * the list, so our cfs_tasks list becomes MRU
+	 * one.
+	 */
+	list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
 	if (hrtick_enabled(rq))
 		hrtick_start_fair(rq, p);
 
@@ -6806,11 +6813,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
  */
 static struct task_struct *detach_one_task(struct lb_env *env)
 {
-	struct task_struct *p, *n;
+	struct task_struct *p;
 
 	lockdep_assert_held(&env->src_rq->lock);
 
-	list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+	list_for_each_entry_reverse(p,
+			&env->src_rq->cfs_tasks, se.group_node) {
 		if (!can_migrate_task(p, env))
 			continue;
 
@@ -6856,7 +6864,7 @@ static int detach_tasks(struct lb_env *env)
 		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
 			break;
 
-		p = list_first_entry(tasks, struct task_struct, se.group_node);
+		p = list_last_entry(tasks, struct task_struct, se.group_node);
 
 		env->loop++;
 		/* We've more or less seen every task there is, call it quits */
@@ -6906,7 +6914,7 @@ static int detach_tasks(struct lb_env *env)
 
 		continue;
 next:
-		list_move_tail(&p->se.group_node, tasks);
+		list_move(&p->se.group_node, tasks);
 	}
 
 	/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v2] sched/fair: search a task from the tail of the queue
  2017-09-13 10:24 ` Uladzislau Rezki (Sony)
@ 2017-10-04  9:46   ` Peter Zijlstra
  2017-10-10 10:58   ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
  1 sibling, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2017-10-04  9:46 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
	Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
	Kirill Tkhai, Tim Chen, Nicolas Pitre

On Wed, Sep 13, 2017 at 12:24:30PM +0200, Uladzislau Rezki (Sony) wrote:
> From: Uladzislau Rezki <urezki@gmail.com>
> 
> As a first step this patch makes cfs_tasks list as MRU one.
> It means, that when a next task is picked to run on physical
> CPU it is moved to the front of the list.
> 
> Therefore, the cfs_tasks list is more or less sorted (except
> woken tasks) starting from recently given CPU time tasks toward
> tasks with max wait time in a run-queue, i.e. MRU list.
> 
> Second, as part of the load balance operation, this approach
> starts detach_tasks()/detach_one_task() from the tail of the
> queue instead of the head, giving some advantages:
> 
> - tends to pick a task with highest wait time;
> - tasks located in the tail are less likely cache-hot,
>   therefore the can_migrate_task() decision is higher.
> 
> hackbench illustrates slightly better performance. For example
> doing 1000 samples and 40 groups on i5-3320M CPU, it shows below
> figures:
> 
> default: 0.657 avg
> patched: 0.646 avg
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip:sched/core] sched/fair: Search a task from the tail of the queue
  2017-09-13 10:24 ` Uladzislau Rezki (Sony)
  2017-10-04  9:46   ` Peter Zijlstra
@ 2017-10-10 10:58   ` tip-bot for Uladzislau Rezki
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Uladzislau Rezki @ 2017-10-10 10:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, peterz, mingo, tkhai, umgwanakikbuti, nicolas.pitre,
	oleksiy.avramchenko, tim.c.chen, tglx, pjt, torvalds, efault,
	linux-kernel, urezki, rostedt, oleg

Commit-ID:  93824900a2e242766f5fe6ae7697e3d7171aa234
Gitweb:     https://git.kernel.org/tip/93824900a2e242766f5fe6ae7697e3d7171aa234
Author:     Uladzislau Rezki <urezki@gmail.com>
AuthorDate: Wed, 13 Sep 2017 12:24:30 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:30 +0200

sched/fair: Search a task from the tail of the queue

As a first step this patch makes cfs_tasks list as MRU one.
It means, that when a next task is picked to run on physical
CPU it is moved to the front of the list.

Therefore, the cfs_tasks list is more or less sorted (except
woken tasks) starting from recently given CPU time tasks toward
tasks with max wait time in a run-queue, i.e. MRU list.

Second, as part of the load balance operation, this approach
starts detach_tasks()/detach_one_task() from the tail of the
queue instead of the head, giving some advantages:

 - tends to pick a task with highest wait time;

 - tasks located in the tail are less likely cache-hot,
   therefore the can_migrate_task() decision is higher.

hackbench illustrates slightly better performance. For example
doing 1000 samples and 40 groups on i5-3320M CPU, it shows below
figures:

 default: 0.657 avg
 patched: 0.646 avg

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20170913102430.8985-2-urezki@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac6602c..cc0bfb0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6628,10 +6628,7 @@ again:
 		set_next_entity(cfs_rq, se);
 	}
 
-	if (hrtick_enabled(rq))
-		hrtick_start_fair(rq, p);
-
-	return p;
+	goto done;
 simple:
 #endif
 
@@ -6645,6 +6642,16 @@ simple:
 
 	p = task_of(se);
 
+done: __maybe_unused
+#ifdef CONFIG_SMP
+	/*
+	 * Move the next running task to the front of
+	 * the list, so our cfs_tasks list becomes MRU
+	 * one.
+	 */
+	list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
 	if (hrtick_enabled(rq))
 		hrtick_start_fair(rq, p);
 
@@ -7080,11 +7087,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
  */
 static struct task_struct *detach_one_task(struct lb_env *env)
 {
-	struct task_struct *p, *n;
+	struct task_struct *p;
 
 	lockdep_assert_held(&env->src_rq->lock);
 
-	list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+	list_for_each_entry_reverse(p,
+			&env->src_rq->cfs_tasks, se.group_node) {
 		if (!can_migrate_task(p, env))
 			continue;
 
@@ -7130,7 +7138,7 @@ static int detach_tasks(struct lb_env *env)
 		if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
 			break;
 
-		p = list_first_entry(tasks, struct task_struct, se.group_node);
+		p = list_last_entry(tasks, struct task_struct, se.group_node);
 
 		env->loop++;
 		/* We've more or less seen every task there is, call it quits */
@@ -7180,7 +7188,7 @@ static int detach_tasks(struct lb_env *env)
 
 		continue;
 next:
-		list_move_tail(&p->se.group_node, tasks);
+		list_move(&p->se.group_node, tasks);
 	}
 
 	/*

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-10-10 11:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04  9:46   ` Peter Zijlstra
2017-10-10 10:58   ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.