* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
@ 2017-09-13 10:24 Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
0 siblings, 1 reply; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki (Sony)
Objective:
In an attempt to improve the criteria for selecting which tasks should be
migrated (SMP case) during load-balance operations, I have done some
performance evaluations.
Test environment:
- set performance governor
- echo 0 > /proc/sys/kernel/nmi_watchdog
- intel_pstate=disable
- i5-3320M CPU @ 2.60GHz
Test results:
A first test was to evaluate hackbench with different numbers of groups;
I used 10, 20 and 40. See the plots with the results below:
i=0; while [ $i -le 1000 ]; do ./hackbench 10 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_10_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 20 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_20_groups.png
i=0; while [ $i -le 1000 ]; do ./hackbench 40 | grep "Time" | awk '{print $2}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/hacknench_1000_samples_40_groups.png
A second test was to evaluate how "perf bench sched pipe" behaves in a
single-CPU scenario, as Peter Zijlstra suggested earlier, to check caches
and find any extra overhead caused by the list manipulation:
i=0; while [ $i -le 500 ]; do taskset 1 perf bench sched pipe | grep "Total" | awk '{print $3}'; i=$(($i+1)); done
ftp://vps418301.ovh.net/incoming/taskset_1_perf_bench_sched_pipe.png
Added overhead:
First, I checked whether "cfs_tasks" and "group_node" are in a cache line
by annotating the pick_next_task_fair symbol and running the single-CPU test:
perf record -F 100000 -a -e L1-dcache-misses -- taskset 1 perf bench sched pipe -l 10000000
perf annotate pick_next_task_fair
Most of the time I see that cfs_tasks and group_node are in the L1 data cache:
│ __list_del(entry->prev, entry->next);
3.51 │ mov 0xb0(%rbp),%rdx
1.75 │ mov 0xa8(%rbp),%rcx
│ pick_next_task_fair():
│ list_move(&p->se.group_node, &rq->cfs_tasks);
│ lea 0xa8(%rbp),%rax
│ __list_del():
group_node: 3.51 corresponds to 2 samples, i.e. misses. The minimum is 0
and the maximum is 2 misses, among 10 runs.
│ list_add():
│ __list_add(new, head, head->next);
2.44 │ mov 0x940(%r15),%rdx
│ __list_add():
cfs_tasks: 2.44 corresponds to 1 sample, i.e. miss. The minimum is 0
and the maximum is 2 misses, among 10 runs.
When checking cache misses across all levels ("-e cache-misses"), I do not
see any samples or misses.
Conclusion:
According to the results above, and in my subjective opinion, it is worth
sorting the cfs_tasks list and pulling tasks from the tail of the list
during load-balance (including active balance) or idle-balance operations.
Any comments, proposals or ideas regarding this small investigation would
be appreciated.
Best Regards,
Uladzislau Rezki
Uladzislau Rezki (1):
sched/fair: search a task from the tail of the queue
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [RFC PATCH v2] sched/fair: search a task from the tail of the queue
2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
@ 2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04 9:46 ` Peter Zijlstra
2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
0 siblings, 2 replies; 4+ messages in thread
From: Uladzislau Rezki (Sony) @ 2017-09-13 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre, Uladzislau Rezki
From: Uladzislau Rezki <urezki@gmail.com>
As a first step, this patch turns the cfs_tasks list into an MRU
list: when a task is picked to run on a physical CPU, it is moved
to the front of the list.

As a result, the cfs_tasks list is more or less sorted (except for
woken tasks), from tasks that most recently received CPU time toward
tasks with the maximum wait time in the run-queue, i.e. an MRU list.
Second, as part of load-balance operations, this approach starts
detach_tasks()/detach_one_task() from the tail of the queue
instead of the head, which gives some advantages:

- it tends to pick the task with the highest wait time;
- tasks located at the tail are less likely to be cache-hot,
so can_migrate_task() is more likely to approve the migration.
hackbench shows slightly better performance. For example, over
1000 samples with 40 groups on an i5-3320M CPU, it shows the
following figures:

default: 0.657 avg
patched: 0.646 avg
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..b598907e5236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6354,10 +6354,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
set_next_entity(cfs_rq, se);
}
- if (hrtick_enabled(rq))
- hrtick_start_fair(rq, p);
-
- return p;
+ goto done;
simple:
cfs_rq = &rq->cfs;
#endif
@@ -6375,6 +6372,16 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
p = task_of(se);
+done: __maybe_unused
+#ifdef CONFIG_SMP
+ /*
+ * Move the next running task to the front of
+ * the list, so our cfs_tasks list becomes MRU
+ * one.
+ */
+ list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
@@ -6806,11 +6813,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
*/
static struct task_struct *detach_one_task(struct lb_env *env)
{
- struct task_struct *p, *n;
+ struct task_struct *p;
lockdep_assert_held(&env->src_rq->lock);
- list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+ list_for_each_entry_reverse(p,
+ &env->src_rq->cfs_tasks, se.group_node) {
if (!can_migrate_task(p, env))
continue;
@@ -6856,7 +6864,7 @@ static int detach_tasks(struct lb_env *env)
if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
break;
- p = list_first_entry(tasks, struct task_struct, se.group_node);
+ p = list_last_entry(tasks, struct task_struct, se.group_node);
env->loop++;
/* We've more or less seen every task there is, call it quits */
@@ -6906,7 +6914,7 @@ static int detach_tasks(struct lb_env *env)
continue;
next:
- list_move_tail(&p->se.group_node, tasks);
+ list_move(&p->se.group_node, tasks);
}
/*
--
2.11.0
* Re: [RFC PATCH v2] sched/fair: search a task from the tail of the queue
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
@ 2017-10-04 9:46 ` Peter Zijlstra
2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki
1 sibling, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2017-10-04 9:46 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: LKML, Ingo Molnar, Mike Galbraith, Oleksiy Avramchenko,
Paul Turner, Oleg Nesterov, Steven Rostedt, Mike Galbraith,
Kirill Tkhai, Tim Chen, Nicolas Pitre
On Wed, Sep 13, 2017 at 12:24:30PM +0200, Uladzislau Rezki (Sony) wrote:
> From: Uladzislau Rezki <urezki@gmail.com>
>
> As a first step, this patch turns the cfs_tasks list into an MRU
> list: when a task is picked to run on a physical CPU, it is moved
> to the front of the list.
>
> As a result, the cfs_tasks list is more or less sorted (except for
> woken tasks), from tasks that most recently received CPU time toward
> tasks with the maximum wait time in the run-queue, i.e. an MRU list.
>
> Second, as part of load-balance operations, this approach starts
> detach_tasks()/detach_one_task() from the tail of the queue
> instead of the head, which gives some advantages:
>
> - it tends to pick the task with the highest wait time;
> - tasks located at the tail are less likely to be cache-hot,
> so can_migrate_task() is more likely to approve the migration.
>
> hackbench shows slightly better performance. For example, over
> 1000 samples with 40 groups on an i5-3320M CPU, it shows the
> following figures:
>
> default: 0.657 avg
> patched: 0.646 avg
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Thanks!
* [tip:sched/core] sched/fair: Search a task from the tail of the queue
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04 9:46 ` Peter Zijlstra
@ 2017-10-10 10:58 ` tip-bot for Uladzislau Rezki
1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Uladzislau Rezki @ 2017-10-10 10:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, peterz, mingo, tkhai, umgwanakikbuti, nicolas.pitre,
oleksiy.avramchenko, tim.c.chen, tglx, pjt, torvalds, efault,
linux-kernel, urezki, rostedt, oleg
Commit-ID: 93824900a2e242766f5fe6ae7697e3d7171aa234
Gitweb: https://git.kernel.org/tip/93824900a2e242766f5fe6ae7697e3d7171aa234
Author: Uladzislau Rezki <urezki@gmail.com>
AuthorDate: Wed, 13 Sep 2017 12:24:30 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Oct 2017 11:45:30 +0200
sched/fair: Search a task from the tail of the queue
As a first step, this patch turns the cfs_tasks list into an MRU
list: when a task is picked to run on a physical CPU, it is moved
to the front of the list.

As a result, the cfs_tasks list is more or less sorted (except for
woken tasks), from tasks that most recently received CPU time toward
tasks with the maximum wait time in the run-queue, i.e. an MRU list.

Second, as part of load-balance operations, this approach starts
detach_tasks()/detach_one_task() from the tail of the queue
instead of the head, which gives some advantages:

- it tends to pick the task with the highest wait time;
- tasks located at the tail are less likely to be cache-hot,
so can_migrate_task() is more likely to approve the migration.

hackbench shows slightly better performance. For example, over
1000 samples with 40 groups on an i5-3320M CPU, it shows the
following figures:

default: 0.657 avg
patched: 0.646 avg
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20170913102430.8985-2-urezki@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac6602c..cc0bfb0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6628,10 +6628,7 @@ again:
set_next_entity(cfs_rq, se);
}
- if (hrtick_enabled(rq))
- hrtick_start_fair(rq, p);
-
- return p;
+ goto done;
simple:
#endif
@@ -6645,6 +6642,16 @@ simple:
p = task_of(se);
+done: __maybe_unused
+#ifdef CONFIG_SMP
+ /*
+ * Move the next running task to the front of
+ * the list, so our cfs_tasks list becomes MRU
+ * one.
+ */
+ list_move(&p->se.group_node, &rq->cfs_tasks);
+#endif
+
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
@@ -7080,11 +7087,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
*/
static struct task_struct *detach_one_task(struct lb_env *env)
{
- struct task_struct *p, *n;
+ struct task_struct *p;
lockdep_assert_held(&env->src_rq->lock);
- list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
+ list_for_each_entry_reverse(p,
+ &env->src_rq->cfs_tasks, se.group_node) {
if (!can_migrate_task(p, env))
continue;
@@ -7130,7 +7138,7 @@ static int detach_tasks(struct lb_env *env)
if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
break;
- p = list_first_entry(tasks, struct task_struct, se.group_node);
+ p = list_last_entry(tasks, struct task_struct, se.group_node);
env->loop++;
/* We've more or less seen every task there is, call it quits */
@@ -7180,7 +7188,7 @@ static int detach_tasks(struct lb_env *env)
continue;
next:
- list_move_tail(&p->se.group_node, tasks);
+ list_move(&p->se.group_node, tasks);
}
/*
end of thread, other threads:[~2017-10-10 11:05 UTC | newest]
2017-09-13 10:24 [RFC PATCH v2] sched/fair: search a task from the tail of the queue Uladzislau Rezki (Sony)
2017-09-13 10:24 ` Uladzislau Rezki (Sony)
2017-10-04 9:46 ` Peter Zijlstra
2017-10-10 10:58 ` [tip:sched/core] sched/fair: Search " tip-bot for Uladzislau Rezki