* [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it
@ 2015-08-20 11:21 byungchul.park
2015-08-20 11:21 ` [PATCH v4 1/5] sched: add two functions adjusting cfs_rq's load when att(det)aching a se byungchul.park
` (6 more replies)
0 siblings, 7 replies; 22+ messages in thread
From: byungchul.park @ 2015-08-20 11:21 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park
From: Byungchul Park <byungchul.park@lge.com>
there are 3 problems when att(det)aching a se to(from) its cfs_rq.
problem 1. a se's average load is not accounted to the new cfs_rq in the
queued case, when a task changes its cgroup.
problem 2. cfs_rq->avg.load_avg becomes larger and larger every time the
cgroup is changed to another one. we have to sync the se's average load
with its prev cfs_rq.
problem 3. current code does not sync a se with its cfs_rq when switching
sched class to the fair class. if we could always assume that a se has been
detached from the fair class long enough for its average load to become
useless by the time it is attached to its fair class cfs_rq, then the
current code would be acceptable. but that assumption is not always true.
patch 1/5, does code refactoring for further patches.
patch 2/5, solves the problem 1.
patch 3/5, solves the problem 2.
patch 4/5, solves the problem 3.
patch 5/5, does code refactoring for better readability.
change from v3 to v4
* clean up code as Peter suggested
* use upper case letters properly in comments
change from v2 to v3 (logic is same as v2)
* fix up my poor english in commit messages and comments
* break down big patches into smaller ones so they can be reviewed easily
* supplement the cover letter messages
change from v1 to v2
* introduce two functions for adjusting vruntime and load when attaching
and detaching
* call the introduced functions from task_move_group_fair() instead of
calling switched_from(to)_fair() directly
* add decaying logic for a se which has been detached from a cfs_rq
thanks,
byungchul
Byungchul Park (5):
sched: add two functions adjusting cfs_rq's load when att(det)aching
a se
sched: make task_move_group_fair adjust cfs_rq's load in case of
queued
sched: sync a se with prev cfs_rq when changing cgroup
sched: sync a se with its cfs_rq when switching sched class to fair
class
sched: add two functions for att(det)aching a task to(from) a cfs_rq
kernel/sched/fair.c | 222 ++++++++++++++++++++++++++-------------------------
1 file changed, 113 insertions(+), 109 deletions(-)
--
1.7.9.5
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v4 1/5] sched: add two functions adjusting cfs_rq's load when att(det)aching a se
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
@ 2015-08-20 11:21 ` byungchul.park
2015-09-13 10:58 ` [tip:sched/core] sched/fair: Factor out the {at, de}taching of the per entity load {to,from} the runqueue tip-bot for Byungchul Park
2015-08-20 11:21 ` [PATCH v4 2/5] sched: make task_move_group_fair adjust cfs_rq's load in case of queued byungchul.park
` (5 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: byungchul.park @ 2015-08-20 11:21 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park
From: Byungchul Park <byungchul.park@lge.com>
this patch introduces two functions for adjusting a cfs_rq's average load
when att(det)aching a se to(from) the cfs_rq, makes the existing att(det)ach
paths use them, and cleans up some code.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
kernel/sched/fair.c | 77 +++++++++++++++++++++++++--------------------------
1 file changed, 38 insertions(+), 39 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4d5f97b..a43023b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2664,8 +2664,8 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
{
- int decayed;
struct sched_avg *sa = &cfs_rq->avg;
+ int decayed;
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
@@ -2695,33 +2695,52 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
static inline void update_load_avg(struct sched_entity *se, int update_tg)
{
struct cfs_rq *cfs_rq = cfs_rq_of(se);
- int cpu = cpu_of(rq_of(cfs_rq));
u64 now = cfs_rq_clock_task(cfs_rq);
+ int cpu = cpu_of(rq_of(cfs_rq));
/*
* Track task load average for carrying it to new CPU after migrated, and
* track group sched_entity load average for task_h_load calc in migration
*/
__update_load_avg(now, cpu, &se->avg,
- se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
+ se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
update_tg_load_avg(cfs_rq, 0);
}
+static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ se->avg.last_update_time = cfs_rq->avg.last_update_time;
+ cfs_rq->avg.load_avg += se->avg.load_avg;
+ cfs_rq->avg.load_sum += se->avg.load_sum;
+ cfs_rq->avg.util_avg += se->avg.util_avg;
+ cfs_rq->avg.util_sum += se->avg.util_sum;
+}
+
+static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
+
+ cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
+ cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
+ cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
+ cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+}
+
/* Add the load generated by se into cfs_rq's load average */
static inline void
enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
struct sched_avg *sa = &se->avg;
u64 now = cfs_rq_clock_task(cfs_rq);
- int migrated = 0, decayed;
+ int migrated, decayed;
- if (sa->last_update_time == 0) {
- sa->last_update_time = now;
- migrated = 1;
- }
- else {
+ migrated = !sa->last_update_time;
+ if (!migrated) {
__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
se->on_rq * scale_load_down(se->load.weight),
cfs_rq->curr == se, NULL);
@@ -2732,12 +2751,8 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
cfs_rq->runnable_load_avg += sa->load_avg;
cfs_rq->runnable_load_sum += sa->load_sum;
- if (migrated) {
- cfs_rq->avg.load_avg += sa->load_avg;
- cfs_rq->avg.load_sum += sa->load_sum;
- cfs_rq->avg.util_avg += sa->util_avg;
- cfs_rq->avg.util_sum += sa->util_sum;
- }
+ if (migrated)
+ attach_entity_load_avg(cfs_rq, se);
if (decayed || migrated)
update_tg_load_avg(cfs_rq, 0);
@@ -2752,7 +2767,7 @@ dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
cfs_rq->runnable_load_avg =
max_t(long, cfs_rq->runnable_load_avg - se->avg.load_avg, 0);
cfs_rq->runnable_load_sum =
- max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
+ max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
}
/*
@@ -2820,6 +2835,11 @@ static inline void
dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
static inline void remove_entity_load_avg(struct sched_entity *se) {}
+static inline void
+attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void
+detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+
static inline int idle_balance(struct rq *rq)
{
return 0;
@@ -7909,25 +7929,10 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
se->vruntime -= cfs_rq->min_vruntime;
}
-#ifdef CONFIG_SMP
/* Catch up with the cfs_rq and remove our load when we leave */
- __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), &se->avg,
- se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
-
- cfs_rq->avg.load_avg =
- max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
- cfs_rq->avg.load_sum =
- max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
- cfs_rq->avg.util_avg =
- max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
- cfs_rq->avg.util_sum =
- max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
-#endif
+ detach_entity_load_avg(cfs_rq, se);
}
-/*
- * We switched to the sched_fair class.
- */
static void switched_to_fair(struct rq *rq, struct task_struct *p)
{
struct sched_entity *se = &p->se;
@@ -8040,14 +8045,8 @@ static void task_move_group_fair(struct task_struct *p, int queued)
cfs_rq = cfs_rq_of(se);
se->vruntime += cfs_rq->min_vruntime;
-#ifdef CONFIG_SMP
/* Virtually synchronize task with its new cfs_rq */
- p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
- cfs_rq->avg.load_avg += p->se.avg.load_avg;
- cfs_rq->avg.load_sum += p->se.avg.load_sum;
- cfs_rq->avg.util_avg += p->se.avg.util_avg;
- cfs_rq->avg.util_sum += p->se.avg.util_sum;
-#endif
+ attach_entity_load_avg(cfs_rq, se);
}
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v4 2/5] sched: make task_move_group_fair adjust cfs_rq's load in case of queued
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
2015-08-20 11:21 ` [PATCH v4 1/5] sched: add two functions adjusting cfs_rq's load when att(det)aching a se byungchul.park
@ 2015-08-20 11:21 ` byungchul.park
2015-09-13 10:59 ` [tip:sched/core] sched/fair: Have task_move_group_fair() unconditionally add the entity load to the runqueue tip-bot for Byungchul Park
2015-08-20 11:21 ` [PATCH v4 3/5] sched: sync a se with prev cfs_rq when changing cgroup byungchul.park
` (4 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: byungchul.park @ 2015-08-20 11:21 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park
From: Byungchul Park <byungchul.park@lge.com>
a se's average load should be added to the new cfs_rq not only in the
!queued case but also in the queued case. older code managed a cfs_rq's
blocked load separately, and that blocked load was meaningful only while
the se was !queued. the load tracking code has since changed, so that is
no longer true, and the code adjusting the cfs_rq's average load should
change accordingly.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
kernel/sched/fair.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a43023b..9f93c8b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8041,13 +8041,12 @@ static void task_move_group_fair(struct task_struct *p, int queued)
se->vruntime -= cfs_rq_of(se)->min_vruntime;
set_task_rq(p, task_cpu(p));
se->depth = se->parent ? se->parent->depth + 1 : 0;
- if (!queued) {
- cfs_rq = cfs_rq_of(se);
+ cfs_rq = cfs_rq_of(se);
+ if (!queued)
se->vruntime += cfs_rq->min_vruntime;
- /* Virtually synchronize task with its new cfs_rq */
- attach_entity_load_avg(cfs_rq, se);
- }
+ /* Virtually synchronize task with its new cfs_rq */
+ attach_entity_load_avg(cfs_rq, se);
}
void free_fair_sched_group(struct task_group *tg)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v4 3/5] sched: sync a se with prev cfs_rq when changing cgroup
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
2015-08-20 11:21 ` [PATCH v4 1/5] sched: add two functions adjusting cfs_rq's load when att(det)aching a se byungchul.park
2015-08-20 11:21 ` [PATCH v4 2/5] sched: make task_move_group_fair adjust cfs_rq's load in case of queued byungchul.park
@ 2015-08-20 11:21 ` byungchul.park
2015-09-13 10:59 ` [tip:sched/core] sched/fair: Have task_move_group_fair() also detach entity load from the old runqueue tip-bot for Byungchul Park
2015-08-20 11:21 ` [PATCH v4 4/5] sched: sync a se with its cfs_rq when switching sched class to fair class byungchul.park
` (3 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: byungchul.park @ 2015-08-20 11:21 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park
From: Byungchul Park <byungchul.park@lge.com>
current code mishandles a cfs_rq's average load when a task's cfs_rq is
changed to another one. i tested with "echo pid > cgroup" and found that
e.g. cfs_rq->avg.load_avg became larger and larger every time the cgroup
was changed again and again. we have to sync a se's average load with
both the *prev* cfs_rq and the next cfs_rq when changing its cgroup.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
kernel/sched/fair.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9f93c8b..d58e9aa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8037,8 +8037,12 @@ static void task_move_group_fair(struct task_struct *p, int queued)
if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
queued = 1;
+ cfs_rq = cfs_rq_of(se);
if (!queued)
- se->vruntime -= cfs_rq_of(se)->min_vruntime;
+ se->vruntime -= cfs_rq->min_vruntime;
+
+ /* Synchronize task with its prev cfs_rq */
+ detach_entity_load_avg(cfs_rq, se);
set_task_rq(p, task_cpu(p));
se->depth = se->parent ? se->parent->depth + 1 : 0;
cfs_rq = cfs_rq_of(se);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v4 4/5] sched: sync a se with its cfs_rq when switching sched class to fair class
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
` (2 preceding siblings ...)
2015-08-20 11:21 ` [PATCH v4 3/5] sched: sync a se with prev cfs_rq when changing cgroup byungchul.park
@ 2015-08-20 11:21 ` byungchul.park
2015-09-13 10:59 ` [tip:sched/core] sched/fair: Fix switched_to_fair()'s per entity load tracking tip-bot for Byungchul Park
2015-08-20 11:22 ` [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq byungchul.park
` (2 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: byungchul.park @ 2015-08-20 11:21 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park
From: Byungchul Park <byungchul.park@lge.com>
we have to sync a se with its cfs_rq when switching sched class to the
fair class. current code does not sync it, on the assumption that the se's
average load is no longer valid because the se has been detached for a
long time. however the se's average load is still valid just after it has
been detached from its cfs_rq, so we need to sync it in that case, e.g.
on priority inheritance.
to solve the problem that a se's average load becomes less useful over
time, this patch decays the average load even across the duration the se
has been detached, at the time it gets attached back to a cfs_rq.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
kernel/sched/fair.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d58e9aa..af6ad5f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2712,6 +2712,18 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
+ /*
+ * In case of migration or cgroup change, more care should be taken
+ * because the se's cfs_rq was changed, which means calling __update_load_avg
+ * with the new cfs_rq->avg.last_update_time is meaningless. So we skip the
+ * update here. We must have updated it with the prev cfs_rq just before
+ * changing the se's cfs_rq, and we get here soon after that.
+ */
+ if (se->avg.last_update_time) {
+ __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, 0, 0, NULL);
+ }
+
se->avg.last_update_time = cfs_rq->avg.last_update_time;
cfs_rq->avg.load_avg += se->avg.load_avg;
cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -7945,6 +7957,9 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
se->depth = se->parent ? se->parent->depth + 1 : 0;
#endif
+ /* Synchronize task with its cfs_rq */
+ attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+
if (!task_on_rq_queued(p)) {
/*
@@ -8044,6 +8059,10 @@ static void task_move_group_fair(struct task_struct *p, int queued)
/* Synchronize task with its prev cfs_rq */
detach_entity_load_avg(cfs_rq, se);
set_task_rq(p, task_cpu(p));
+
+ /* Tell se's cfs_rq has been changed -- migrated */
+ p->se.avg.last_update_time = 0;
+
se->depth = se->parent ? se->parent->depth + 1 : 0;
cfs_rq = cfs_rq_of(se);
if (!queued)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
` (3 preceding siblings ...)
2015-08-20 11:21 ` [PATCH v4 4/5] sched: sync a se with its cfs_rq when switching sched class to fair class byungchul.park
@ 2015-08-20 11:22 ` byungchul.park
2015-08-20 11:35 ` Byungchul Park
2015-09-13 11:00 ` [tip:sched/core] sched/fair: Unify switched_{from,to}_fair() and task_move_group_fair() tip-bot for Byungchul Park
2015-08-22 7:45 ` [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it Byungchul Park
2015-08-25 23:54 ` Byungchul Park
6 siblings, 2 replies; 22+ messages in thread
From: byungchul.park @ 2015-08-20 11:22 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park
From: Byungchul Park <byungchul.park@lge.com>
this patch introduces two functions for adjusting a task's vruntime and
the target cfs_rq's average load, so that redundant code doing the same
things can be simplified, and it provides better readability.
switched_from_fair(), switched_to_fair() and task_move_group_fair() can
use the introduced functions and get simplified.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
kernel/sched/fair.c | 137 ++++++++++++++++++++++-----------------------------
1 file changed, 60 insertions(+), 77 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af6ad5f..53d0e30 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7918,21 +7918,47 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
check_preempt_curr(rq, p, 0);
}
-static void switched_from_fair(struct rq *rq, struct task_struct *p)
+static inline bool vruntime_normalized(struct task_struct *p)
{
+ int queued = task_on_rq_queued(p);
struct sched_entity *se = &p->se;
- struct cfs_rq *cfs_rq = cfs_rq_of(se);
/*
- * Ensure the task's vruntime is normalized, so that when it's
- * switched back to the fair class the enqueue_entity(.flags=0) will
- * do the right thing.
- *
* If it's queued, then the dequeue_entity(.flags=0) will already
- * have normalized the vruntime, if it's !queued, then only when
- * the task is sleeping will it still have non-normalized vruntime.
+ * have normalized the vruntime.
+ */
+ if (queued)
+ return true;
+
+ /*
+ * When !queued, vruntime of the task has usually NOT been normalized.
+ * But there are some cases where it has already been normalized:
+ *
+ * - A forked child which is waiting for being woken up by
+ * wake_up_new_task().
+ * - A task which has been woken up by try_to_wake_up() and
+ * waiting for actually being woken up by sched_ttwu_pending().
*/
- if (!task_on_rq_queued(p) && p->state != TASK_RUNNING) {
+ if (!se->sum_exec_runtime || p->state == TASK_WAKING)
+ return true;
+
+ /*
+ * If it's !queued, then the task has a non-normalized vruntime only
+ * when it is sleeping; that is, when the task is being migrated it
+ * has a normalized vruntime.
+ */
+ if (p->state == TASK_RUNNING)
+ return true;
+
+ return false;
+}
+
+static void detach_task_cfs_rq(struct task_struct *p)
+{
+ struct sched_entity *se = &p->se;
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+ if (!vruntime_normalized(p)) {
/*
* Fix up our vruntime so that the current sleep doesn't
* cause 'unlimited' sleep bonus.
@@ -7945,9 +7971,10 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
detach_entity_load_avg(cfs_rq, se);
}
-static void switched_to_fair(struct rq *rq, struct task_struct *p)
+static void attach_task_cfs_rq(struct task_struct *p)
{
struct sched_entity *se = &p->se;
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
@@ -7958,33 +7985,32 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
#endif
/* Synchronize task with its cfs_rq */
- attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+ attach_entity_load_avg(cfs_rq, se);
- if (!task_on_rq_queued(p)) {
+ if (!vruntime_normalized(p))
+ se->vruntime += cfs_rq->min_vruntime;
+}
+static void switched_from_fair(struct rq *rq, struct task_struct *p)
+{
+ detach_task_cfs_rq(p);
+}
+
+static void switched_to_fair(struct rq *rq, struct task_struct *p)
+{
+ attach_task_cfs_rq(p);
+
+ if (task_on_rq_queued(p)) {
/*
- * Ensure the task has a non-normalized vruntime when it is switched
- * back to the fair class with !queued, so that enqueue_entity() at
- * wake-up time will do the right thing.
- *
- * If it's queued, then the enqueue_entity(.flags=0) makes the task
- * has non-normalized vruntime, if it's !queued, then it still has
- * normalized vruntime.
+ * We were most likely switched from sched_rt, so
+ * kick off the schedule if running, otherwise just see
+ * if we can still preempt the current task.
*/
- if (p->state != TASK_RUNNING)
- se->vruntime += cfs_rq_of(se)->min_vruntime;
- return;
+ if (rq->curr == p)
+ resched_curr(rq);
+ else
+ check_preempt_curr(rq, p, 0);
}
-
- /*
- * We were most likely switched from sched_rt, so
- * kick off the schedule if running, otherwise just see
- * if we can still preempt the current task.
- */
- if (rq->curr == p)
- resched_curr(rq);
- else
- check_preempt_curr(rq, p, 0);
}
/* Account for a task changing its policy or group.
@@ -8021,55 +8047,12 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
#ifdef CONFIG_FAIR_GROUP_SCHED
static void task_move_group_fair(struct task_struct *p, int queued)
{
- struct sched_entity *se = &p->se;
- struct cfs_rq *cfs_rq;
-
- /*
- * If the task was not on the rq at the time of this cgroup movement
- * it must have been asleep, sleeping tasks keep their ->vruntime
- * absolute on their old rq until wakeup (needed for the fair sleeper
- * bonus in place_entity()).
- *
- * If it was on the rq, we've just 'preempted' it, which does convert
- * ->vruntime to a relative base.
- *
- * Make sure both cases convert their relative position when migrating
- * to another cgroup's rq. This does somewhat interfere with the
- * fair sleeper stuff for the first placement, but who cares.
- */
- /*
- * When !queued, vruntime of the task has usually NOT been normalized.
- * But there are some cases where it has already been normalized:
- *
- * - Moving a forked child which is waiting for being woken up by
- * wake_up_new_task().
- * - Moving a task which has been woken up by try_to_wake_up() and
- * waiting for actually being woken up by sched_ttwu_pending().
- *
- * To prevent boost or penalty in the new cfs_rq caused by delta
- * min_vruntime between the two cfs_rqs, we skip vruntime adjustment.
- */
- if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
- queued = 1;
-
- cfs_rq = cfs_rq_of(se);
- if (!queued)
- se->vruntime -= cfs_rq->min_vruntime;
-
- /* Synchronize task with its prev cfs_rq */
- detach_entity_load_avg(cfs_rq, se);
+ detach_task_cfs_rq(p);
set_task_rq(p, task_cpu(p));
/* Tell se's cfs_rq has been changed -- migrated */
p->se.avg.last_update_time = 0;
-
- se->depth = se->parent ? se->parent->depth + 1 : 0;
- cfs_rq = cfs_rq_of(se);
- if (!queued)
- se->vruntime += cfs_rq->min_vruntime;
-
- /* Virtually synchronize task with its new cfs_rq */
- attach_entity_load_avg(cfs_rq, se);
+ attach_task_cfs_rq(p);
}
void free_fair_sched_group(struct task_group *tg)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-08-20 11:22 ` [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq byungchul.park
@ 2015-08-20 11:35 ` Byungchul Park
2015-08-31 15:21 ` Peter Zijlstra
2015-09-13 11:00 ` [tip:sched/core] sched/fair: Unify switched_{from,to}_fair() and task_move_group_fair() tip-bot for Byungchul Park
1 sibling, 1 reply; 22+ messages in thread
From: Byungchul Park @ 2015-08-20 11:35 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du
On Thu, Aug 20, 2015 at 08:22:00PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index af6ad5f..53d0e30 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7918,21 +7918,47 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
> check_preempt_curr(rq, p, 0);
> }
>
> -static void switched_from_fair(struct rq *rq, struct task_struct *p)
> +static inline bool vruntime_normalized(struct task_struct *p)
> {
> + int queued = task_on_rq_queued(p);
> struct sched_entity *se = &p->se;
> - struct cfs_rq *cfs_rq = cfs_rq_of(se);
>
> /*
> - * Ensure the task's vruntime is normalized, so that when it's
> - * switched back to the fair class the enqueue_entity(.flags=0) will
> - * do the right thing.
> - *
> * If it's queued, then the dequeue_entity(.flags=0) will already
> - * have normalized the vruntime, if it's !queued, then only when
> - * the task is sleeping will it still have non-normalized vruntime.
> + * have normalized the vruntime.
> + */
> + if (queued)
> + return true;
> +
> + /*
> + * When !queued, vruntime of the task has usually NOT been normalized.
> + * But there are some cases where it has already been normalized:
> + *
> + * - A forked child which is waiting for being woken up by
> + * wake_up_new_task().
> + * - A task which has been woken up by try_to_wake_up() and
> + * waiting for actually being woken up by sched_ttwu_pending().
> */
> - if (!task_on_rq_queued(p) && p->state != TASK_RUNNING) {
> + if (!se->sum_exec_runtime || p->state == TASK_WAKING)
> + return true;
> +
> + /*
> + * If it's !queued, then the task has a non-normalized vruntime only
> + * when it is sleeping; that is, when the task is being migrated it
> + * has a normalized vruntime.
> + */
i tried to change your XXX comment. i think it can be explained like this.
don't you think so? i want to hear any opinions about this.
thanks,
byungchul
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
` (4 preceding siblings ...)
2015-08-20 11:22 ` [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq byungchul.park
@ 2015-08-22 7:45 ` Byungchul Park
2015-08-25 23:54 ` Byungchul Park
6 siblings, 0 replies; 22+ messages in thread
From: Byungchul Park @ 2015-08-22 7:45 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du
On Thu, Aug 20, 2015 at 08:21:55PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
i am not sure whether what i did here, in v4, fully follows peter's
suggestion. i did not fully understand his comments on v3. i am waiting
for any feedback, e.g. an answer to my question, another review, and so on.
>
> [ cover letter snipped ]
* Re: [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
` (5 preceding siblings ...)
2015-08-22 7:45 ` [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it Byungchul Park
@ 2015-08-25 23:54 ` Byungchul Park
6 siblings, 0 replies; 22+ messages in thread
From: Byungchul Park @ 2015-08-25 23:54 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel, yuyang.du, efault, tglx
On Thu, Aug 20, 2015 at 08:21:55PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
>
Sorry for the ping, but is there anything I should improve in this
patch set? Also, I forgot to add the scheduler-related people to Cc, so
I am doing that now.
> [ cover letter snipped ]
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-08-20 11:35 ` Byungchul Park
@ 2015-08-31 15:21 ` Peter Zijlstra
2015-08-31 15:37 ` Peter Zijlstra
2015-09-01 0:28 ` Byungchul Park
0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2015-08-31 15:21 UTC (permalink / raw)
To: Byungchul Park; +Cc: mingo, linux-kernel, yuyang.du
On Thu, Aug 20, 2015 at 08:35:16PM +0900, Byungchul Park wrote:
> On Thu, Aug 20, 2015 at 08:22:00PM +0900, byungchul.park@lge.com wrote:
> > + /*
> > + * If it's !queued, then only when the task is sleeping it has a
> > + * non-normalized vruntime, that is, when the task is being migrated
> > + * it has a normailized vruntime.
> > + */
>
> I tried to change your XXX comment; I think it can be explained like
> this. Don't you think so? I want to hear any opinions about this.
>
> > + if (p->state == TASK_RUNNING)
> > + return true;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7943,11 +7943,10 @@ static inline bool vruntime_normalized(s
return true;
/*
- * If it's !queued, then only when the task is sleeping it has a
- * non-normalized vruntime, that is, when the task is being migrated
- * it has a normalized vruntime.
+ * If it's !queued, sleeping tasks have a normalized vruntime,
+ * see dequeue_entity().
*/
- if (p->state == TASK_RUNNING)
+ if (!p->se.on_rq)
return true;
return false;
Does that make sense?
I think using p->state for this is fragile, as we could be racy with any
random blocking primitive that does set_current_state() _before_
actually calling into the scheduler.
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-08-31 15:21 ` Peter Zijlstra
@ 2015-08-31 15:37 ` Peter Zijlstra
2015-09-01 0:28 ` Byungchul Park
1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2015-08-31 15:37 UTC (permalink / raw)
To: Byungchul Park; +Cc: mingo, linux-kernel, yuyang.du
On Mon, Aug 31, 2015 at 05:21:38PM +0200, Peter Zijlstra wrote:
> + if (!p->se.on_rq)
> return true;
>
> return false;
>
>
> Does that make sense?
It does not.. never mind me.
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-08-31 15:21 ` Peter Zijlstra
2015-08-31 15:37 ` Peter Zijlstra
@ 2015-09-01 0:28 ` Byungchul Park
2015-09-01 2:59 ` Byungchul Park
2015-09-01 15:03 ` Peter Zijlstra
1 sibling, 2 replies; 22+ messages in thread
From: Byungchul Park @ 2015-09-01 0:28 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, yuyang.du
On Mon, Aug 31, 2015 at 05:21:38PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 20, 2015 at 08:35:16PM +0900, Byungchul Park wrote:
> > On Thu, Aug 20, 2015 at 08:22:00PM +0900, byungchul.park@lge.com wrote:
>
> > > + /*
> > > + * If it's !queued, then only when the task is sleeping it has a
> > > + * non-normalized vruntime, that is, when the task is being migrated
> > > + * it has a normailized vruntime.
> > > + */
> >
> > i tried to change your XXX comment. i think it can be explaned like this.
> > don't you think so? i want to hear any opinions about this.
> >
> > > + if (p->state == TASK_RUNNING)
> > > + return true;
>
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7943,11 +7943,10 @@ static inline bool vruntime_normalized(s
> return true;
>
> /*
> - * If it's !queued, then only when the task is sleeping it has a
> - * non-normalized vruntime, that is, when the task is being migrated
> - * it has a normalized vruntime.
> + * If it's !queued, sleeping tasks have a normalized vruntime,
> + * see dequeue_entity().
Check the condition "!(flags & DEQUEUE_SLEEP)" that gates the
normalization in dequeue_entity(). I think you have to keep my original
comment, or modify yours to something like below.

before - If it's !queued, sleeping tasks have a normalized vruntime,
after - If it's !queued, sleeping tasks have a non-normalized vruntime,

But I think it would be better if you kept my original comment.
> */
> - if (p->state == TASK_RUNNING)
> + if (!p->se.on_rq)
> return true;
>
> return false;
>
>
> Does that make sense?
>
> I think using p->state for this is fragile, as we could be racy with any
> random blocking primitive that does set_current_state() _before_
> actually calling into the scheduler.
>
>
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-09-01 0:28 ` Byungchul Park
@ 2015-09-01 2:59 ` Byungchul Park
2015-09-01 15:03 ` Peter Zijlstra
1 sibling, 0 replies; 22+ messages in thread
From: Byungchul Park @ 2015-09-01 2:59 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, yuyang.du
On Tue, Sep 01, 2015 at 09:28:49AM +0900, Byungchul Park wrote:
> On Mon, Aug 31, 2015 at 05:21:38PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 20, 2015 at 08:35:16PM +0900, Byungchul Park wrote:
> > > On Thu, Aug 20, 2015 at 08:22:00PM +0900, byungchul.park@lge.com wrote:
> >
> > > > + /*
> > > > + * If it's !queued, then only when the task is sleeping it has a
> > > > + * non-normalized vruntime, that is, when the task is being migrated
> > > > + * it has a normailized vruntime.
> > > > + */
> > >
> > > i tried to change your XXX comment. i think it can be explaned like this.
> > > don't you think so? i want to hear any opinions about this.
> > >
> > > > + if (p->state == TASK_RUNNING)
> > > > + return true;
> >
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7943,11 +7943,10 @@ static inline bool vruntime_normalized(s
> > return true;
> >
> > /*
> > - * If it's !queued, then only when the task is sleeping it has a
> > - * non-normalized vruntime, that is, when the task is being migrated
> > - * it has a normalized vruntime.
> > + * If it's !queued, sleeping tasks have a normalized vruntime,
> > + * see dequeue_entity().
>
> check the condition "!(flags & DEQUEUE_SLEEP)" for doing normalizing in
> dequeue_entity(). i think you have to keep my original comment, or
> modify your comment to something like below.
>
> before - If it's !queued, sleeping tasks have a normalized vruntime,
> after - If it's !queued, sleeping tasks have a non-normalize vruntime,
>
> but.. i think it would be better that you keep my original comment..
I mean: "it would be better if you left my original comment unchanged".
>
> [ remainder of quoted message snipped ]
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-09-01 0:28 ` Byungchul Park
2015-09-01 2:59 ` Byungchul Park
@ 2015-09-01 15:03 ` Peter Zijlstra
2015-09-02 2:33 ` Byungchul Park
1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-09-01 15:03 UTC (permalink / raw)
To: Byungchul Park; +Cc: mingo, linux-kernel, yuyang.du
On Tue, Sep 01, 2015 at 09:28:49AM +0900, Byungchul Park wrote:
> check the condition "!(flags & DEQUEUE_SLEEP)" for doing normalizing in
> dequeue_entity(). i think you have to keep my original comment, or
> modify your comment to something like below.
>
> before - If it's !queued, sleeping tasks have a normalized vruntime,
> after - If it's !queued, sleeping tasks have a non-normalize vruntime,
>
> but.. i think it would be better that you keep my original comment..
The comment we can talk about later, but I think the condition:
> > - if (p->state == TASK_RUNNING)
> > + if (!p->se.on_rq)
is important now. Both are broken in different ways.
p->state == TASK_RUNNING
is broken in this scenario:
CPU0 CPU1
set_current_state(TASK_UNINTERRUPTIBLE);
sched_move_task()
task_move_group_fair()
vruntime_normalized() == true
if (!cond)
schedule();
__set_current_state(TASK_RUNNING);
Now the proposed replacement:
!p->se.on_rq
is equally broken, because (as you point out) clearing it isn't
conditional on DEQUEUE_SLEEP.
And the problem with tracking the vruntime state is that while it helps
detach_task_cfs_rq(), attach_task_cfs_rq() is still left wondering what
it should return to.
So we do indeed need something to determine, based on the current state,
if vruntime should be normalized.
/me ponders moar
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-09-01 15:03 ` Peter Zijlstra
@ 2015-09-02 2:33 ` Byungchul Park
2015-09-02 8:14 ` Peter Zijlstra
0 siblings, 1 reply; 22+ messages in thread
From: Byungchul Park @ 2015-09-02 2:33 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, yuyang.du, efault, tglx
On Tue, Sep 01, 2015 at 05:03:43PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 01, 2015 at 09:28:49AM +0900, Byungchul Park wrote:
>
> > check the condition "!(flags & DEQUEUE_SLEEP)" for doing normalizing in
> > dequeue_entity(). i think you have to keep my original comment, or
> > modify your comment to something like below.
> >
> > before - If it's !queued, sleeping tasks have a normalized vruntime,
> > after - If it's !queued, sleeping tasks have a non-normalize vruntime,
> >
> > but.. i think it would be better that you keep my original comment..
>
> The comment we can talk about later, but I think the condition:
>
> > > - if (p->state == TASK_RUNNING)
> > > + if (!p->se.on_rq)
>
> is important now. Both are broken in different ways.
>
> p->state == TASK_RUNNING
>
> is broken in this scenario:
>
> CPU0 CPU1
>
> set_current_state(TASK_UNINTERRUPTIBLE);
Here, the target task has not been dequeued yet (it will be dequeued
within schedule()), so "queued" is true when sched_move_task() is
called, and sched_move_task()->dequeue_task(flags=0) will normalize the
vruntime as we expect. So there is no actual problem here, though I
agree the condition can be confusing.

I think the "p->state == TASK_RUNNING" condition, which already exists
in the original code, can work anyway; it is only there to check
whether the task is being migrated or is sleeping. How about the
approach below to make the code more readable?
>
> sched_move_task()
> task_move_group_fair()
> vruntime_normalized() == true
> if (!cond)
> schedule();
> __set_current_state(TASK_RUNNING);
>
>
> Now the proposed replacement:
>
> !p->se.on_rq
>
> is equally broken, because (as you point out) clearing it isn't
> conditional on DEQUEUE_SLEEP.
>
I think we must take the task's on_rq into account instead of the se's
on_rq, because the current code sets the se's on_rq to 0 even when
migrating a task, while the task's on_rq is set to TASK_ON_RQ_MIGRATING
during the migration. So it is safe to use the task's on_rq to check
whether it has a normalized vruntime, like below.
---
From 9545e34c2f6fd195f18d3111b5c1d1b03600cdcd Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Wed, 2 Sep 2015 10:59:58 +0900
Subject: [PATCH] sched: make vruntime_normalized() cleaner
In both the TASK_ON_RQ_QUEUED and TASK_ON_RQ_MIGRATING cases, the
target task has a normalized vruntime thanks to
dequeue_entity(.flags=0), so we don't need to check the two cases
separately with different condition statements.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
kernel/sched/fair.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 53d0e30..dfe8754 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7920,14 +7920,14 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
static inline bool vruntime_normalized(struct task_struct *p)
{
- int queued = task_on_rq_queued(p);
struct sched_entity *se = &p->se;
/*
- * If it's queued, then the dequeue_entity(.flags=0) will already
- * have normalized the vruntime.
+ * In both TASK_ON_RQ_QUEUED case and TASK_ON_RQ_MIGRATING case,
+ * the dequeue_entity(.flags=0) will already have normalized the
+ * vruntime.
*/
- if (queued)
+ if (p->on_rq)
return true;
/*
@@ -7942,14 +7942,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
if (!se->sum_exec_runtime || p->state == TASK_WAKING)
return true;
- /*
- * If it's !queued, then only when the task is sleeping it has a
- * non-normalized vruntime, that is, when the task is being migrated
- * it has a normailized vruntime.
- */
- if (p->state == TASK_RUNNING)
- return true;
-
return false;
}
--
1.7.9.5
>
> And the problem with tracking the vruntime state is that while it helps
> detach_task_cfs_rq(), attach_task_cfs_rq() is still left wondering what
> it should return to.
>
> So we do indeed need something to determine, based on the current state,
> if vruntime should be normalized.
>
> /me ponders moar
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-09-02 2:33 ` Byungchul Park
@ 2015-09-02 8:14 ` Peter Zijlstra
2015-09-02 8:29 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2015-09-02 8:14 UTC (permalink / raw)
To: Byungchul Park; +Cc: mingo, linux-kernel, yuyang.du, efault, tglx
On Wed, Sep 02, 2015 at 11:33:15AM +0900, Byungchul Park wrote:
> +++ b/kernel/sched/fair.c
> @@ -7920,14 +7920,14 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
>
> static inline bool vruntime_normalized(struct task_struct *p)
> {
> struct sched_entity *se = &p->se;
>
> /*
> + * In both TASK_ON_RQ_QUEUED case and TASK_ON_RQ_MIGRATING case,
> + * the dequeue_entity(.flags=0) will already have normalized the
> + * vruntime.
> */
> + if (p->on_rq)
> return true;
>
You're right. And yes this is nicer. Thanks!
I've folded it into your 5/5 patch.
* Re: [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq
2015-09-02 8:14 ` Peter Zijlstra
@ 2015-09-02 8:29 ` Ingo Molnar
0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2015-09-02 8:29 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Byungchul Park, linux-kernel, yuyang.du, efault, tglx
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 02, 2015 at 11:33:15AM +0900, Byungchul Park wrote:
> > +++ b/kernel/sched/fair.c
> > @@ -7920,14 +7920,14 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
> >
> > static inline bool vruntime_normalized(struct task_struct *p)
> > {
> > struct sched_entity *se = &p->se;
> >
> > /*
> > + * In both TASK_ON_RQ_QUEUED case and TASK_ON_RQ_MIGRATING case,
s/ In both the TASK_ON_RQ_QUEUED and TASK_ON_RQ_MIGRATING cases,
> > + * the dequeue_entity(.flags=0) will already have normalized the
> > + * vruntime.
Thanks,
Ingo
* [tip:sched/core] sched/fair: Factor out the {at,de}taching of the per entity load {to,from} the runqueue
2015-08-20 11:21 ` [PATCH v4 1/5] sched: add two functions adjusting cfs_rq's load when att(det)aching a se byungchul.park
@ 2015-09-13 10:58 ` tip-bot for Byungchul Park
0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Byungchul Park @ 2015-09-13 10:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, mingo, linux-kernel, peterz, hpa, torvalds, byungchul.park,
efault
Commit-ID: a05e8c51ff097ff73ec2947631d9102283545f7c
Gitweb: http://git.kernel.org/tip/a05e8c51ff097ff73ec2947631d9102283545f7c
Author: Byungchul Park <byungchul.park@lge.com>
AuthorDate: Thu, 20 Aug 2015 20:21:56 +0900
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 09:52:45 +0200
sched/fair: Factor out the {at,de}taching of the per entity load {to,from} the runqueue
Currently we open-code the addition/subtraction of the per entity load
to/from the runqueue; factor this out into helper functions.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Rewrote the changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1440069720-27038-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 77 ++++++++++++++++++++++++++---------------------------
1 file changed, 38 insertions(+), 39 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e2e348..a72a71b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2664,8 +2664,8 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
{
- int decayed;
struct sched_avg *sa = &cfs_rq->avg;
+ int decayed;
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
@@ -2695,33 +2695,52 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
static inline void update_load_avg(struct sched_entity *se, int update_tg)
{
struct cfs_rq *cfs_rq = cfs_rq_of(se);
- int cpu = cpu_of(rq_of(cfs_rq));
u64 now = cfs_rq_clock_task(cfs_rq);
+ int cpu = cpu_of(rq_of(cfs_rq));
/*
* Track task load average for carrying it to new CPU after migrated, and
* track group sched_entity load average for task_h_load calc in migration
*/
__update_load_avg(now, cpu, &se->avg,
- se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
+ se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
update_tg_load_avg(cfs_rq, 0);
}
+static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ se->avg.last_update_time = cfs_rq->avg.last_update_time;
+ cfs_rq->avg.load_avg += se->avg.load_avg;
+ cfs_rq->avg.load_sum += se->avg.load_sum;
+ cfs_rq->avg.util_avg += se->avg.util_avg;
+ cfs_rq->avg.util_sum += se->avg.util_sum;
+}
+
+static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
+
+ cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
+ cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
+ cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
+ cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+}
+
/* Add the load generated by se into cfs_rq's load average */
static inline void
enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
struct sched_avg *sa = &se->avg;
u64 now = cfs_rq_clock_task(cfs_rq);
- int migrated = 0, decayed;
+ int migrated, decayed;
- if (sa->last_update_time == 0) {
- sa->last_update_time = now;
- migrated = 1;
- }
- else {
+ migrated = !sa->last_update_time;
+ if (!migrated) {
__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
se->on_rq * scale_load_down(se->load.weight),
cfs_rq->curr == se, NULL);
@@ -2732,12 +2751,8 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
cfs_rq->runnable_load_avg += sa->load_avg;
cfs_rq->runnable_load_sum += sa->load_sum;
- if (migrated) {
- cfs_rq->avg.load_avg += sa->load_avg;
- cfs_rq->avg.load_sum += sa->load_sum;
- cfs_rq->avg.util_avg += sa->util_avg;
- cfs_rq->avg.util_sum += sa->util_sum;
- }
+ if (migrated)
+ attach_entity_load_avg(cfs_rq, se);
if (decayed || migrated)
update_tg_load_avg(cfs_rq, 0);
@@ -2752,7 +2767,7 @@ dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
cfs_rq->runnable_load_avg =
max_t(long, cfs_rq->runnable_load_avg - se->avg.load_avg, 0);
cfs_rq->runnable_load_sum =
- max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
+ max_t(s64, cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
}
/*
@@ -2820,6 +2835,11 @@ static inline void
dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
static inline void remove_entity_load_avg(struct sched_entity *se) {}
+static inline void
+attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void
+detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+
static inline int idle_balance(struct rq *rq)
{
return 0;
@@ -7909,25 +7929,10 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
se->vruntime -= cfs_rq->min_vruntime;
}
-#ifdef CONFIG_SMP
/* Catch up with the cfs_rq and remove our load when we leave */
- __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), &se->avg,
- se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
-
- cfs_rq->avg.load_avg =
- max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
- cfs_rq->avg.load_sum =
- max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
- cfs_rq->avg.util_avg =
- max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
- cfs_rq->avg.util_sum =
- max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
-#endif
+ detach_entity_load_avg(cfs_rq, se);
}
-/*
- * We switched to the sched_fair class.
- */
static void switched_to_fair(struct rq *rq, struct task_struct *p)
{
struct sched_entity *se = &p->se;
@@ -8040,14 +8045,8 @@ static void task_move_group_fair(struct task_struct *p, int queued)
cfs_rq = cfs_rq_of(se);
se->vruntime += cfs_rq->min_vruntime;
-#ifdef CONFIG_SMP
/* Virtually synchronize task with its new cfs_rq */
- p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
- cfs_rq->avg.load_avg += p->se.avg.load_avg;
- cfs_rq->avg.load_sum += p->se.avg.load_sum;
- cfs_rq->avg.util_avg += p->se.avg.util_avg;
- cfs_rq->avg.util_sum += p->se.avg.util_sum;
-#endif
+ attach_entity_load_avg(cfs_rq, se);
}
}
* [tip:sched/core] sched/fair: Have task_move_group_fair() unconditionally add the entity load to the runqueue
2015-08-20 11:21 ` [PATCH v4 2/5] sched: make task_move_group_fair adjust cfs_rq's load in case of queued byungchul.park
@ 2015-09-13 10:59 ` tip-bot for Byungchul Park
0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Byungchul Park @ 2015-09-13 10:59 UTC (permalink / raw)
To: linux-tip-commits
Cc: torvalds, linux-kernel, hpa, byungchul.park, efault, tglx, mingo,
peterz
Commit-ID: 50a2a3b246149d041065a67ccb3e98145f780a2f
Gitweb: http://git.kernel.org/tip/50a2a3b246149d041065a67ccb3e98145f780a2f
Author: Byungchul Park <byungchul.park@lge.com>
AuthorDate: Thu, 20 Aug 2015 20:21:57 +0900
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 09:52:46 +0200
sched/fair: Have task_move_group_fair() unconditionally add the entity load to the runqueue
Currently we conditionally add the entity load to the rq when moving
the task between cgroups.
This doesn't make sense as we always 'migrate' the task between
cgroups, so we should always migrate the load too.
[ The history here is that we used to only migrate the blocked load
which was only meaningful when !queued. ]
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Rewrote the changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1440069720-27038-3-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a72a71b..959b2ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8041,13 +8041,12 @@ static void task_move_group_fair(struct task_struct *p, int queued)
se->vruntime -= cfs_rq_of(se)->min_vruntime;
set_task_rq(p, task_cpu(p));
se->depth = se->parent ? se->parent->depth + 1 : 0;
- if (!queued) {
- cfs_rq = cfs_rq_of(se);
+ cfs_rq = cfs_rq_of(se);
+ if (!queued)
se->vruntime += cfs_rq->min_vruntime;
- /* Virtually synchronize task with its new cfs_rq */
- attach_entity_load_avg(cfs_rq, se);
- }
+ /* Virtually synchronize task with its new cfs_rq */
+ attach_entity_load_avg(cfs_rq, se);
}
void free_fair_sched_group(struct task_group *tg)
* [tip:sched/core] sched/fair: Have task_move_group_fair() also detach entity load from the old runqueue
2015-08-20 11:21 ` [PATCH v4 3/5] sched: sync a se with prev cfs_rq when changing cgroup byungchul.park
@ 2015-09-13 10:59 ` tip-bot for Byungchul Park
0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Byungchul Park @ 2015-09-13 10:59 UTC (permalink / raw)
To: linux-tip-commits
Cc: byungchul.park, hpa, peterz, torvalds, tglx, mingo, linux-kernel,
efault
Commit-ID: 1746babbb15594ba2d8d8196589bbbc2b5ff51c9
Gitweb: http://git.kernel.org/tip/1746babbb15594ba2d8d8196589bbbc2b5ff51c9
Author: Byungchul Park <byungchul.park@lge.com>
AuthorDate: Thu, 20 Aug 2015 20:21:58 +0900
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 09:52:47 +0200
sched/fair: Have task_move_group_fair() also detach entity load from the old runqueue
Since we attach the entity load to the new runqueue, we should also
detach the entity load from the old runqueue, otherwise load can
accumulate.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Rewrote the changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1440069720-27038-4-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 959b2ea..1e1fe7f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8037,8 +8037,12 @@ static void task_move_group_fair(struct task_struct *p, int queued)
if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
queued = 1;
+ cfs_rq = cfs_rq_of(se);
if (!queued)
- se->vruntime -= cfs_rq_of(se)->min_vruntime;
+ se->vruntime -= cfs_rq->min_vruntime;
+
+ /* Synchronize task with its prev cfs_rq */
+ detach_entity_load_avg(cfs_rq, se);
set_task_rq(p, task_cpu(p));
se->depth = se->parent ? se->parent->depth + 1 : 0;
cfs_rq = cfs_rq_of(se);
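As a toy illustration of the accumulation problem the changelog describes (hypothetical names, not the kernel API): if a move only attaches the entity's load to the new runqueue without detaching it from the old one, the old runqueue's sum keeps the stale contribution forever.

```c
#include <assert.h>

struct toy_rq { long load_avg; };
struct toy_se { long load_avg; };

/* Hypothetical stand-ins for attach/detach_entity_load_avg(). */
static void toy_attach(struct toy_rq *rq, struct toy_se *se)
{
	rq->load_avg += se->load_avg;
}

static void toy_detach(struct toy_rq *rq, struct toy_se *se)
{
	rq->load_avg -= se->load_avg;
}

/*
 * Move a se between runqueues. Without the toy_detach() step the old
 * runqueue keeps the load, and repeated cgroup moves let it accumulate.
 */
static void toy_move(struct toy_rq *from, struct toy_rq *to,
		     struct toy_se *se)
{
	toy_detach(from, se);
	toy_attach(to, se);
}
```

After a round trip between two runqueues, both sums are back where they started; dropping the detach would leave both runqueues charged at once.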
* [tip:sched/core] sched/fair: Fix switched_to_fair()'s per entity load tracking
2015-08-20 11:21 ` [PATCH v4 4/5] sched: sync a se with its cfs_rq when switching sched class to fair class byungchul.park
@ 2015-09-13 10:59 ` tip-bot for Byungchul Park
0 siblings, 0 replies; 22+ messages in thread
From: tip-bot for Byungchul Park @ 2015-09-13 10:59 UTC (permalink / raw)
To: linux-tip-commits
Cc: efault, torvalds, tglx, mingo, byungchul.park, peterz, hpa,
linux-kernel
Commit-ID: 6efdb105d392da3ad5cb4ef951aed373cd049813
Gitweb: http://git.kernel.org/tip/6efdb105d392da3ad5cb4ef951aed373cd049813
Author: Byungchul Park <byungchul.park@lge.com>
AuthorDate: Thu, 20 Aug 2015 20:21:59 +0900
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 09:52:48 +0200
sched/fair: Fix switched_to_fair()'s per entity load tracking
Where switched_from_fair() will remove the entity's load from the
runqueue, switched_to_fair() does not currently add it back. This
means that when a task leaves the fair class for a short duration; say
because of PI; we lose its load contribution.
This can ripple forward and disturb the load tracking because other
operations (enqueue, dequeue) assume it's factored in. Only once the
runqueue empties will the load tracking recover.
When we add it back in, age the per entity average to match up with
the runqueue age. This has the obvious problem that if the task leaves
the fair class for a significant time, the load will age to 0.
Employ the normal migration rule for inter-runqueue moves in
task_move_group_fair(). Again, there is the obvious problem of the
task migrating while not in the fair class.
The alternative solution would be to omit the chunk in
attach_entity_load_avg(), which would effectively reset the timestamp
and use whatever avg there was.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Rewrote the changelog and comments. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1440069720-27038-5-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e1fe7f..5143ea0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2712,6 +2712,20 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
+ /*
+ * If we got migrated (either between CPUs or between cgroups) we'll
+ * have aged the average right before clearing @last_update_time.
+ */
+ if (se->avg.last_update_time) {
+ __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, 0, 0, NULL);
+
+ /*
+ * XXX: we could have just aged the entire load away if we've been
+ * absent from the fair class for too long.
+ */
+ }
+
se->avg.last_update_time = cfs_rq->avg.last_update_time;
cfs_rq->avg.load_avg += se->avg.load_avg;
cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -7945,6 +7959,9 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
se->depth = se->parent ? se->parent->depth + 1 : 0;
#endif
+ /* Synchronize task with its cfs_rq */
+ attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+
if (!task_on_rq_queued(p)) {
/*
@@ -8044,6 +8061,12 @@ static void task_move_group_fair(struct task_struct *p, int queued)
/* Synchronize task with its prev cfs_rq */
detach_entity_load_avg(cfs_rq, se);
set_task_rq(p, task_cpu(p));
+
+#ifdef CONFIG_SMP
+ /* Tell se's cfs_rq has been changed -- migrated */
+ p->se.avg.last_update_time = 0;
+#endif
+
se->depth = se->parent ? se->parent->depth + 1 : 0;
cfs_rq = cfs_rq_of(se);
if (!queued)
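The aging referred to above follows PELT's geometric series: a load contribution is halved every 32 periods of roughly 1ms (1024us), so an entity absent long enough decays toward zero. A minimal sketch at whole-chunk (32-period) granularity, independent of the kernel's fixed-point implementation:

```c
#include <assert.h>

/*
 * PELT decays history geometrically with y^32 == 1/2. This toy version
 * only applies full 32-period chunks and ignores the sub-chunk
 * remainder, which the kernel handles with fixed-point arithmetic.
 */
static double pelt_decay(double load, unsigned int periods)
{
	for (; periods >= 32; periods -= 32)
		load *= 0.5;
	return load;
}
```

For example, 320 periods (ten chunks) shrink a load of 1024 down to 1, which is why re-attaching a long-absent entity with aging applied contributes almost nothing.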
* [tip:sched/core] sched/fair: Unify switched_{from,to}_fair() and task_move_group_fair()
2015-08-20 11:22 ` [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq byungchul.park
2015-08-20 11:35 ` Byungchul Park
@ 2015-09-13 11:00 ` tip-bot for Byungchul Park
1 sibling, 0 replies; 22+ messages in thread
From: tip-bot for Byungchul Park @ 2015-09-13 11:00 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, torvalds, byungchul.park, peterz, linux-kernel, tglx, efault,
mingo
Commit-ID: daa59407b558e6e621e9081a308d5db3ef991fb6
Gitweb: http://git.kernel.org/tip/daa59407b558e6e621e9081a308d5db3ef991fb6
Author: Byungchul Park <byungchul.park@lge.com>
AuthorDate: Thu, 20 Aug 2015 20:22:00 +0900
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 09:52:49 +0200
sched/fair: Unify switched_{from,to}_fair() and task_move_group_fair()
By observing that switched_from_fair() detaches from a runqueue, and
switched_to_fair() attaches to a runqueue, we can see that
task_move_group_fair() is one followed by the other with flipping the
runqueue in between.
Therefore extract all the common bits and implement all three
functions in terms of them.
This should fix a few corner cases wrt. vruntime normalization: when
we take a task off a runqueue we convert its vruntime to an
approximation of lag by subtracting min_vruntime, and when placing a
task on a runqueue we do the reverse.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[peterz: Changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1440069720-27038-6-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 129 +++++++++++++++++++++-------------------------------
1 file changed, 52 insertions(+), 77 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5cd7054..b96d8dd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7924,21 +7924,39 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
check_preempt_curr(rq, p, 0);
}
-static void switched_from_fair(struct rq *rq, struct task_struct *p)
+static inline bool vruntime_normalized(struct task_struct *p)
{
struct sched_entity *se = &p->se;
- struct cfs_rq *cfs_rq = cfs_rq_of(se);
/*
- * Ensure the task's vruntime is normalized, so that when it's
- * switched back to the fair class the enqueue_entity(.flags=0) will
- * do the right thing.
+ * In both the TASK_ON_RQ_QUEUED and TASK_ON_RQ_MIGRATING cases,
+ * the dequeue_entity(.flags=0) will already have normalized the
+ * vruntime.
+ */
+ if (p->on_rq)
+ return true;
+
+ /*
+ * When !on_rq, vruntime of the task has usually NOT been normalized.
+ * But there are some cases where it has already been normalized:
*
- * If it's queued, then the dequeue_entity(.flags=0) will already
- * have normalized the vruntime, if it's !queued, then only when
- * the task is sleeping will it still have non-normalized vruntime.
+ * - A forked child which is waiting for being woken up by
+ * wake_up_new_task().
+ * - A task which has been woken up by try_to_wake_up() and
+ * waiting for actually being woken up by sched_ttwu_pending().
*/
- if (!task_on_rq_queued(p) && p->state != TASK_RUNNING) {
+ if (!se->sum_exec_runtime || p->state == TASK_WAKING)
+ return true;
+
+ return false;
+}
+
+static void detach_task_cfs_rq(struct task_struct *p)
+{
+ struct sched_entity *se = &p->se;
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+ if (!vruntime_normalized(p)) {
/*
* Fix up our vruntime so that the current sleep doesn't
* cause 'unlimited' sleep bonus.
@@ -7951,9 +7969,10 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
detach_entity_load_avg(cfs_rq, se);
}
-static void switched_to_fair(struct rq *rq, struct task_struct *p)
+static void attach_task_cfs_rq(struct task_struct *p)
{
struct sched_entity *se = &p->se;
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
@@ -7964,33 +7983,32 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
#endif
/* Synchronize task with its cfs_rq */
- attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+ attach_entity_load_avg(cfs_rq, se);
+
+ if (!vruntime_normalized(p))
+ se->vruntime += cfs_rq->min_vruntime;
+}
- if (!task_on_rq_queued(p)) {
+static void switched_from_fair(struct rq *rq, struct task_struct *p)
+{
+ detach_task_cfs_rq(p);
+}
+
+static void switched_to_fair(struct rq *rq, struct task_struct *p)
+{
+ attach_task_cfs_rq(p);
+ if (task_on_rq_queued(p)) {
/*
- * Ensure the task has a non-normalized vruntime when it is switched
- * back to the fair class with !queued, so that enqueue_entity() at
- * wake-up time will do the right thing.
- *
- * If it's queued, then the enqueue_entity(.flags=0) makes the task
- * has non-normalized vruntime, if it's !queued, then it still has
- * normalized vruntime.
+ * We were most likely switched from sched_rt, so
+ * kick off the schedule if running, otherwise just see
+ * if we can still preempt the current task.
*/
- if (p->state != TASK_RUNNING)
- se->vruntime += cfs_rq_of(se)->min_vruntime;
- return;
+ if (rq->curr == p)
+ resched_curr(rq);
+ else
+ check_preempt_curr(rq, p, 0);
}
-
- /*
- * We were most likely switched from sched_rt, so
- * kick off the schedule if running, otherwise just see
- * if we can still preempt the current task.
- */
- if (rq->curr == p)
- resched_curr(rq);
- else
- check_preempt_curr(rq, p, 0);
}
/* Account for a task changing its policy or group.
@@ -8027,57 +8045,14 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
#ifdef CONFIG_FAIR_GROUP_SCHED
static void task_move_group_fair(struct task_struct *p, int queued)
{
- struct sched_entity *se = &p->se;
- struct cfs_rq *cfs_rq;
-
- /*
- * If the task was not on the rq at the time of this cgroup movement
- * it must have been asleep, sleeping tasks keep their ->vruntime
- * absolute on their old rq until wakeup (needed for the fair sleeper
- * bonus in place_entity()).
- *
- * If it was on the rq, we've just 'preempted' it, which does convert
- * ->vruntime to a relative base.
- *
- * Make sure both cases convert their relative position when migrating
- * to another cgroup's rq. This does somewhat interfere with the
- * fair sleeper stuff for the first placement, but who cares.
- */
- /*
- * When !queued, vruntime of the task has usually NOT been normalized.
- * But there are some cases where it has already been normalized:
- *
- * - Moving a forked child which is waiting for being woken up by
- * wake_up_new_task().
- * - Moving a task which has been woken up by try_to_wake_up() and
- * waiting for actually being woken up by sched_ttwu_pending().
- *
- * To prevent boost or penalty in the new cfs_rq caused by delta
- * min_vruntime between the two cfs_rqs, we skip vruntime adjustment.
- */
- if (!queued && (!se->sum_exec_runtime || p->state == TASK_WAKING))
- queued = 1;
-
- cfs_rq = cfs_rq_of(se);
- if (!queued)
- se->vruntime -= cfs_rq->min_vruntime;
-
- /* Synchronize task with its prev cfs_rq */
- detach_entity_load_avg(cfs_rq, se);
+ detach_task_cfs_rq(p);
set_task_rq(p, task_cpu(p));
#ifdef CONFIG_SMP
/* Tell se's cfs_rq has been changed -- migrated */
p->se.avg.last_update_time = 0;
#endif
-
- se->depth = se->parent ? se->parent->depth + 1 : 0;
- cfs_rq = cfs_rq_of(se);
- if (!queued)
- se->vruntime += cfs_rq->min_vruntime;
-
- /* Virtually synchronize task with its new cfs_rq */
- attach_entity_load_avg(cfs_rq, se);
+ attach_task_cfs_rq(p);
}
void free_fair_sched_group(struct task_group *tg)
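The vruntime normalization the changelog refers to can be sketched in isolation (toy types, not the kernel API): dequeueing converts a task's absolute vruntime into a lag relative to its old queue's min_vruntime, and enqueueing re-bases that lag against the new queue's min_vruntime, so a task carries its relative position rather than an absolute value across queues with different clocks.

```c
#include <assert.h>

struct toy_cfs_rq { unsigned long long min_vruntime; };

/* On dequeue: keep only the task's lag relative to its old queue. */
static unsigned long long toy_normalize(unsigned long long vruntime,
					const struct toy_cfs_rq *old_rq)
{
	return vruntime - old_rq->min_vruntime;
}

/* On enqueue: re-base the lag against the new queue's min_vruntime. */
static unsigned long long toy_denormalize(unsigned long long lag,
					  const struct toy_cfs_rq *new_rq)
{
	return lag + new_rq->min_vruntime;
}
```

A task 200 units ahead of a queue whose min_vruntime is 1000 ends up 200 units ahead of a new queue whose min_vruntime is 5000, preserving its relative standing.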
Thread overview: 22+ messages
2015-08-20 11:21 [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it byungchul.park
2015-08-20 11:21 ` [PATCH v4 1/5] sched: add two functions adjusting cfs_rq's load when att(det)aching a se byungchul.park
2015-09-13 10:58 ` [tip:sched/core] sched/fair: Factor out the {at,de}taching of the per entity load {to,from} the runqueue tip-bot for Byungchul Park
2015-08-20 11:21 ` [PATCH v4 2/5] sched: make task_move_group_fair adjust cfs_rq's load in case of queued byungchul.park
2015-09-13 10:59 ` [tip:sched/core] sched/fair: Have task_move_group_fair() unconditionally add the entity load to the runqueue tip-bot for Byungchul Park
2015-08-20 11:21 ` [PATCH v4 3/5] sched: sync a se with prev cfs_rq when changing cgroup byungchul.park
2015-09-13 10:59 ` [tip:sched/core] sched/fair: Have task_move_group_fair() also detach entity load from the old runqueue tip-bot for Byungchul Park
2015-08-20 11:21 ` [PATCH v4 4/5] sched: sync a se with its cfs_rq when switching sched class to fair class byungchul.park
2015-09-13 10:59 ` [tip:sched/core] sched/fair: Fix switched_to_fair()'s per entity load tracking tip-bot for Byungchul Park
2015-08-20 11:22 ` [PATCH v4 5/5] sched: add two functions for att(det)aching a task to(from) a cfs_rq byungchul.park
2015-08-20 11:35 ` Byungchul Park
2015-08-31 15:21 ` Peter Zijlstra
2015-08-31 15:37 ` Peter Zijlstra
2015-09-01 0:28 ` Byungchul Park
2015-09-01 2:59 ` Byungchul Park
2015-09-01 15:03 ` Peter Zijlstra
2015-09-02 2:33 ` Byungchul Park
2015-09-02 8:14 ` Peter Zijlstra
2015-09-02 8:29 ` Ingo Molnar
2015-09-13 11:00 ` [tip:sched/core] sched/fair: Unify switched_{from,to}_fair() and task_move_group_fair() tip-bot for Byungchul Park
2015-08-22 7:45 ` [PATCH v4 0/5] sync a se with its cfs_rq when att(det)aching it Byungchul Park
2015-08-25 23:54 ` Byungchul Park