public inbox for linux-kernel@vger.kernel.org
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Ben Segall <bsegall@google.com>, Yuyang Du <yuyang.du@intel.com>
Subject: Re: [RFC][PATCH 03/14] sched/fair: Remove se->load.weight from se->avg.load_sum
Date: Wed, 17 May 2017 11:50:45 +0200	[thread overview]
Message-ID: <20170517095045.GA8420@linaro.org> (raw)
In-Reply-To: <CAKfTPtB3vgmBcM=cwDBFWCNVvdk9eupbotYeP7oN98yUeaVioA@mail.gmail.com>

On Wednesday 17 May 2017 at 09:04:47 (+0200), Vincent Guittot wrote:
> Hi Peter,
> 
> On 12 May 2017 at 18:44, Peter Zijlstra <peterz@infradead.org> wrote:
> > Remove the load from the load_sum for sched_entities, basically
> > turning load_sum into runnable_sum.  This prepares for better
> > reweighting of group entities.
> >
> > Since we now have different rules for computing load_avg, split
> > ___update_load_avg() into two parts, ___update_load_sum() and
> > ___update_load_avg().
> >
> > So for se:
> >
> >   ___update_load_sum(.weight = 1)
> >   ___update_load_avg(.weight = se->load.weight)
> >
> > and for cfs_rq:
> >
> >   ___update_load_sum(.weight = cfs_rq->load.weight)
> >   ___update_load_avg(.weight = 1)
> >
> > Since the primary consumable is load_avg, most things will not be
> > affected. Only those few sites that initialize/modify load_sum need
> > attention.
> 
> I wonder if there is a problem with this new way of computing the se's
> load_avg and the cfs_rq's load_avg when a task changes its nice prio
> before migrating to another CPU.
> 
> The se's load_avg is now: runnable x current weight,
> but the cfs_rq's load_avg keeps the history of the se's previous weight.
> When we detach the se, we remove an up-to-date se load_avg from a
> cfs_rq whose own load_avg is not up to date.
> So if the se's prio decreases just before migrating, some load_avg stays
> in the prev cfs_rq, and if the se's prio increases, we remove too much
> load_avg, possibly making the cfs_rq's load_avg null while other
> tasks are still running.
> 
> Thoughts?
> 
> I'm able to reproduce the problem with a simple rt-app use case (after
> adding a new feature in rt-app)
> 
> Vincent
>

The hack below fixes the problem mentioned above: it applies to tasks what is
already done when updating a group entity's weight.

---
 kernel/sched/core.c | 26 +++++++++++++++++++-------
 kernel/sched/fair.c | 20 ++++++++++++++++++++
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 09e0996..f327ab6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -732,7 +732,9 @@ int tg_nop(struct task_group *tg, void *data)
 }
 #endif
 
-static void set_load_weight(struct task_struct *p)
+extern void reweight_task(struct task_struct *p, int prio);
+
+static void set_load_weight(struct task_struct *p, bool update_load)
 {
 	int prio = p->static_prio - MAX_RT_PRIO;
 	struct load_weight *load = &p->se.load;
@@ -746,8 +748,18 @@ static void set_load_weight(struct task_struct *p)
 		return;
 	}
 
-	load->weight = scale_load(sched_prio_to_weight[prio]);
-	load->inv_weight = sched_prio_to_wmult[prio];
+	/*
+	 * SCHED_OTHER tasks have to update their load when changing their
+	 * weight
+	 */
+	if (update_load) {
+		reweight_task(p, prio);
+	} else {
+		load->weight = scale_load(sched_prio_to_weight[prio]);
+		load->inv_weight = sched_prio_to_wmult[prio];
+	}
+
+
 }
 
 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -2373,7 +2385,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 			p->static_prio = NICE_TO_PRIO(0);
 
 		p->prio = p->normal_prio = __normal_prio(p);
-		set_load_weight(p);
+		set_load_weight(p, false);
 
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
@@ -3854,7 +3866,7 @@ void set_user_nice(struct task_struct *p, long nice)
 		put_prev_task(rq, p);
 
 	p->static_prio = NICE_TO_PRIO(nice);
-	set_load_weight(p);
+	set_load_weight(p, true);
 	old_prio = p->prio;
 	p->prio = effective_prio(p);
 	delta = p->prio - old_prio;
@@ -4051,7 +4063,7 @@ static void __setscheduler_params(struct task_struct *p,
 	 */
 	p->rt_priority = attr->sched_priority;
 	p->normal_prio = normal_prio(p);
-	set_load_weight(p);
+	set_load_weight(p, fair_policy(policy));
 }
 
 /* Actually do priority change: must hold pi & rq lock. */
@@ -6152,7 +6164,7 @@ void __init sched_init(void)
 		atomic_set(&rq->nr_iowait, 0);
 	}
 
-	set_load_weight(&init_task);
+	set_load_weight(&init_task, false);
 
 	/*
 	 * The boot idle thread does lazy MMU switching as well:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bd2f9f5..3853e34 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,6 +2825,26 @@ __sub_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
 }
 
+void reweight_task(struct task_struct *p, int prio)
+{
+	struct sched_entity *se = &p->se;
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	struct load_weight *load = &p->se.load;
+
+	u32 divider = LOAD_AVG_MAX - 1024 + se->avg.period_contrib;
+
+	__sub_load_avg(cfs_rq, se);
+
+	load->weight = scale_load(sched_prio_to_weight[prio]);
+	load->inv_weight = sched_prio_to_wmult[prio];
+
+	se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
+	se->avg.runnable_load_avg =
+		div_u64(se_runnable(se) * se->avg.runnable_load_sum, divider);
+
+	__add_load_avg(cfs_rq, se);
+}
+
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight, unsigned long runnable)
 {
-- 


Thread overview: 24+ messages
2017-05-12 16:44 [RFC][PATCH 00/14] sched/fair: A bit of a cgroup/PELT overhaul (again) Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 01/14] sched/fair: Clean up calc_cfs_shares() Peter Zijlstra
2017-05-16  8:02   ` Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 02/14] sched/fair: Add comment to calc_cfs_shares() Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 03/14] sched/fair: Remove se->load.weight from se->avg.load_sum Peter Zijlstra
2017-05-17  7:04   ` Vincent Guittot
2017-05-17  9:50     ` Vincent Guittot [this message]
2017-05-17 14:20       ` Peter Zijlstra
2017-09-29 20:11       ` [tip:sched/core] sched/fair: Use reweight_entity() for set_user_nice() tip-bot for Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 04/14] sched/fair: More accurate reweight_entity() Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 05/14] sched/fair: Change update_load_avg() arguments Peter Zijlstra
2017-05-17 10:46   ` Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 06/14] sched/fair: Move enqueue migrate handling Peter Zijlstra
2017-05-29 13:41   ` Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 07/14] sched/fair: Rewrite cfs_rq->removed_*avg Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 08/14] sched/fair: Rewrite PELT migration propagation Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 09/14] sched/fair: Propagate an effective runnable_load_avg Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 10/14] sched/fair: more obvious Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 11/14] sched/fair: Synchonous PELT detach on load-balance migrate Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 12/14] sched/fair: Cure calc_cfs_shares() vs reweight_entity() Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 13/14] sched/fair: Align PELT windows between cfs_rq and its se Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 14/14] sched/fair: More accurate async detach Peter Zijlstra
2017-05-16 22:02 ` [RFC][PATCH 00/14] sched/fair: A bit of a cgroup/PELT overhaul (again) Tejun Heo
2017-05-17  6:53   ` Peter Zijlstra
