All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Ben Segall <bsegall@google.com>, Yuyang Du <yuyang.du@intel.com>
Subject: Re: [RFC][PATCH 03/14] sched/fair: Remove se->load.weight from se->avg.load_sum
Date: Wed, 17 May 2017 11:50:45 +0200	[thread overview]
Message-ID: <20170517095045.GA8420@linaro.org> (raw)
In-Reply-To: <CAKfTPtB3vgmBcM=cwDBFWCNVvdk9eupbotYeP7oN98yUeaVioA@mail.gmail.com>

Le Wednesday 17 May 2017 à 09:04:47 (+0200), Vincent Guittot a écrit :
> Hi Peter,
> 
> On 12 May 2017 at 18:44, Peter Zijlstra <peterz@infradead.org> wrote:
> > Remove the load from the load_sum for sched_entities, basically
> > turning load_sum into runnable_sum.  This prepares for better
> > reweighting of group entities.
> >
> > Since we now have different rules for computing load_avg, split
> > ___update_load_avg() into two parts, ___update_load_sum() and
> > ___update_load_avg().
> >
> > So for se:
> >
> >   ___update_load_sum(.weight = 1)
> >   ___upate_load_avg(.weight = se->load.weight)
> >
> > and for cfs_rq:
> >
> >   ___update_load_sum(.weight = cfs_rq->load.weight)
> >   ___upate_load_avg(.weight = 1)
> >
> > Since the primary consumable is load_avg, most things will not be
> > affected. Only those few sites that initialize/modify load_sum need
> > attention.
> 
> I wonder if there is a problem with this new way to compute se's
> load_avg and cfs_rq's load_avg when a task changes is nice prio before
> migrating to another CPU.
> 
> se load_avg is now: runnable x current weight
> but cfs_rq load_avg keeps the history of the previous weight of the se
> When we detach se, we will remove an up to date se's load_avg from
> cfs_rq which doesn't have the up to date load_avg in its own load_avg.
> So if se's prio decreases just before migrating, some load_avg stays
> in prev cfs_rq and if se's prio increases, we will remove too much
> load_avg and possibly make the cfs_rq load_avg null whereas other
> tasks are running.
> 
> Thought ?
> 
> I'm able to reproduce the problem with a simple rt-app use case (after
> adding a new feature in rt-app)
> 
> Vincent
>

The hack below fixes the problem I mentioned above. It applies on task what is
done when updating group_entity's weight

---
 kernel/sched/core.c | 26 +++++++++++++++++++-------
 kernel/sched/fair.c | 20 ++++++++++++++++++++
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 09e0996..f327ab6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -732,7 +732,9 @@ int tg_nop(struct task_group *tg, void *data)
 }
 #endif
 
-static void set_load_weight(struct task_struct *p)
+extern void reweight_task(struct task_struct *p, int prio);
+
+static void set_load_weight(struct task_struct *p, bool update_load)
 {
 	int prio = p->static_prio - MAX_RT_PRIO;
 	struct load_weight *load = &p->se.load;
@@ -746,8 +748,18 @@ static void set_load_weight(struct task_struct *p)
 		return;
 	}
 
-	load->weight = scale_load(sched_prio_to_weight[prio]);
-	load->inv_weight = sched_prio_to_wmult[prio];
+	/*
+	 * SCHED_OTHER tasks have to update their load when changing their
+	 * weight
+	 */
+	if (update_load) {
+		reweight_task(p, prio);
+	} else {
+		load->weight = scale_load(sched_prio_to_weight[prio]);
+		load->inv_weight = sched_prio_to_wmult[prio];
+	}
+
+
 }
 
 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -2373,7 +2385,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 			p->static_prio = NICE_TO_PRIO(0);
 
 		p->prio = p->normal_prio = __normal_prio(p);
-		set_load_weight(p);
+		set_load_weight(p, false);
 
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
@@ -3854,7 +3866,7 @@ void set_user_nice(struct task_struct *p, long nice)
 		put_prev_task(rq, p);
 
 	p->static_prio = NICE_TO_PRIO(nice);
-	set_load_weight(p);
+	set_load_weight(p, true);
 	old_prio = p->prio;
 	p->prio = effective_prio(p);
 	delta = p->prio - old_prio;
@@ -4051,7 +4063,7 @@ static void __setscheduler_params(struct task_struct *p,
 	 */
 	p->rt_priority = attr->sched_priority;
 	p->normal_prio = normal_prio(p);
-	set_load_weight(p);
+	set_load_weight(p, fair_policy(policy));
 }
 
 /* Actually do priority change: must hold pi & rq lock. */
@@ -6152,7 +6164,7 @@ void __init sched_init(void)
 		atomic_set(&rq->nr_iowait, 0);
 	}
 
-	set_load_weight(&init_task);
+	set_load_weight(&init_task, false);
 
 	/*
 	 * The boot idle thread does lazy MMU switching as well:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bd2f9f5..3853e34 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,6 +2825,26 @@ __sub_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
 }
 
+void reweight_task(struct task_struct *p, int prio)
+{
+	struct sched_entity *se = &p->se;
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	struct load_weight *load = &p->se.load;
+
+	u32 divider = LOAD_AVG_MAX - 1024 + se->avg.period_contrib;
+
+	__sub_load_avg(cfs_rq, se);
+
+	load->weight = scale_load(sched_prio_to_weight[prio]);
+	load->inv_weight = sched_prio_to_wmult[prio];
+
+	se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
+	se->avg.runnable_load_avg =
+		div_u64(se_runnable(se) * se->avg.runnable_load_sum, divider);
+
+	__add_load_avg(cfs_rq, se);
+}
+
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight, unsigned long runnable)
 {
-- 

  reply	other threads:[~2017-05-17  9:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-12 16:44 [RFC][PATCH 00/14] sched/fair: A bit of a cgroup/PELT overhaul (again) Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 01/14] sched/fair: Clean up calc_cfs_shares() Peter Zijlstra
2017-05-16  8:02   ` Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 02/14] sched/fair: Add comment to calc_cfs_shares() Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 03/14] sched/fair: Remove se->load.weight from se->avg.load_sum Peter Zijlstra
2017-05-17  7:04   ` Vincent Guittot
2017-05-17  9:50     ` Vincent Guittot [this message]
2017-05-17 14:20       ` Peter Zijlstra
2017-09-29 20:11       ` [tip:sched/core] sched/fair: Use reweight_entity() for set_user_nice() tip-bot for Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 04/14] sched/fair: More accurate reweight_entity() Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 05/14] sched/fair: Change update_load_avg() arguments Peter Zijlstra
2017-05-17 10:46   ` Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 06/14] sched/fair: Move enqueue migrate handling Peter Zijlstra
2017-05-29 13:41   ` Vincent Guittot
2017-05-12 16:44 ` [RFC][PATCH 07/14] sched/fair: Rewrite cfs_rq->removed_*avg Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 08/14] sched/fair: Rewrite PELT migration propagation Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 09/14] sched/fair: Propagate an effective runnable_load_avg Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 10/14] sched/fair: more obvious Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 11/14] sched/fair: Synchonous PELT detach on load-balance migrate Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 12/14] sched/fair: Cure calc_cfs_shares() vs reweight_entity() Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 13/14] sched/fair: Align PELT windows between cfs_rq and its se Peter Zijlstra
2017-05-12 16:44 ` [RFC][PATCH 14/14] sched/fair: More accurate async detach Peter Zijlstra
2017-05-16 22:02 ` [RFC][PATCH 00/14] sched/fair: A bit of a cgroup/PELT overhaul (again) Tejun Heo
2017-05-17  6:53   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170517095045.GA8420@linaro.org \
    --to=vincent.guittot@linaro.org \
    --cc=bsegall@google.com \
    --cc=clm@fb.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.