From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
mingo@kernel.org, linux-kernel@vger.kernel.org
Cc: Pavan Kondeti <pkondeti@codeaurora.org>,
Ben Segall <bsegall@google.com>,
Matt Fleming <matt@codeblueprint.co.uk>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Paul Turner <pjt@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
byungchul.park@lge.com, Andrew Hunter <ahh@google.com>
Subject: Re: [PATCH 2/3] sched,fair: Fix local starvation
Date: Sat, 21 May 2016 21:00:36 +0200
Message-ID: <1463857236.10353.5.camel@gmail.com>
In-Reply-To: <1463839488.24578.45.camel@suse.de>

On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> and likely take a size XXL latency hit. Big box running master bled
> profusely under heavy load until I turned TTWU_QUEUE off.

The below made big box a happy camper again.

sched/fair: Move se->vruntime normalization state into struct sched_entity

Make ->vruntime normalization state explicit.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 42 ++++++++++++++----------------------------
3 files changed, 16 insertions(+), 28 deletions(-)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1348,6 +1348,7 @@ struct sched_entity {
struct rb_node run_node;
struct list_head group_node;
unsigned int on_rq;
+ bool normalized;
u64 exec_start;
u64 sum_exec_runtime;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2259,6 +2259,7 @@ static void __sched_fork(unsigned long c
p->se.prev_sum_exec_runtime = 0;
p->se.nr_migrations = 0;
p->se.vruntime = 0;
+ p->se.normalized = true;
INIT_LIST_HEAD(&p->se.group_node);
#ifdef CONFIG_FAIR_GROUP_SCHED
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3305,14 +3305,13 @@ static inline void check_schedstat_requi
static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
- bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
bool curr = cfs_rq->curr == se;
/*
* If we're the current task, we must renormalise before calling
* update_curr().
*/
- if (renorm && curr)
+ if (se->normalized && curr)
se->vruntime += cfs_rq->min_vruntime;
update_curr(cfs_rq);
@@ -3323,9 +3322,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
* placed in the past could significantly boost this task to the
* fairness detriment of existing tasks.
*/
- if (renorm && !curr)
+ if (se->normalized && !curr)
se->vruntime += cfs_rq->min_vruntime;
+ se->normalized = false;
+
enqueue_entity_load_avg(cfs_rq, se);
account_entity_enqueue(cfs_rq, se);
update_cfs_shares(cfs_rq);
@@ -3422,8 +3423,10 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
* update can refer to the ->curr item and we need to reflect this
* movement in our normalized position.
*/
- if (!(flags & DEQUEUE_SLEEP))
+ if (!(flags & DEQUEUE_SLEEP)) {
se->vruntime -= cfs_rq->min_vruntime;
+ se->normalized = true;
+ }
/* return excess runtime on last dequeue */
return_cfs_rq_runtime(cfs_rq);
@@ -5681,6 +5684,7 @@ static void migrate_task_rq_fair(struct
#endif
se->vruntime -= min_vruntime;
+ se->normalized = true;
}
/*
@@ -8591,6 +8595,7 @@ static void task_fork_fair(struct task_s
}
se->vruntime -= cfs_rq->min_vruntime;
+ se->normalized = true;
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
@@ -8619,29 +8624,7 @@ prio_changed_fair(struct rq *rq, struct
static inline bool vruntime_normalized(struct task_struct *p)
{
- struct sched_entity *se = &p->se;
-
- /*
- * In both the TASK_ON_RQ_QUEUED and TASK_ON_RQ_MIGRATING cases,
- * the dequeue_entity(.flags=0) will already have normalized the
- * vruntime.
- */
- if (p->on_rq)
- return true;
-
- /*
- * When !on_rq, vruntime of the task has usually NOT been normalized.
- * But there are some cases where it has already been normalized:
- *
- * - A forked child which is waiting for being woken up by
- * wake_up_new_task().
- * - A task which has been woken up by try_to_wake_up() and
- * waiting for actually being woken up by sched_ttwu_pending().
- */
- if (!se->sum_exec_runtime || p->state == TASK_WAKING)
- return true;
-
- return false;
+ return p->se.normalized;
}
static void detach_task_cfs_rq(struct task_struct *p)
@@ -8656,6 +8639,7 @@ static void detach_task_cfs_rq(struct ta
*/
place_entity(cfs_rq, se, 0);
se->vruntime -= cfs_rq->min_vruntime;
+ se->normalized = true;
}
/* Catch up with the cfs_rq and remove our load when we leave */
@@ -8678,8 +8662,10 @@ static void attach_task_cfs_rq(struct ta
/* Synchronize task with its cfs_rq */
attach_entity_load_avg(cfs_rq, se);
- if (!vruntime_normalized(p))
+ if (vruntime_normalized(p)) {
se->vruntime += cfs_rq->min_vruntime;
+ se->normalized = false;
+ }
}
static void switched_from_fair(struct rq *rq, struct task_struct *p)