[tip:sched/core] sched: Set an initial value of runnable avg for new forked task


From: tip-bot for Alex Shi <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
	peterz@infradead.org, pjt@google.com, alex.shi@intel.com,
	guz.fnst@cn.fujitsu.com, tglx@linutronix.de
Subject: [tip:sched/core] sched: Set an initial value of runnable avg for new forked task
Date: Thu, 27 Jun 2013 02:01:47 -0700
Message-ID: <tip-a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648@git.kernel.org>
In-Reply-To: <1371694737-29336-4-git-send-email-alex.shi@intel.com>

Commit-ID:  a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Gitweb:     http://git.kernel.org/tip/a75cdaa915e42ef0e6f38dc7f2a6a1deca91d648
Author:     Alex Shi <alex.shi@intel.com>
AuthorDate: Thu, 20 Jun 2013 10:18:47 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 27 Jun 2013 10:07:30 +0200

sched: Set an initial value of runnable avg for new forked task

We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task. Otherwise, random values in these variables cause a
mess when the new task is enqueued:

    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg

and skew fork balancing because of the incorrect load_avg_contrib.

Furthermore, Morten Rasmussen noticed that some tasks were not launched
immediately after being created. So Paul and Peter suggested giving a
new task's runnable avg time a start value equal to sched_slice().

PeterZ said:

> So the 'problem' is that our running avg is a 'floating' average; ie. it
> decays with time. Now we have to guess about the future of our newly
> spawned task -- something that is nigh impossible seeing these CPU
> vendors keep refusing to implement the crystal ball instruction.
>
> So there's two asymptotic cases we want to deal well with; 1) the case
> where the newly spawned program will be 'nearly' idle for its lifetime;
> and 2) the case where its cpu-bound.
>
> Since we have to guess, we'll go for worst case and assume its
> cpu-bound; now we don't want to make the avg so heavy adjusting to the
> near-idle case takes forever. We want to be able to quickly adjust and
> lower our running avg.
>
> Now we also don't want to make our avg too light, such that it gets
> decremented just for the new task not having had a chance to run yet --
> even if when it would run, it would be more cpu-bound than not.
>
> So what we do is we make the initial avg of the same duration as that we
> guess it takes to run each task on the system at least once -- aka
> sched_slice().
>
> Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
> case it should be good enough.

Paul also contributed most of the code comments in this commit.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Paul Turner <pjt@google.com>
[peterz; added explanation of sched_slice() usage]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  |  6 ++----
 kernel/sched/fair.c  | 24 ++++++++++++++++++++++++
 kernel/sched/sched.h |  2 ++
 3 files changed, 28 insertions(+), 4 deletions(-)
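
The effect of seeding runnable_avg_sum and runnable_avg_period with the
same value can be sketched in userspace. This is only an illustration:
it assumes __update_task_entity_contrib() computes roughly
weight * sum / (period + 1), which this patch does not show, and
struct avg_sketch, seed_from_slice() and contrib_sketch() are
hypothetical names, not kernel symbols.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the two fields the patch seeds. */
struct avg_sketch {
	uint32_t runnable_avg_sum;
	uint32_t runnable_avg_period;
};

/* Seed both fields from a slice given in nanoseconds, mirroring the patch. */
static void seed_from_slice(struct avg_sketch *avg, uint64_t slice_ns)
{
	uint32_t slice = (uint32_t)(slice_ns >> 10);	/* ns -> 1024 ns units */

	avg->runnable_avg_sum = slice;
	avg->runnable_avg_period = slice;
}

/* Assumed contribution formula: weight * sum / (period + 1). */
static uint64_t contrib_sketch(const struct avg_sketch *avg, uint64_t weight)
{
	return (uint64_t)avg->runnable_avg_sum * weight /
	       ((uint64_t)avg->runnable_avg_period + 1);
}
```

Under these assumptions, a zeroed entity contributes 0 regardless of its
weight, while an entity seeded from an illustrative 6 ms slice with
weight 1024 contributes 1023, i.e. nearly its full weight. That matches
the "assume it is cpu-bound" guess described in the changelog.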

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0241b1b..729e7fc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1611,10 +1611,6 @@ static void __sched_fork(struct task_struct *p)
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
-#ifdef CONFIG_SMP
-	p->se.avg.runnable_avg_period = 0;
-	p->se.avg.runnable_avg_sum = 0;
-#endif
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
@@ -1758,6 +1754,8 @@ void wake_up_new_task(struct task_struct *p)
 	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
 #endif
 
+	/* Initialize new task's runnable average */
+	init_task_runnable_average(p);
 	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
 	p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 36eadaa..e1602a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }
 
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void init_task_runnable_average(struct task_struct *p)
+{
+	u32 slice;
+
+	p->se.avg.decay_count = 0;
+	slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+	p->se.avg.runnable_avg_sum = slice;
+	p->se.avg.runnable_avg_period = slice;
+	__update_task_entity_contrib(&p->se);
+}
+#else
+void init_task_runnable_average(struct task_struct *p)
+{
+}
+#endif
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
@@ -1527,6 +1547,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 	 * We track migrations using entity decay_count <= 0, on a wake-up
 	 * migration we use a negative decay count to track the remote decays
 	 * accumulated while sleeping.
+	 *
+	 * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
+	 * are seen by enqueue_entity_load_avg() as a migration with an already
+	 * constructed load_avg_contrib.
 	 */
 	if (unlikely(se->avg.decay_count <= 0)) {
 		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 31d25f8..9c65d46 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1048,6 +1048,8 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 
 extern void update_idle_cpu_load(struct rq *this_rq);
 
+extern void init_task_runnable_average(struct task_struct *p);
+
 #ifdef CONFIG_PARAVIRT
 static inline u64 steal_ticks(u64 steal)
 {
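
The changelog's argument that a slice-sized seed lets the average drop
quickly for a near-idle task can be illustrated with a simplified model.
The sketch below is an assumption-laden approximation, not the kernel's
integer arithmetic: it uses floating point, whole 1024 us periods, and a
decay factor y with y^32 ~= 0.5. ratio_after_idle() and the macros are
hypothetical names.

```c
/*
 * Simplified floating-point model (an assumption, not the kernel's
 * integer implementation): every 1024 us period decays both sums by y,
 * where y^32 ~= 0.5.
 */
#define DECAY_Y		0.97857206	/* approximately 2^(-1/32) */
#define PERIOD_US	1024.0

/*
 * Ratio sum / period for a task seeded with sum == period == slice_us
 * that then stays idle for idle_periods full periods.
 */
static double ratio_after_idle(double slice_us, int idle_periods)
{
	double sum = slice_us;
	double period = slice_us;
	int i;

	for (i = 0; i < idle_periods; i++) {
		sum *= DECAY_Y;				/* no runnable time accrues */
		period = period * DECAY_Y + PERIOD_US;	/* elapsed time still accrues */
	}

	/* Guard against an unseeded entity with no elapsed time. */
	if (period <= 0.0)
		return 0.0;

	return sum / period;
}
```

In this model a task seeded from an illustrative 5859 us slice starts at
a ratio of 1.0 and falls to roughly a tenth of that after 32 idle
periods (about 32 ms), so a wrong "cpu-bound" guess is corrected fairly
quickly. A much larger seed would take correspondingly longer to decay.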

