linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matt Fleming <matt@codeblueprint.co.uk>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Matt Fleming <matt@codeblueprint.co.uk>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Yuyang Du <yuyang.du@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>
Subject: [PATCH] sched/fair: Do not decay new task load on first enqueue
Date: Fri, 23 Sep 2016 12:58:08 +0100	[thread overview]
Message-ID: <20160923115808.2330-1-matt@codeblueprint.co.uk> (raw)

Since commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new
tasks") ::last_update_time will be set to a non-zero value in
post_init_entity_util_avg(), which leads to p->se.avg.load_avg being
decayed on enqueue before the task has even had a chance to run.

For a NICE_0 task the sequence of events leading up to this with
example load average changes might be,

  sched_fork()
    init_entity_runnable_average()
      p->se.avg.load_avg = scale_load_down(se->load.weight);	// 1024

  wake_up_new_task()
    post_init_entity_util_avg()
      attach_entity_load_avg()
        p->se.last_update_time = cfs_rq->avg.last_update_time;

    activate_task()
      enqueue_task()
        ...
          enqueue_entity_load_avg()
            migrated = !sa->last_update_time			// false
            if (!migrated)
                    __update_load_avg()
                      p->se.avg.load_avg = 1002

This causes a performance regression for fork intensive workloads like
hackbench. When balancing on fork we can end up picking the same CPU
to enqueue on over and over. This leads to huge congestion when trying
to simultaneously wake up tasks that are all on the same runqueue, and
causes lots of migrations on wake up.

The behaviour since commit 7dc603c9028e essentially defeats the
scheduler's attempt to balance on fork(). Before, ::runnable_load_avg
likely had a non-zero value when the hackbench tasks were dequeued
(the fork()'d tasks immediately block reading on pipe/socket) but now
the load balancer sees the CPU as having no runnable load.

Arguably the real problem is that balancing on fork doesn't look at
the blocked contribution of tasks, only the runnable load and it's
possible for the two metrics to be wildly different on a relatively
idle system.

But it still doesn't seem quite right to update a task's load_avg
before it runs for the first time.

Here are the results of running hackbench before 7dc603c9028e (old
behaviour), with 7dc603c9028e applied (exiting behaviour), and after
7dc603c9028e with this patch on top (new behaviour),

hackbench-process-sockets

                         4.7.0-rc5             4.7.0-rc5             4.7.0-rc5
                            before          7dc603c9028e                 after
Amean    1        0.0611 (  0.00%)      0.0693 (-13.32%)      0.0600 (  1.87%)
Amean    4        0.1777 (  0.00%)      0.1730 (  2.65%)      0.1790 ( -0.72%)
Amean    7        0.2771 (  0.00%)      0.2816 ( -1.60%)      0.2741 (  1.08%)
Amean    12       0.3851 (  0.00%)      0.4167 ( -8.20%)      0.3751 (  2.60%)

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8fb4d1942c14..4a2d3ff772f8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	int migrated, decayed;
 
 	migrated = !sa->last_update_time;
-	if (!migrated) {
+	if (!migrated && se->sum_exec_runtime) {
 		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
 			se->on_rq * scale_load_down(se->load.weight),
 			cfs_rq->curr == se, NULL);
-- 
2.10.0

             reply	other threads:[~2016-09-23 11:58 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-23 11:58 Matt Fleming [this message]
2016-09-23 14:30 ` [PATCH] sched/fair: Do not decay new task load on first enqueue Vincent Guittot
2016-09-27 13:48   ` Dietmar Eggemann
2016-09-27 19:24     ` Matt Fleming
2016-09-27 19:21   ` Matt Fleming
2016-09-28 10:14 ` Peter Zijlstra
2016-09-28 11:06   ` Dietmar Eggemann
2016-09-28 11:19     ` Peter Zijlstra
2016-09-28 11:31       ` Dietmar Eggemann
2016-09-28 11:46         ` Vincent Guittot
2016-09-28 12:00           ` Vincent Guittot
2016-10-04 21:25             ` Matt Fleming
2016-10-04 20:16           ` Matt Fleming
2016-09-28 12:27         ` Vincent Guittot
2016-09-28 13:13           ` Vincent Guittot
2016-09-29 16:15             ` Dietmar Eggemann
2016-10-03 13:05               ` Vincent Guittot
2016-09-28 17:59       ` Dietmar Eggemann
2016-09-28 19:37   ` Matt Fleming
2016-09-30 20:30     ` Matt Fleming
2016-10-09  3:39     ` Wanpeng Li
2016-10-10 10:01       ` Matt Fleming
2016-10-10 10:09         ` Wanpeng Li
2016-10-11 10:27           ` Matt Fleming
2016-10-10 12:29         ` Vincent Guittot
2016-10-10 13:54           ` Dietmar Eggemann
2016-10-10 18:29             ` Vincent Guittot
2016-10-11  9:44               ` Dietmar Eggemann
2016-10-11 10:39                 ` Matt Fleming
2016-10-18 10:11                   ` Matt Fleming
2016-10-10 17:34           ` Vincent Guittot
2016-10-11 10:24             ` Matt Fleming
2016-10-11 13:14               ` Vincent Guittot
2016-10-11 18:57                 ` Matt Fleming
2016-10-12  7:41                   ` Vincent Guittot
2016-10-18 11:09                     ` Peter Zijlstra
2016-10-18 15:19                       ` Vincent Guittot
2016-10-18 10:29               ` Matt Fleming
2016-10-18 11:10                 ` Peter Zijlstra
2016-10-18 11:29                   ` Matt Fleming
2016-10-18 12:15                     ` Peter Zijlstra
2016-10-19  6:38                       ` Vincent Guittot
2016-10-19  9:53                         ` Peter Zijlstra
2016-11-09 16:53                           ` Vincent Guittot
2016-10-04 20:11   ` Matt Fleming
2016-10-09  5:57 ` [lkp] [sched/fair] f54c5d4e28: hackbench.throughput 10.6% improvement kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160923115808.2330-1-matt@codeblueprint.co.uk \
    --to=matt@codeblueprint.co.uk \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=umgwanakikbuti@gmail.com \
    --cc=vincent.guittot@linaro.org \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).