linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/fair: Do not decay new task load on first enqueue
@ 2016-09-23 11:58 Matt Fleming
  2016-09-23 14:30 ` Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Matt Fleming @ 2016-09-23 11:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Matt Fleming, Mike Galbraith, Yuyang Du,
	Vincent Guittot, Dietmar Eggemann

Since commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new
tasks") ::last_update_time will be set to a non-zero value in
post_init_entity_util_avg(), which leads to p->se.avg.load_avg being
decayed on enqueue before the task has even had a chance to run.

For a NICE_0 task the sequence of events leading up to this with
example load average changes might be,

  sched_fork()
    init_entity_runnable_average()
      p->se.avg.load_avg = scale_load_down(se->load.weight);	// 1024

  wake_up_new_task()
    post_init_entity_util_avg()
      attach_entity_load_avg()
        p->se.last_update_time = cfs_rq->avg.last_update_time;

    activate_task()
      enqueue_task()
        ...
          enqueue_entity_load_avg()
            migrated = !sa->last_update_time			// false
            if (!migrated)
                    __update_load_avg()
                      p->se.avg.load_avg = 1002

This causes a performance regression for fork intensive workloads like
hackbench. When balancing on fork we can end up picking the same CPU
to enqueue on over and over. This leads to huge congestion when trying
to simultaneously wake up tasks that are all on the same runqueue, and
causes lots of migrations on wake up.

The behaviour since commit 7dc603c9028e essentially defeats the
scheduler's attempt to balance on fork(). Before, ::runnable_load_avg
likely had a non-zero value when the hackbench tasks were dequeued
(the fork()'d tasks immediately block reading on pipe/socket) but now
the load balancer sees the CPU as having no runnable load.

Arguably the real problem is that balancing on fork doesn't look at
the blocked contribution of tasks, only the runnable load and it's
possible for the two metrics to be wildly different on a relatively
idle system.

But it still doesn't seem quite right to update a task's load_avg
before it runs for the first time.

Here are the results of running hackbench before 7dc603c9028e (old
behaviour), with 7dc603c9028e applied (exiting behaviour), and after
7dc603c9028e with this patch on top (new behaviour),

hackbench-process-sockets

                         4.7.0-rc5             4.7.0-rc5             4.7.0-rc5
                            before          7dc603c9028e                 after
Amean    1        0.0611 (  0.00%)      0.0693 (-13.32%)      0.0600 (  1.87%)
Amean    4        0.1777 (  0.00%)      0.1730 (  2.65%)      0.1790 ( -0.72%)
Amean    7        0.2771 (  0.00%)      0.2816 ( -1.60%)      0.2741 (  1.08%)
Amean    12       0.3851 (  0.00%)      0.4167 ( -8.20%)      0.3751 (  2.60%)

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8fb4d1942c14..4a2d3ff772f8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	int migrated, decayed;
 
 	migrated = !sa->last_update_time;
-	if (!migrated) {
+	if (!migrated && se->sum_exec_runtime) {
 		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
 			se->on_rq * scale_load_down(se->load.weight),
 			cfs_rq->curr == se, NULL);
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-11-09 16:54 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-23 11:58 [PATCH] sched/fair: Do not decay new task load on first enqueue Matt Fleming
2016-09-23 14:30 ` Vincent Guittot
2016-09-27 13:48   ` Dietmar Eggemann
2016-09-27 19:24     ` Matt Fleming
2016-09-27 19:21   ` Matt Fleming
2016-09-28 10:14 ` Peter Zijlstra
2016-09-28 11:06   ` Dietmar Eggemann
2016-09-28 11:19     ` Peter Zijlstra
2016-09-28 11:31       ` Dietmar Eggemann
2016-09-28 11:46         ` Vincent Guittot
2016-09-28 12:00           ` Vincent Guittot
2016-10-04 21:25             ` Matt Fleming
2016-10-04 20:16           ` Matt Fleming
2016-09-28 12:27         ` Vincent Guittot
2016-09-28 13:13           ` Vincent Guittot
2016-09-29 16:15             ` Dietmar Eggemann
2016-10-03 13:05               ` Vincent Guittot
2016-09-28 17:59       ` Dietmar Eggemann
2016-09-28 19:37   ` Matt Fleming
2016-09-30 20:30     ` Matt Fleming
2016-10-09  3:39     ` Wanpeng Li
2016-10-10 10:01       ` Matt Fleming
2016-10-10 10:09         ` Wanpeng Li
2016-10-11 10:27           ` Matt Fleming
2016-10-10 12:29         ` Vincent Guittot
2016-10-10 13:54           ` Dietmar Eggemann
2016-10-10 18:29             ` Vincent Guittot
2016-10-11  9:44               ` Dietmar Eggemann
2016-10-11 10:39                 ` Matt Fleming
2016-10-18 10:11                   ` Matt Fleming
2016-10-10 17:34           ` Vincent Guittot
2016-10-11 10:24             ` Matt Fleming
2016-10-11 13:14               ` Vincent Guittot
2016-10-11 18:57                 ` Matt Fleming
2016-10-12  7:41                   ` Vincent Guittot
2016-10-18 11:09                     ` Peter Zijlstra
2016-10-18 15:19                       ` Vincent Guittot
2016-10-18 10:29               ` Matt Fleming
2016-10-18 11:10                 ` Peter Zijlstra
2016-10-18 11:29                   ` Matt Fleming
2016-10-18 12:15                     ` Peter Zijlstra
2016-10-19  6:38                       ` Vincent Guittot
2016-10-19  9:53                         ` Peter Zijlstra
2016-11-09 16:53                           ` Vincent Guittot
2016-10-04 20:11   ` Matt Fleming
2016-10-09  5:57 ` [lkp] [sched/fair] f54c5d4e28: hackbench.throughput 10.6% improvement kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).