Date: Wed, 28 Sep 2016 12:14:22 +0200
From: Peter Zijlstra
To: Matt Fleming
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Mike Galbraith,
	Yuyang Du, Vincent Guittot, Dietmar Eggemann
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
Message-ID: <20160928101422.GR5016@twins.programming.kicks-ass.net>
References: <20160923115808.2330-1-matt@codeblueprint.co.uk>
In-Reply-To: <20160923115808.2330-1-matt@codeblueprint.co.uk>
User-Agent: Mutt/1.5.23.1 (2014-03-12)

On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8fb4d1942c14..4a2d3ff772f8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  	int migrated, decayed;
>  
>  	migrated = !sa->last_update_time;
> -	if (!migrated) {
> +	if (!migrated && se->sum_exec_runtime) {
>  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>  			  se->on_rq * scale_load_down(se->load.weight),
>  			  cfs_rq->curr == se, NULL);

Hrmm,.. so I see the problem, but I think we're working around it.

So the problem is that time moves between wake_up_new_task() doing
post_init_entity_util_avg(), which attaches us to the cfs_rq, and
activate_task() which enqueues us.

Part of the problem is that we do not in fact seem to do
update_rq_clock() before post_init_entity_util_avg(), which makes the
delta larger than it should be.

The other problem is that activate_task()->enqueue_task() does do
update_rq_clock() (again, after fixing), creating the delta.
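
Spelled out, the window I mean looks roughly like this (abbreviated
call sketch of the current ordering, not verbatim tree code):

	rq = __task_rq_lock(p, &rf);
	post_init_entity_util_avg(&p->se);	/* attach: last_update_time = T0 */

	activate_task(rq, p, 0);
	  enqueue_task(rq, p, 0);
	    update_rq_clock(rq);		/* rq clock moves on to T1 */
	    enqueue_task_fair(rq, p, 0);
	      ...
	      enqueue_entity_load_avg(cfs_rq, se);
		/* !migrated, so __update_load_avg() sees a delta of
		 * T1 - T0 and decays the freshly initialized load/util */

The larger T1 - T0 gets, the more of the new task's initial load/util
is decayed away before it has ever run.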
Which suggests we do something like the below (not compile tested or
anything, also I ran out of tea again).

While staring at this, I don't think we can still hit
vruntime_normalized() with a new task, so I _think_ we can remove that
!se->sum_exec_runtime clause there (and rejoice), no? (See the sketch
after the patch for what that would look like.)

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7e7463aa399a..cc59bd4ab809 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -754,9 +754,16 @@ static void set_load_weight(struct task_struct *p)
 
 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	update_rq_clock(rq);
+	/*
+	 * For ENQUEUE_RESTORE, DEQUEUE_SAVE will have updated the rq-clock,
+	 * for ENQUEUE_NEW wake_up_new_task() will have.
+	 */
+	if (!(flags & (ENQUEUE_RESTORE | ENQUEUE_NEW)))
+		update_rq_clock(rq);
+
 	if (!(flags & ENQUEUE_RESTORE))
 		sched_info_queued(rq, p);
+
 	p->sched_class->enqueue_task(rq, p, flags);
 }
 
@@ -2577,9 +2584,11 @@ void wake_up_new_task(struct task_struct *p)
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
+
+	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);
 
+	activate_task(rq, p, ENQUEUE_NEW);
-	activate_task(rq, p, 0);
 	p->on_rq = TASK_ON_RQ_QUEUED;
 	trace_sched_wakeup_new(p);
 	check_preempt_curr(rq, p, WF_FORK);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7c7e5745038b..3982d7dc9bff 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1193,6 +1193,7 @@ extern const u32 sched_prio_to_wmult[40];
 #else
 #define ENQUEUE_MIGRATED	0x00
 #endif
+#define ENQUEUE_NEW		0x40
 
 #define RETRY_TASK		((void *)-1UL)
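
For completeness, the vruntime_normalized() simplification would look
something like this (sketched from memory of fair.c, equally untested):

static inline bool vruntime_normalized(struct task_struct *p)
{
	struct sched_entity *se = &p->se;

	/*
	 * Queued or migrating tasks had their vruntime normalized by
	 * dequeue_entity() already.
	 */
	if (p->on_rq)
		return true;

	/*
	 * The !se->sum_exec_runtime test covered a forked child still
	 * waiting for wake_up_new_task(); if that case really is
	 * unreachable now, only the remote wakeup remains:
	 */
	if (p->state == TASK_WAKING)
		return true;

	return false;
}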