Date: Wed, 28 Sep 2016 13:19:12 +0200
From: Peter Zijlstra
To: Dietmar Eggemann
Cc: Matt Fleming, Ingo Molnar, linux-kernel@vger.kernel.org, Mike Galbraith, Yuyang Du, Vincent Guittot
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
Message-ID: <20160928111912.GU5016@twins.programming.kicks-ass.net>
References: <20160923115808.2330-1-matt@codeblueprint.co.uk> <20160928101422.GR5016@twins.programming.kicks-ass.net>

On Wed, Sep 28, 2016 at 12:06:43PM +0100, Dietmar Eggemann wrote:
> On 28/09/16 11:14, Peter Zijlstra wrote:
> > On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 8fb4d1942c14..4a2d3ff772f8 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>  	int migrated, decayed;
> >>  
> >>  	migrated = !sa->last_update_time;
> >> -	if (!migrated) {
> >> +	if (!migrated && se->sum_exec_runtime) {
> >>  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> >>  				se->on_rq * scale_load_down(se->load.weight),
> >>  				cfs_rq->curr == se, NULL);
> >
> > Hrmm,.. so I see the problem, but I think we're working around it.
> >
> > So the problem is that time moves between wake_up_new_task() doing
> > post_init_entity_util_avg(), which attaches us to the cfs_rq, and
> > activate_task(), which enqueues us.
> >
> > Part of the problem is that we do not in fact seem to do
> > update_rq_clock() before post_init_entity_util_avg(), which makes the
> > delta larger than it should be.
>
> Yes, this is what I see as well. I always thought the update was
> done in task_fork_fair(), so the delta would be bounded, but as I now
> know, that update is only for the waker. If the CPU was idle before,
> the delta can be pretty big.
>
> > The other problem is that activate_task()->enqueue_task() does do
> > update_rq_clock() (again, after fixing), creating the delta.
>
> Not sure what you mean by 'after fixing', but the se is initialized with
> a possibly stale 'now' value in post_init_entity_util_avg()->
> attach_entity_load_avg() before the clock is updated in
> activate_task()->enqueue_task().

I meant: after I fix the above issue of calling post_init with a
stale clock. Hence the + update_rq_clock(rq) in the patch.

> > Which suggests we do something like the below (not compile tested or
> > anything, also I ran out of tea again).
>
> I'll give it a try. Plenty of coffee here ...
>
> >
> > While staring at this, I don't think we can still hit
> > vruntime_normalized() with a new task, so I _think_ we can remove that
> > !se->sum_exec_runtime clause there (and rejoice), no?
>
> I'm afraid that with accurate timing we will get the same situation:
> we add and subtract the same amount of load (probably 1024 now, rather
> than 1002 or less) to/from cfs_rq->runnable_load_avg for the initial
> (fork) hackbench run.
> After all, it's 'runnable' based.

The idea was that since we now update the rq clock before post_init and
then leave it be, both post_init and enqueue see the exact same
timestamp, and the delta is 0, resulting in no aging.

Or did I fail to make that happen?