From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753135AbcI1LTQ (ORCPT <rfc822;w@1wt.eu>);
        Wed, 28 Sep 2016 07:19:16 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:53011 "EHLO
        bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752117AbcI1LTQ (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 28 Sep 2016 07:19:16 -0400
Date: Wed, 28 Sep 2016 13:19:12 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>,
        Ingo Molnar <mingo@kernel.org>, linux-kernel@vger.kernel.org,
        Mike Galbraith <umgwanakikbuti@gmail.com>,
        Yuyang Du <yuyang.du@intel.com>,
        Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
Message-ID: <20160928111912.GU5016@twins.programming.kicks-ass.net>
References: <20160923115808.2330-1-matt@codeblueprint.co.uk>
 <20160928101422.GR5016@twins.programming.kicks-ass.net>
 <dba1266f-6e48-ff3d-379e-fd81545fcdec@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <dba1266f-6e48-ff3d-379e-fd81545fcdec@arm.com>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 28, 2016 at 12:06:43PM +0100, Dietmar Eggemann wrote:
> On 28/09/16 11:14, Peter Zijlstra wrote:
> > On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 8fb4d1942c14..4a2d3ff772f8 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>  	int migrated, decayed;
> >>  
> >>  	migrated = !sa->last_update_time;
> >> -	if (!migrated) {
> >> +	if (!migrated && se->sum_exec_runtime) {
> >>  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> >>  			se->on_rq * scale_load_down(se->load.weight),
> >>  			cfs_rq->curr == se, NULL);
> > 
> > 
> > Hrmm,.. so I see the problem, but I think we're working around it.
> > 
> > So the problem is that time moves between wake_up_new_task() doing
> > post_init_entity_util_avg(), which attaches us to the cfs_rq, and
> > activate_task(), which enqueues us.
> > 
> > Part of the problem is that we do not in fact seem to do
> > update_rq_clock() before post_init_entity_util_avg(), which makes the
> > delta larger than it should be.
> 
> Yes, this is what I see as well. I always thought that the update was
> done in task_fork_fair(), so the delta would be bounded, but as I now
> know, this update is only for the waker. If the CPU was idle before,
> the delta can be pretty big.
> 
> > The other problem is that activate_task()->enqueue_task() does do
> > update_rq_clock() (again, after fixing), creating the delta.
> 
> Not sure what you mean by 'after fixing', but the se is initialized with
> a possibly stale 'now' value in post_init_entity_util_avg()->
> attach_entity_load_avg() before the clock is updated in
> activate_task()->enqueue_task().
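
The sequence described in the two mails above can be modelled with a toy runqueue clock. This is an illustration under stated assumptions, not kernel code: every `toy_*` name is hypothetical, one clock unit stands for one ~1 ms PELT period, and decay is rounded down to whole halvings of y^32 = 1/2. Attach stamps the cached (possibly stale) clock, enqueue refreshes the clock and ages the new task's load by the difference, and the `skip_never_run` flag mimics the `se->sum_exec_runtime` check from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
struct toy_rq {
	uint64_t clock;	/* cached clock, as read by attach and enqueue */
	uint64_t now;	/* true time; copied in only by toy_update_rq_clock() */
};

struct toy_se {
	uint64_t last_update_time;
	uint64_t sum_exec_runtime;	/* 0: the task has never run */
	uint64_t load_avg;
};

static void toy_update_rq_clock(struct toy_rq *rq)
{
	rq->clock = rq->now;
}

/* PELT-style decay with y^32 = 1/2: halve once per full 32 periods;
 * partial periods are ignored to keep the sketch integer-only. */
static uint64_t toy_decay(uint64_t load, uint64_t periods)
{
	uint64_t halvings = periods / 32;

	/* Shifting a uint64_t by 64 or more is undefined, so clamp. */
	return halvings >= 64 ? 0 : load >> halvings;
}

/* Current ordering: attach stamps the cached clock, then enqueue
 * refreshes the clock and ages the load by the difference. With
 * skip_never_run set, aging is skipped for a task that has not run
 * yet, mirroring the check added by the patch. Returns the load the
 * new task is enqueued with. */
static uint64_t toy_fork(uint64_t cached_clock, uint64_t now,
			 int skip_never_run)
{
	struct toy_rq rq = { .clock = cached_clock, .now = now };
	struct toy_se se = { .sum_exec_runtime = 0, .load_avg = 1024 };

	assert(now >= cached_clock);
	se.last_update_time = rq.clock;	/* post_init -> attach */
	toy_update_rq_clock(&rq);	/* activate_task -> enqueue_task */
	if (!(skip_never_run && se.sum_exec_runtime == 0))
		se.load_avg = toy_decay(se.load_avg,
					rq.clock - se.last_update_time);
	return se.load_avg;
}
```

With a cached clock 64 periods behind, `toy_fork(1000, 1064, 0)` enqueues the new task with 256 instead of 1024, while `toy_fork(1000, 1064, 1)` keeps 1024. In this model the check hides the stale stamp rather than removing it, which matches the "working around it" remark.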

I meant after I fix the above issue of calling post_init with a stale
clock, i.e. the + update_rq_clock(rq) in the patch.

> > Which suggests we do something like the below (not compile tested or
> > anything, also I ran out of tea again).
> 
> I'll give it a try. Plenty of coffee here ...
> 
> > 
> > While staring at this, I don't think we can still hit
> > vruntime_normalized() with a new task, so I _think_ we can remove that
> > !se->sum_exec_runtime clause there (and rejoice), no?
> 
> I'm afraid that with accurate timing we will end up in the same
> situation, where we add and subtract the same amount of load (probably
> 1024 now, rather than 1002 or less) to/from cfs_rq->runnable_load_avg
> for the initial (fork) hackbench run. After all, it's 'runnable' based.

The idea was that since we now update rq clock before post_init and then
leave it be, both post_init and enqueue see the exact same timestamp,
and the delta is 0, resulting in no aging.

Or did I fail to make that happen?
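
The proposed ordering can be sketched with the same kind of toy clock. The patch itself is not quoted in this mail, so the following assumes its effect is to refresh the clock once before post_init and not refresh it again on the enqueue path; all `toy_*` names are hypothetical. A second function keeps the extra refresh on enqueue for contrast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
struct toy_rq {
	uint64_t clock;	/* cached clock, as read by attach and enqueue */
	uint64_t now;	/* true time; copied in only by toy_update_rq_clock() */
};

static void toy_update_rq_clock(struct toy_rq *rq)
{
	rq->clock = rq->now;
}

/* Proposed ordering: one refresh before post_init, none on enqueue.
 * Returns the delta the first enqueue would age the new task by;
 * `elapsed` is real time passing between attach and enqueue. */
static uint64_t toy_fork_delta_fixed(uint64_t cached_clock, uint64_t now,
				     uint64_t elapsed)
{
	struct toy_rq rq = { .clock = cached_clock, .now = now };
	uint64_t attach_stamp, enqueue_stamp;

	assert(now >= cached_clock);
	toy_update_rq_clock(&rq);	/* update before post_init */
	attach_stamp = rq.clock;	/* post_init -> attach */
	rq.now += elapsed;		/* real time moves on... */
	enqueue_stamp = rq.clock;	/* ...but enqueue reuses the stamp */
	return enqueue_stamp - attach_stamp;
}

/* Contrast: enqueue refreshes the clock a second time, so whatever
 * time passed in between reappears as a (small) delta. */
static uint64_t toy_fork_delta_double(uint64_t cached_clock, uint64_t now,
				      uint64_t elapsed)
{
	struct toy_rq rq = { .clock = cached_clock, .now = now };
	uint64_t attach_stamp, enqueue_stamp;

	assert(now >= cached_clock);
	toy_update_rq_clock(&rq);
	attach_stamp = rq.clock;
	rq.now += elapsed;
	toy_update_rq_clock(&rq);	/* second refresh on enqueue */
	enqueue_stamp = rq.clock;
	return enqueue_stamp - attach_stamp;
}
```

With a single refresh the two stamps are identical, so the delta is 0 no matter how stale the cached clock was. With two refreshes only the time between them leaks into the delta, which is why the enqueue path has to leave the clock alone for the new task to see no aging at all.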