From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751720AbcEJJrC (ORCPT <rfc822;w@1wt.eu>);
	Tue, 10 May 2016 05:47:02 -0400
Received: from mga09.intel.com ([134.134.136.24]:61239 "EHLO mga09.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751370AbcEJJrA (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Tue, 10 May 2016 05:47:00 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,604,1455004800"; 
   d="scan'208";a="962548385"
Date: Tue, 10 May 2016 10:05:27 +0800
From: Yuyang Du <yuyang.du@intel.com>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
        bsegall@google.com, pjt@google.com, vincent.guittot@linaro.org,
        dietmar.eggemann@arm.com, juri.lelli@arm.com
Subject: Re: [PATCH v3 03/12] sched/fair: Change the variable to hold the
 number of periods to 32bit
Message-ID: <20160510020527.GS16093@intel.com>
References: <1462305773-7832-1-git-send-email-yuyang.du@intel.com>
 <1462305773-7832-4-git-send-email-yuyang.du@intel.com>
 <20160505111308.GA22310@e105550-lin.cambridge.arm.com>
 <20160505181932.GI16093@intel.com>
 <20160510091020.GA26895@e105550-lin.cambridge.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160510091020.GA26895@e105550-lin.cambridge.arm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 10, 2016 at 10:10:21AM +0100, Morten Rasmussen wrote:

> > > > 2. If m < 32*64, the chance to be here is very very low, but if so,
> > > 
> > > Should that be: n < 32*64 ?

Sorry, I overlooked this comment. Yes, it should be n < 32*64.

> > > 
> > > Talking about 32*64, I don't get why we don't use LOAD_AVG_MAX_N. I had
> > > a patch ready to post for that:
> > >
> > > >From 5055e5f82c8d207880035c2ec4ecf1ac1e7f9e91 Mon Sep 17 00:00:00 2001
> > > From: Morten Rasmussen <morten.rasmussen@arm.com>
> > > Date: Mon, 11 Apr 2016 15:41:37 +0100
> > > Subject: [PATCH] sched/fair: Fix decay to zero period in decay_load()
> > > 
> > > In __compute_runnable_contrib() we are happy with returning LOAD_AVG_MAX
> > > when the update period n >= LOAD_AVG_MAX_N (=345), so we should be happy
> > > with returning zero for n >= LOAD_AVG_MAX_N when decaying in
> > > decay_load() as well instead of only returning zero for n >
> > > LOAD_AVG_PERIOD * 63 (=2016).
> > 
> > So basically, you want to add another rule in addition to the exponential
> > decay rule.
> 
> No, not at all. I want to make the 'rules' symmetrical for accumulation
> and decay exactly like the patch does.

"Make the rule xxx" != change rule or add rule?
 
> > > >    the task's sched avgs MAY NOT be decayed to 0, depending on how
> > > >    big its sums are, and the chance to 0 is still good as load_sum
> > > >    is way less than ~0ULL and util_sum way less than ~0U.
> > > 
> > > I don't get the last bit about load_sum < ~0ULL and util_sum < ~0U.
> > > Whether you get to zero depends on the sums (as you say) and the actual
> > > value of 'n'. It is true that you might get to zero even if n <
> > > LOAD_AVG_MAX_N if the sums are small.
> >  
> > Frankly, util Ben brought it up, I didn't think a task sleeping so long
> > is even possible. But I do admit it may happen.
> > 
> > However, I will say this. A task sleeping so long is already very rare,
> > and among all those long sleep cases, the chance that after waking up the
> > avgs will not be decayed to zero is much less than 0.5 in a million
> > (=32*64/2^32=1/2^21), assuming the sleeping time is uniformly distributed.
> 
> You can't just ignore cases because they have a low probability. Going
> by that logic we could remove a lot of synchronization overhead in the
> kernel.
> 
> My concern is whether we introduce any assumptions that might hit us
> later when everybody has forgotten about them. This one would be
> extremely hard to debug later.

_NO_, that was just saying chance is very low. No any intent to say nor did I
say low chance doesn't matter. That was why I said the following paragraph.
 
> > > > Nevertheless, what really maters is what happens in the worst-case
> > > > scenario, which is when (u32)m = 0? So in that case, it would be like
> > > > after so long a sleep, we treat the task as it never slept, and has the
> > > > same sched averages as before it slept. That is actually not bad or
> > > > nothing to worry about, and think of it as the same as how we treat
> > > > a new born task.
> > > 
> > > There is subtle but important difference between not decaying a task
> > > correctly and adding new task: The sleeping task is already accounted
> > > for in the cfs_rq.avg.{load,util}_sum. The sleeping task's contribution
> > > to cfs_rq.avg has been decayed correctly in the mean time which means
> > > that by not guaranteeing a decay of the se.avg at wake-up you introduce
> > > a discrepancy between the task's owen view of its contribution (se.avg)
> > > and the cfs_rq view (cfs_rq.avg). That might lead to trouble if the task
> > > is migrated at wake-up as we remove se.avg amount of contribution from
> > > the previous cpu's cfs_rq.avg. In other words, you remove too much from
> > > the cfs_rq.avg.
> > > 
> > > The discrepancy will decay and disappear after LOAD_AVG_MAX_N ms, which
> > > might be acceptable, but it is not a totally harmless change IMHO.
> > 
> > That is just an explanation, :) nevermind, or I really don't think that is
> > a deal.
> 
> I don't really follow. I analysed the implications of the overflow that
> you are introducing. In my opinion this is what you should have done
> before proposing this patch. I think it is essential to understand what
> assumptions we make and introducing new ones should be carefully
> considered. I think it is a big 'deal' if you are not more careful when you
> are submitting patches and just ignore feedback. We spent a lot of time
> reviewing them.

So I agree "it is not a totally harmless change". But what is the deal/impact
of the harm? The harm in the worse case scenario will not hurt anything, IMHO,
and just an opinion.

Thank you very much for the rewiew. Really appreciate it.