Re: [PATCH] sched/fair: update scale invariance of pelt

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Yuyang Du <yuyang.du@intel.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: peterz@infradead.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org, Morten.Rasmussen@arm.com,
	linaro-kernel@lists.linaro.org, dietmar.eggemann@arm.com,
	pjt@google.com, bsegall@google.com
Subject: Re: [PATCH] sched/fair: update scale invariance of pelt
Date: Mon, 14 Dec 2015 08:26:58 +0800	[thread overview]
Message-ID: <20151214002658.GD28098@intel.com> (raw)
In-Reply-To: <1448372970-8764-1-git-send-email-vincent.guittot@linaro.org>

Hi Vincent,

I don't quite catch what this is doing, maybe I need more time
to ramp up to the gory detail difficult like this.

Do you scale or not scale? You seem removed the scaling, but added it
after "Remainder of delta accrued against u_0"..

Thanks,
Yuyang

On Tue, Nov 24, 2015 at 02:49:30PM +0100, Vincent Guittot wrote:
> The current implementation of load tracking invariance scales the load
> tracking value with current frequency and uarch performance (only for
> utilization) of the CPU.
> 
> One main result of the current formula is that the figures are capped by
> the current capacity of the CPU. This limitation is the main reason of not
> including the uarch invariance (arch_scale_cpu_capacity) in the calculation
> of load_avg because capping the load can generate erroneous system load
> statistic as described with this example [1]
> 
> Instead of scaling the complete value of PELT algo, we should only scale
> the running time by the current capacity of the CPU. It seems more correct
> to only scale the running time because the non running time of a task
> (sleeping or waiting for a runqueue) is the same whatever the current freq
> and the compute capacity of the CPU.
> 
> Then, one main advantage of this change is that the load of a task can
> reach max value whatever the current freq and the uarch of the CPU on which
> it run. It will just take more time at a lower freq than a max freq or on a
> "little" CPU compared to a "big" one. The load and the utilization stay
> invariant across system so we can still compared them between CPU but with
> a wider range of values.
> 
> With this change, we don't have to test if a CPU is overloaded or not in
> order to use one metric (util) or another (load) as all metrics are always
> valid.
> 
> I have put below some examples of duration to reach some typical load value
> according to the capacity of the CPU with current implementation
> and with this patch. 
> 
> Util (%)     max capacity  half capacity(mainline)  half capacity(w/ patch)
> 972 (95%)    138ms	   not reachable	    276ms
> 486 (47.5%)  30ms	   138ms		     60ms
> 256 (25%)    13ms	    32ms		     26ms
> 
> We can see that at half capacity, we need twice the duration of max
> capacity with this patch whereas we have a non linear increase of the
> duration with current implementation.
> 
> [1] https://lkml.org/lkml/2014/12/18/128
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 824aa9f..f2a18e1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2560,10 +2560,9 @@ static __always_inline int
>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
>  {
> -	u64 delta, scaled_delta, periods;
> +	u64 delta, periods;
>  	u32 contrib;
> -	unsigned int delta_w, scaled_delta_w, decayed = 0;
> -	unsigned long scale_freq, scale_cpu;
> +	unsigned int delta_w, decayed = 0;
>  
>  	delta = now - sa->last_update_time;
>  	/*
> @@ -2584,8 +2583,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		return 0;
>  	sa->last_update_time = now;
>  
> -	scale_freq = arch_scale_freq_capacity(NULL, cpu);
> -	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> +	if (running) {
> +		delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
> +		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
> +	}
>  
>  	/* delta_w is the amount already accumulated against our next period */
>  	delta_w = sa->period_contrib;
> @@ -2601,16 +2602,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		 * period and accrue it.
>  		 */
>  		delta_w = 1024 - delta_w;
> -		scaled_delta_w = cap_scale(delta_w, scale_freq);
>  		if (weight) {
> -			sa->load_sum += weight * scaled_delta_w;
> +			sa->load_sum += weight * delta_w;
>  			if (cfs_rq) {
>  				cfs_rq->runnable_load_sum +=
> -						weight * scaled_delta_w;
> +						weight * delta_w;
>  			}
>  		}
>  		if (running)
> -			sa->util_sum += scaled_delta_w * scale_cpu;
> +			sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
>  
>  		delta -= delta_w;
>  
> @@ -2627,25 +2627,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  
>  		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
>  		contrib = __compute_runnable_contrib(periods);
> -		contrib = cap_scale(contrib, scale_freq);
>  		if (weight) {
>  			sa->load_sum += weight * contrib;
>  			if (cfs_rq)
>  				cfs_rq->runnable_load_sum += weight * contrib;
>  		}
>  		if (running)
> -			sa->util_sum += contrib * scale_cpu;
> +			sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
>  	}
>  
>  	/* Remainder of delta accrued against u_0` */
> -	scaled_delta = cap_scale(delta, scale_freq);
>  	if (weight) {
> -		sa->load_sum += weight * scaled_delta;
> +		sa->load_sum += weight * delta;
>  		if (cfs_rq)
> -			cfs_rq->runnable_load_sum += weight * scaled_delta;
> +			cfs_rq->runnable_load_sum += weight * delta;
>  	}
>  	if (running)
> -		sa->util_sum += scaled_delta * scale_cpu;
> +		sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
>  
>  	sa->period_contrib += delta;
>  
> -- 
> 1.9.1

next prev parent reply	other threads:[~2015-12-14  8:13 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-24 13:49 [PATCH] sched/fair: update scale invariance of pelt Vincent Guittot
2015-11-25  9:24 ` Peter Zijlstra
2015-11-25 11:25   ` Vincent Guittot
2015-12-08 17:04 ` Morten Rasmussen
2015-12-15 10:18   ` Vincent Guittot
2015-12-14  0:26 ` Yuyang Du [this message]
2015-12-15 10:21   ` Vincent Guittot
  -- strict thread matches above, loose matches on Subject: below --
2017-03-28 15:35 [PATCH] sched/fair: update scale invariance of PELT Vincent Guittot
2017-03-29  8:05 ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151214002658.GD28098@intel.com \
    --to=yuyang.du@intel.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox