From: Luca Abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>,
	mingo@redhat.com, rjw@rjwysocki.net, viresh.kumar@linaro.org,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	tglx@linutronix.de, vincent.guittot@linaro.org,
	rostedt@goodmis.org, claudio@evidence.eu.com,
	tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
	mathieu.poirier@linaro.org, tkjos@android.com, joelaf@google.com,
	andresoportus@google.com, morten.rasmussen@arm.com,
	dietmar.eggemann@arm.com, patrick.bellasi@arm.com,
	Ingo Molnar <mingo@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Subject: Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
Date: Tue, 25 Jul 2017 09:03:08 +0200	[thread overview]
Message-ID: <20170725090308.2cca53c0@luca> (raw)
In-Reply-To: <20170724164349.clzsajrwxtobyqkm@hirez.programming.kicks-ass.net>

Hi Peter,

On Mon, 24 Jul 2017 18:43:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jul 19, 2017 at 12:16:24PM +0100, Juri Lelli wrote:
> > On 19/07/17 13:00, Peter Zijlstra wrote:  
> > > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:  
> > > > On 19/07/17 09:21, Peter Zijlstra wrote:  
> > > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:  
> > > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > > > > >  	if (unlikely(dl_entity_is_special(dl_se)))
> > > > > >  		return;
> > > > > >  
> > > > > > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > > > > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > > > > -	dl_se->runtime -= delta_exec;
> > > > > > +	/*
> > > > > > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > > > > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > > > > > +	 *
> > > > > > +	 * For the others, we still need to scale reservation parameters
> > > > > > +	 * according to current frequency and CPU maximum capacity.
> > > > > > +	 */
> > > > > > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > > > > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > > > > > +						 rq,
> > > > > > +						 &curr->dl);
> > > > > > +	} else {
> > > > > > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > > > > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > > > > +
> > > > > > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > > > > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > > > > +	}
> > > > > > +
> > > > > > +	dl_se->runtime -= scaled_delta_exec;
> > > > > >    
> > > > > 
> > > > > This I don't get...   
> > > > 
> > > > 
> > > > Considering that we use GRUB's active utilization to drive clock
> > > > frequency selection, rationale is that GRUB tasks don't need any special
> > > > scaling, as their delta_exec is already scaled according to GRUB rules.
> > > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> > > > scaled considering current frequency (and CPU max capacity), otherwise
> > > > they are going to receive less runtime than granted at AC, when
> > > > frequency is reduced.  
> > > 
> > > I don't think that quite works out. Given that the frequency selection
> > > will never quite end up at exactly the same fraction (if the hardware
> > > listens to your requests at all).
> > >   
> > 
> > It's an approximation yes (how big it depends on the granularity of the
> > available frequencies). But, for the !GRUB tasks it should be OK, as we
> > always select a frequency (among the available ones) bigger than the
> > current active utilization.
> > 
> > Also, for platforms/archs that don't redefine arch_scale_* this is not
> > used. In case they are defined instead the assumption is that either hw
> > listens to requests or scaling factors can be derived in some other ways
> > (avgs?).
> >   
> > > Also, by not scaling the GRUB stuff, don't you run the risk of
> > > attempting to hand out more idle time than there actually is?  
> > 
> > The way I understand it is that for GRUB tasks we always scale
> > considering the "correct" factor. Then frequency could be higher, but
> > this spare idle time will be reclaimed by other GRUB tasks.  
> 
> I'm still confused..
> 
> So GRUB does:
> 
> 	dq = Uact -dt
> 
> right?

Right. This is what the original (single-processor) GRUB did, and it is
also the rule used by the "GRUB-PA" algorithm:
https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf

(Basically, GRUB-PA uses GRUB for reclaiming and scales the CPU
frequency based on Uact.)


> Now, you do DVFS using that same Uact. If we lower the clock, we need
> more time, so would we then not end up with something like:
> 
> 	dq = 1/Uact -dt

Well, in the GRUB-PA algorithm, GRUB reclaiming is the mechanism used to
give more runtime to the task... Since Uact < 1, doing
	dq = - Uact * dt
means that we decrease the current runtime by a smaller amount of time.
So we end up giving more runtime to the task: instead of giving
dl_runtime every dl_period, we give "dl_runtime / Uact" every
dl_period... And since the CPU is slower (its speed is scaled by a
factor Uact), this is equivalent to giving dl_runtime at the maximum
CPU speed / frequency (at least, in theory :).


> After all; our budget assignment is such that we're able to complete
> our work at max freq. Therefore, when we lower the frequency, we'll have
> to increase budget pro rata, otherwise we'll not complete our work and
> badness happens.

Right. But instead of increasing dl_runtime, GRUB-PA decreases the
amount of time accounted to the current runtime.


> Say we have a 1 Ghz part and Uact=0.5 we'd select 500 Mhz and need
> double the time to complete.
> 
> Now, if we fold these two together, you'd get:
> 
> 	dq = Uact/Uact -dt = -dt

Not sure why "/ Uact"... According to the GRUB-PA algorithm, you just
do
	dq = - Uact * dt = -0.5 * dt
and you end up giving the CPU to the task for 2 * dl_runtime every
dl_period (as expected).
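Integrating the GRUB-PA rule over the time the task executes, assuming (as an idealization) that Uact stays constant and no replenishment happens in between:

```latex
\begin{aligned}
\frac{dq}{dt} &= -U_{act}, \qquad q(0) = \mathrm{dl\_runtime} \\
q(t) &= \mathrm{dl\_runtime} - U_{act}\, t \\
q(t^*) = 0 \;\Rightarrow\; t^* &= \frac{\mathrm{dl\_runtime}}{U_{act}} = 2\,\mathrm{dl\_runtime} \qquad (U_{act} = 0.5) \\
\text{work at } f_{max} &= U_{act}\, t^* = \mathrm{dl\_runtime}
\end{aligned}
```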

> Because, after all, if we lowered the clock to consume all idle time,
> there's no further idle time to reclaim.

The idea of GRUB-PA is that you do not change dl_runtime... So the
task is still "nominally reserved" dl_runtime every dl_period (in
this example, 1/2 * dl_period every dl_period)... It is the reclaiming
mechanism that allows the task to execute for dl_runtime / Uact every
dl_period (in this example, for dl_period every dl_period, so it
reclaims 1/2 * dl_period every dl_period).
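Splitting the executed time into the nominal and the reclaimed part for this example, assuming a single task on the CPU so that Uact = dl_runtime / dl_period:

```latex
\begin{aligned}
U_{act} &= \frac{\mathrm{dl\_runtime}}{\mathrm{dl\_period}} = \frac{1}{2} \\
\text{executed} &= \frac{\mathrm{dl\_runtime}}{U_{act}} = \mathrm{dl\_period} \\
\text{reclaimed} &= \text{executed} - \mathrm{dl\_runtime} = \mathrm{dl\_period} - \tfrac{1}{2}\,\mathrm{dl\_period} = \tfrac{1}{2}\,\mathrm{dl\_period}
\end{aligned}
```

For a lone task, dl_runtime / Uact equals dl_period whatever the reservation, i.e., it reclaims all of the idle time.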


> Now, of course, our DVFS assignment isn't as precise nor as
> deterministic as this, so we'll get a slightly different ratio, lets
> call that Udvfs.

This is where GRUB-PA starts to have issues... :)
The implicit assumption is (I think) that if the DVFS mechanism cannot
assign exactly the requested frequency, then it makes a "conservative"
assignment (assigning a frequency higher than the requested one).


> So would then not GRUB change into something like:
> 
> 	dq = Uact/Udvfs -dt
> 
> Allowing it to only reclaim that idle time that exists because our DVFS
> level is strictly higher than required?

I think GRUB should still do "dq = -Uact * dt", trying to reclaim all
the idle CPU time (up to the configured limit, of course).



				Luca

> 
> This way, on our 1 GHz part, with Uact=.5 but Udvfs=.6, we'll allow it
> to reclaim just the additional 100Mhz of idle time.
> 
> 
> Or am I completely off the rails now?
