Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Luca Abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>,
	mingo@redhat.com, rjw@rjwysocki.net, viresh.kumar@linaro.org,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	tglx@linutronix.de, vincent.guittot@linaro.org,
	rostedt@goodmis.org, claudio@evidence.eu.com,
	tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
	mathieu.poirier@linaro.org, tkjos@android.com, joelaf@google.com,
	andresoportus@google.com, morten.rasmussen@arm.com,
	dietmar.eggemann@arm.com, patrick.bellasi@arm.com,
	Ingo Molnar <mingo@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Subject: Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
Date: Tue, 25 Jul 2017 09:03:08 +0200	[thread overview]
Message-ID: <20170725090308.2cca53c0@luca> (raw)
In-Reply-To: <20170724164349.clzsajrwxtobyqkm@hirez.programming.kicks-ass.net>

Hi Peter,

On Mon, 24 Jul 2017 18:43:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jul 19, 2017 at 12:16:24PM +0100, Juri Lelli wrote:
> > On 19/07/17 13:00, Peter Zijlstra wrote:  
> > > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:  
> > > > On 19/07/17 09:21, Peter Zijlstra wrote:  
> > > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:  
> > > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > > > > >  	if (unlikely(dl_entity_is_special(dl_se)))
> > > > > >  		return;
> > > > > >  
> > > > > > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > > > > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > > > > -	dl_se->runtime -= delta_exec;
> > > > > > +	/*
> > > > > > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > > > > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > > > > > +	 *
> > > > > > +	 * For the others, we still need to scale reservation parameters
> > > > > > +	 * according to current frequency and CPU maximum capacity.
> > > > > > +	 */
> > > > > > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > > > > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > > > > > +						 rq,
> > > > > > +						 &curr->dl);
> > > > > > +	} else {
> > > > > > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > > > > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > > > > +
> > > > > > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > > > > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > > > > +	}
> > > > > > +
> > > > > > +	dl_se->runtime -= scaled_delta_exec;
> > > > > >    
> > > > > 
> > > > > This I don't get...   
> > > > 
> > > > 
> > > > Considering that we use GRUB's active utilization to drive clock
> > > > frequency selection, rationale is that GRUB tasks don't need any special
> > > > scaling, as their delta_exec is already scaled according to GRUB rules.
> > > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> > > > scaled considering current frequency (and CPU max capacity), otherwise
> > > > they are going to receive less runtime than granted at AC, when
> > > > frequency is reduced.  
> > > 
> > > I don't think that quite works out. Given that the frequency selection
> > > will never quite end up at exactly the same fraction (if the hardware
> > > listens to your requests at all).
> > >   
> > 
> > It's an approximation yes (how big it depends on the granularity of the
> > available frequencies). But, for the !GRUB tasks it should be OK, as we
> > always select a frequency (among the available ones) bigger than the
> > current active utilization.
> > 
> > Also, for platforms/archs that don't redefine arch_scale_* this is not
> > used. In case they are defined instead the assumption is that either hw
> > listens to requests or scaling factors can be derived in some other ways
> > (avgs?).
> >   
> > > Also, by not scaling the GRUB stuff, don't you run the risk of
> > > attempting to hand out more idle time than there actually is?  
> > 
> > The way I understand it is that for GRUB tasks we always scale
> > considering the "correct" factor. Then frequency could be higher, but
> > this spare idle time will be reclaimed by other GRUB tasks.  
> 
> I'm still confused..
> 
> So GRUB does:
> 
> 	dq = Uact -dt
> 
> right?

Right. This is what the original (single processor) GRUB did. And this
was used by the "GRUB-PA" algorithm:
https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf

(basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU
frequency based on Uact)


> Now, you do DVFS using that same Uact. If we lower the clock, we need
> more time, so would we then not end up with something like:
> 
> 	dq = 1/Uact -dt

Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism used to
give more runtime to the task... Since Uact is < 1, doing
	dq = - Uact * dt
means that we decrease the current runtime by a smaller amount of time.
And so we end up giving more runtime to the task: instead of giving 
dl_runtime every dl_period, we give "dl_runtime / Uact" every
dl_period... And since the CPU is slower (by a ratio Uact), this is
equivalent to giving dl_runtime at the maximum CPU speed / frequency
(at least, in theory :).


> After all; our budget assignment is such that we're able to complete
> our work at max freq. Therefore, when we lower the frequency, we'll have
> to increase budget pro rata, otherwise we'll not complete our work and
> badness happens.

Right. But instead of increasing dl_runtime, GRUB-PA decreases the
amount of time accounted to the current runtime.


> Say we have a 1 Ghz part and Uact=0.5 we'd select 500 Mhz and need
> double the time to complete.
> 
> Now, if we fold these two together, you'd get:
> 
> 	dq = Uact/Uact -dt = -dt

Not sure why " / Uact"... According to the GRUB-PA algorithm, you just
do
	dq = - Uact * dt = -0.5dt
and you end up giving the CPU to the task for 2 * dl_runtime every
dl_period (as expected)

> Because, after all, if we lowered the clock to consume all idle time,
> there's no further idle time to reclaim.

The idea of GRUB-PA is that you do not change dl_runtime... So, the
task is still "nominally reserved" dl_runtime every dl_period (in
this example, 1/2*dl_period every dl_period)... It is the reclaiming
mechanism that allows the task to execute for dl_runtime/Uact every
dl_period (in this example, for dl_period every dl_period, so it
reclaims 1/2dl_period every dl_period).


> Now, of course, our DVFS assignment isn't as precise nor as
> deterministic as this, so we'll get a slightly different ratio, lets
> call that Udvfs.

This is where GRUB-PA starts to have issues... :)
The implicit assumption is (I think) that is the DVFS mechanism cannot
assign exactly the requested frequency then it makes a "conservative"
assignment (assigning a frequency higher than the requested one).


> So would then not GRUB change into something like:
> 
> 	dq = Uact/Udvfs -dt
> 
> Allowing it to only reclaim that idle time that exists because our DVFS
> level is strictly higher than required?

I think GRUB should still do "dq = -Uact * dt", trying to reclaim all
the idle CPU time (up to the configured limit, of course).



				Luca

> 
> This way, on our 1 GHz part, with Uact=.5 but Udvfs=.6, we'll allow it
> to reclaim just the additional 100Mhz of idle time.
> 
> 
> Or am I completely off the rails now?

next prev parent reply	other threads:[~2017-07-25  7:03 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
2017-07-05  8:58 ` [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
2017-07-07  7:20   ` Viresh Kumar
2017-07-05  8:58 ` [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
2017-07-07  7:21   ` Viresh Kumar
2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
2017-07-07  3:56   ` Joel Fernandes
2017-07-07 10:43     ` Juri Lelli
2017-07-07 10:46       ` Thomas Gleixner
2017-07-07 10:53         ` Juri Lelli
2017-07-07 13:11           ` Rafael J. Wysocki
2017-07-07 21:58             ` Steven Rostedt
2017-07-07 22:07               ` Joel Fernandes
2017-07-07 22:15                 ` Steven Rostedt
2017-07-07 22:57                   ` Joel Fernandes
2017-07-11 12:37     ` Peter Zijlstra
2017-07-07  7:21   ` Viresh Kumar
2017-07-11 16:18   ` Peter Zijlstra
2017-07-11 17:02     ` Juri Lelli
2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
2017-07-07  3:26   ` Joel Fernandes
2017-07-07  8:58   ` Viresh Kumar
2017-07-07 10:59     ` Juri Lelli
2017-07-10  7:46       ` Joel Fernandes
2017-07-10  7:05   ` Viresh Kumar
2017-07-05  8:59 ` [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
2017-07-07  8:59   ` Viresh Kumar
2017-07-11 16:17   ` Peter Zijlstra
2017-07-11 17:18     ` Juri Lelli
2017-07-05  8:59 ` [RFC PATCH v1 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
2017-07-05  8:59 ` [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
2017-07-07 22:04   ` Steven Rostedt
2017-07-05  8:59 ` [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
2017-07-19  7:21   ` Peter Zijlstra
2017-07-19  9:20     ` Juri Lelli
2017-07-19 11:00       ` Peter Zijlstra
2017-07-19 11:16         ` Juri Lelli
2017-07-24 16:43           ` Peter Zijlstra
2017-07-25  7:03             ` Luca Abeni [this message]
2017-07-25 13:51               ` Peter Zijlstra
2017-07-26 13:50                 ` luca abeni
2017-07-06 15:57 ` [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Steven Rostedt
2017-07-06 16:08   ` Juri Lelli
2017-07-06 16:15   ` Peter Zijlstra
2017-07-06 21:08 ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170725090308.2cca53c0@luca \
    --to=luca.abeni@santannapisa.it \
    --cc=andresoportus@google.com \
    --cc=bristot@redhat.com \
    --cc=claudio@evidence.eu.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tkjos@android.com \
    --cc=tommaso.cucinotta@santannapisa.it \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).