Re: [RFC 08/14] sched/tune: add detailed documentation

linux-pm.vger.kernel.org archive mirror

From: Steve Muckle <steve.muckle@linaro.org>
To: Patrick Bellasi <patrick.bellasi@arm.com>,
	Ricky Liang <jcliang@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [RFC 08/14] sched/tune: add detailed documentation
Date: Wed, 9 Sep 2015 13:16:10 -0700
Message-ID: <55F0938A.9000607@linaro.org>
In-Reply-To: <20150903091849.GB15649@e105326-lin>

Hi Patrick,

On 09/03/2015 02:18 AM, Patrick Bellasi wrote:
> In my view, one of the main goals of sched-DVFS is actually that to be
> a solid and generic replacement of different CPUFreq governors.
> Being driven by the scheduler, sched-DVFS can exploit information on
> CPU demand of active tasks in order to select the optimal Operating
> Performance Point (OPP) using a "proactive" approach instead of the
> "reactive" approach commonly used by existing governors.

I'd agree that with knowledge of CPU demand on a per-task basis, rather
than the aggregate per-CPU demand that cpufreq governors use today, it
is possible to proactively address changes in CPU demand which result
from task migrations, task creation and exit, etc.
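
To make that concrete, here is a toy sketch (not code from the series;
the task names and demand values are invented) of why per-task demand
lets both CPUs be re-evaluated at the moment of a migration, instead of
waiting for a sampling window to observe the change:

```python
# Hypothetical illustration: names and numbers are not from the patch series.

def cpu_demand(task_demands):
    """Aggregate demand of a CPU as the sum of its tasks' tracked demands."""
    return sum(task_demands.values())


def migrate(task, src, dst):
    """Move a task between CPUs; both new demands are known immediately."""
    dst[task] = src.pop(task)
    return cpu_demand(src), cpu_demand(dst)


# Per-task demand as a fraction of one CPU's capacity at fmax (made up).
cpu0 = {"ui": 0.25, "decoder": 0.5}
cpu1 = {"logger": 0.125}

before = (cpu_demand(cpu0), cpu_demand(cpu1))
after = migrate("decoder", cpu0, cpu1)
```

A window-based governor would only see the new load on each CPU after
its next sample completes.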

That said, I believe setting the OPP based on a given historical profile
of task load still relies on a heuristic algorithm of some sort, for
which there is no single right answer. I am concerned about whether
sched-dvfs and SchedTune, as currently proposed, will support a wide
enough range of heuristics/policies to effectively replace the existing
cpufreq governors.

The two most popular governors for normal operation in the mobile world:

* ondemand: Samples periodically, CPU usage calculated as simple busy
fraction of last X ms window of time. Goes straight to fmax when load
exceeds the up_threshold tunable (a percentage), otherwise scales
frequency proportionally with load. Can be made to stay at fmax longer
before re-evaluating by configuring the sampling_down_factor tunable.

* interactive: Samples periodically, CPU usage calculated as simple busy
fraction of last X ms window of time. Goes to an intermediate tunable
freq (hispeed_freq) when load exceeds a user-definable threshold
(go_hispeed_load). Otherwise strives to maintain the CPU usage set by
the user in the "target_loads" array. Other knobs that affect behavior
include min_sample_time (min time to spend at a freq before slowing
down) and above_hispeed_delay (allows various delays before raising
speed further above hispeed_freq).
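
As a rough illustration of the two policies as I described them above
(this is not the governors' actual code), the frequency decisions could
be sketched like this. The function names and default values are made
up, a single target_load stands in for the target_loads array, and the
timing knobs (sampling_down_factor, min_sample_time,
above_hispeed_delay) are left out entirely:

```python
# Simplified sketches only; defaults are illustrative, frequencies in kHz.

def ondemand_target(load_pct, fmin, fmax, up_threshold=80):
    """Jump straight to fmax above up_threshold, else scale with load."""
    if load_pct > up_threshold:
        return fmax
    # Proportional scaling, clamped so we never request less than fmin.
    return max(fmin, fmax * load_pct // 100)


def interactive_target(load_pct, cur_freq, freqs, hispeed_freq,
                       go_hispeed_load=99, target_load=90):
    """Go to hispeed_freq on a load spike, else aim for target_load."""
    if load_pct > go_hispeed_load and cur_freq < hispeed_freq:
        return hispeed_freq
    # Lowest available frequency expected to bring load down to target_load.
    wanted = cur_freq * load_pct / target_load
    for freq in sorted(freqs):
        if freq >= wanted:
            return freq
    return max(freqs)


FREQS = [300000, 600000, 900000, 1200000, 1500000]
```

Even in this stripped-down form, interactive needs three tunables plus
the frequency table where ondemand needs one.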

It's also worth noting that mobile vendors typically add all sorts of
hacks on top of the existing cpufreq governors which further complicate
policy.

The current proposal:

* sched-dvfs/schedtune: Event-driven, CPU usage calculated using an
exponential moving average. AFAICS tries to maintain some % of idle
headroom, but if that headroom doesn't exist at task_tick_fair(), goes
to max frequency. SchedTune provides a way to boost/inflate the demand
of individual tasks or overall system demand.
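
My reading of that behavior can be sketched as follows. This is a toy
model under stated assumptions, not the code in the series: the function
names, the 20% headroom, the smoothing factor and the capacity scale of
1024 are placeholders, a plain EMA stands in for the real load tracking,
and the boost formula (inflating usage by a percentage of the spare
capacity) is just one plausible way to inflate demand.

```python
# Toy model of the proposal as I read it; all names and values are assumed.

MAX_CAPACITY = 1024  # assumed capacity scale


def ema_update(avg, sample, alpha=0.5):
    """Plain exponential moving average of CPU usage samples."""
    return alpha * sample + (1 - alpha) * avg


def boosted_usage(usage, boost_pct):
    """Assumed boost: add boost_pct percent of the remaining spare capacity."""
    return usage + boost_pct * (MAX_CAPACITY - usage) // 100


def event_request(usage, headroom_pct=20):
    """On scheduler events, request capacity that leaves idle headroom."""
    return min(MAX_CAPACITY, usage * 100 // (100 - headroom_pct))


def tick_request(usage, cur_capacity, headroom_pct=20):
    """At the tick, jump to max if the headroom has been eaten up."""
    if usage * 100 > cur_capacity * (100 - headroom_pct):
        return MAX_CAPACITY
    return cur_capacity
```

Note there is nothing here that controls how long a capacity is held or
how quickly it is raised, which is where the interactive knobs come in.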

This looks a bit like ondemand to me, but without the
sampling_down_factor functionality and using per-entity load tracking
instead of a simple window-based aggregate CPU usage. The interactive
functionality would require additional knobs. I don't think schedtune
will allow for tuning the latency of CPU frequency changes
(min_sample_time, above_hispeed_delay, etc.).

A separate but related concern - in the (IMO likely, given the above)
case that folks want to tinker with that policy, it now means they're
hacking the scheduler as opposed to a self-contained frequency policy
plugin.

Another issue with policy (but not specific to this proposal) is that
putting a bunch of it in the CPU frequency selection may derail the
efforts of the EAS algorithm, which I'm still working on digesting.
Perhaps a unified sched/cpufreq policy could go there.

thanks,
Steve
