From: Steve Muckle <steve.muckle@linaro.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Linux PM list <linux-pm@vger.kernel.org>,
Juri Lelli <juri.lelli@arm.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC/RFT][PATCH v4 1/2] cpufreq: New governor using utilization data from the scheduler
Date: Wed, 2 Mar 2016 19:20:38 -0800
Message-ID: <56D7AD86.8080702@linaro.org>
In-Reply-To: <CAJZ5v0hi+RZUkWFGDPpftWxcCP-1v6675FY14j19AoRM=e=13Q@mail.gmail.com>
On 03/01/2016 12:20 PM, Rafael J. Wysocki wrote:
>> I'm specifically worried about the check below where we omit a CPU's
>> capacity request if its last update came before the last sample time.
>>
>> Say there are 2 CPUs in a frequency domain, HZ is 100 and the sample
>> delay here is 4ms.
>
> Yes, that's the case I clearly didn't take into consideration. :-)
>
> My assumption was that the sample delay would always be greater than
> the typical update rate which of course need not be the case.
>
> The reason I added the check at all was that the numbers from the
> other CPUs may become stale if those CPUs are idle for too long, so at
> one point the contributions from them need to be discarded. Question
> is when that point is and since sample delay may be arbitrary, that
> mechanism has to be more complex.

Yeah, this has been an open issue on our end as well. Sampling-based
governors largely avoided it by their very nature, since every CPU's
state is re-evaluated at each sampling interval. IIRC the interactive
governor also had a separate tunable (the "slack timer") which specified
how long a CPU's sampling timer may be deferred due to idle while
running above fmin.

Decoupling the CPU update staleness limit from the freq change rate
limit via a separate tunable would be valuable IMO. Would you be
amenable to a patch that did that?
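
To sketch what I mean (standalone C, illustration only; the names and
the 20 ms default below are made up, not from your patch): each CPU's
contribution gets aged against its own staleness limit, so the check no
longer depends on how the sample delay compares with the scheduler's
update rate.

#include <stdio.h>

/*
 * Illustration only: per-CPU requests are aged against a dedicated
 * staleness tunable instead of being compared with the last sample time.
 * All names here (cpu_request, stale_threshold_ns, ...) are made up.
 */
struct cpu_request {
	unsigned long util;			/* last reported utilization */
	unsigned long long last_update_ns;	/* when it was reported */
};

/* separate tunable, independent of the freq change rate limit */
static unsigned long long stale_threshold_ns = 20ULL * 1000 * 1000;

static unsigned long aggregate_util(const struct cpu_request *reqs,
				    int nr_cpus, unsigned long long now_ns)
{
	unsigned long util = 0;
	int i;

	for (i = 0; i < nr_cpus; i++) {
		/* drop only contributions older than the staleness limit */
		if (now_ns - reqs[i].last_update_ns > stale_threshold_ns)
			continue;
		if (reqs[i].util > util)
			util = reqs[i].util;
	}
	return util;
}

int main(void)
{
	/* CPU0 updated 2 ms ago, CPU1 updated 50 ms ago (stale) */
	struct cpu_request reqs[] = {
		{ .util = 300, .last_update_ns = 98ULL * 1000 * 1000 },
		{ .util = 900, .last_update_ns = 50ULL * 1000 * 1000 },
	};

	printf("aggregate util: %lu\n",
	       aggregate_util(reqs, 2, 100ULL * 1000 * 1000));	/* 300 */
	return 0;
}
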
>>> Like I said in my reply to Peter in that thread, using RELATION_L here is likely
>>> to make us avoid the min frequency almost entirely even if the system is almost
>>> completely idle. I don't think that would be OK really.
>>>
>>> That said my opinion about this particular item isn't really strong.
>>
>> I think the calculation for required CPU bandwidth needs tweaking.
>
> The reason why I used that particular formula was that ondemand used
> it. Of course, the input to it is different in ondemand, but the idea
> here is to avoid departing from it too much.
>
>> Aside from always wanting something past fmin, currently the amount of
>> extra CPU capacity given for a particular % utilization depends on how
>> high the platform's fmin happens to be, even if the fmax speeds are the
>> same. For example given two platforms with the following available
>> frequencies (MHz):
>>
>> platform A: 100, 300, 500, 700, 900, 1100
>> platform B: 500, 700, 900, 1100
>
> The frequencies may not determine raw performance, though, so 500 MHz
> in platform A may correspond to 700 MHz in platform B. You never
> know.

My example here was solely intended to illustrate that the current
algorithm itself introduces a policy inconsistency when all else is
equal: for the same utilization, this ondemand-style calculation grants
more CPU bandwidth headroom to the platform with the higher fmin.

It'd be good to be able to express the desired amount of CPU bandwidth
headroom in such a way that it doesn't depend on the platform's fmin
value, since CPU headroom is a critical factor in tuning a platform's
governor for optimal power and performance.
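
To put numbers on the example quoted below (standalone C, illustration
only; the ondemand-style expression here, fmin + (util/max) * (fmax -
fmin), is my reading of the current mapping, and the 25% margin in the
alternative is an arbitrary example value, not a proposal):

#include <stdio.h>

static unsigned int ondemand_style(unsigned int fmin, unsigned int fmax,
				   unsigned int util, unsigned int max)
{
	return fmin + util * (fmax - fmin) / max;
}

static unsigned int absolute_plus_margin(unsigned int fmax,
					 unsigned int util, unsigned int max)
{
	return util * fmax / max * 125 / 100;	/* 25% headroom, example only */
}

int main(void)
{
	/* 50% utilization (util = 512 out of 1024) on both platforms */
	printf("platform A: ondemand-style %u MHz, absolute+margin %u MHz\n",
	       ondemand_style(100, 1100, 512, 1024),	/* 600 */
	       absolute_plus_margin(1100, 512, 1024));	/* 687 */
	printf("platform B: ondemand-style %u MHz, absolute+margin %u MHz\n",
	       ondemand_style(500, 1100, 512, 1024),	/* 800 */
	       absolute_plus_margin(1100, 512, 1024));	/* 687 */
	return 0;
}

The ondemand-style mapping gives platform B five times the absolute
headroom over the 550 MHz actually consumed (250 MHz vs 50 MHz), purely
because its fmin is higher, while the absolute form gives the same
headroom on both.
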
>
>>
>> A 50% utilization load on platform A will want 600 MHz (rounding up to
>> 700 MHz perhaps) whereas platform B will want 800 MHz (again likely
>> rounding up to 900 MHz), even though the load consumes 550 MHz on both
>> platforms.
>>
>> One possibility would be something like we had in schedfreq, getting the
>> absolute CPU bw requirement (util/max) * fmax and then adding some %
>> margin, which I think is more consistent. It is true that it means
>> figuring out what the right margin is and now there's a magic number
>> (and potentially a tunable), but it would be more consistent.
>>
>
> What the picture is missing is the information on how much more
> performance you get by running in a higher P-state (or OPP if you
> will). We don't have that information, however, and relying on
> frequency values here generally doesn't help.

Why does the frequency value not help? It's true that a memory-bound
workload may not respond linearly to increasing frequency, but that is a
problem for the current algorithm as well. Surely it's better to attempt
a consistent policy, one that doesn't vary with the platform's fmin
value?

> Moreover, since 0 utilization gets you to run in f_min no matter what,
> if you treat f_max as an absolute, you're going to underutilize the
> P-states in the upper half of the available range.

Sorry, I didn't follow. What do you mean by underutilizing the upper
half of the range? I don't see why using RELATION_L with (util/max) *
fmax * (headroom) would be incorrect in that regard.
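
To illustrate why I don't expect that, here's a standalone sketch that
emulates RELATION_L selection (lowest available frequency at or above
the target) for target = (util/max) * fmax * headroom, using the
platform B frequency table from earlier and a made-up 1.25 headroom
factor:

#include <stdio.h>

/* platform B frequency table from earlier in the thread (MHz) */
static const unsigned int freqs_b[] = { 500, 700, 900, 1100 };
#define NR_FREQS (sizeof(freqs_b) / sizeof(freqs_b[0]))

/* pick the lowest available frequency at or above the target (RELATION_L) */
static unsigned int relation_l(unsigned int target)
{
	unsigned int i;

	for (i = 0; i < NR_FREQS; i++)
		if (freqs_b[i] >= target)
			return freqs_b[i];
	return freqs_b[NR_FREQS - 1];	/* clamp to fmax */
}

int main(void)
{
	unsigned int fmax = 1100, max = 1024, util;

	for (util = 0; util <= max; util += 128) {
		/* target = util/max * fmax * 1.25 (made-up headroom factor) */
		unsigned int target = util * fmax / max * 125 / 100;

		printf("util %4u/1024 -> target %4u -> %4u MHz\n",
		       util, target, relation_l(target));
	}
	return 0;
}

Utilization of 0 still lands on fmin, and 700, 900 and 1100 MHz all get
selected as utilization grows, so the upper part of the range doesn't
look underused to me.
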
thanks,
Steve