RE: [PATCH 5/5] cpufreq: intel_pstate: Document the current behavior and user interface


From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rjw@rjwysocki.net>
Cc: 'Srinivas Pandruvada' <srinivas.pandruvada@linux.intel.com>,
	'LKML' <linux-kernel@vger.kernel.org>,
	'Jonathan Corbet' <corbet@lwn.net>,
	'Linux PM' <linux-pm@vger.kernel.org>,
	Doug Smythies <dsmythies@telus.net>
Subject: RE: [PATCH 5/5] cpufreq: intel_pstate: Document the current behavior and user interface
Date: Sun, 26 Mar 2017 23:32:37 -0700
Message-ID: <001f01d2a6c3$ef7f0c00$ce7d2400$@net>
In-Reply-To: qpybcdyUgZGlLqpydcsQsL

On 2017.03.22 16:32 Rafael J. Wysocki wrote:

I realize that there is a tradeoff between a succinct, brief
document and having to write a full book, but I have a couple of
comments anyhow.

> Add a document describing the current behavior and user space
> interface of the intel_pstate driver in the RST format and
> drop the existing outdated intel_pstate.txt document.

... [cut]...

> +The second variant of the ``powersave`` P-state selection algorithm, used in all
> +of the other cases (generally, on processors from the Core line, so it is
> +referred to as the "Core" algorithm), is based on the values read from the APERF
> +and MPERF feedback registers alone

And target pstate over the last sample interval.

> and it does not really take CPU utilization
> +into account explicitly.  Still, it causes the CPU P-state to ramp up very
> +quickly in response to increased utilization which is generally desirable in
> +server environments.

It will only ramp up quickly if another CPU has already ramped up such that the
effective pstate is much higher than the target, giving a very high "load"
(actually scaled_busy). See comments further down.

... [cut]...

> +Turbo P-states Support
> +======================
...
> +Some processors allow multiple cores to be in turbo P-states at the same time,
> +but the maximum P-state that can be set for them generally depends on the number
> +of cores running concurrently.  The maximum turbo P-state that can be set for 3
> +cores at the same time usually is lower than the analogous maximum P-state for
> +2 cores, which in turn usually is lower than the maximum turbo P-state that can
> +be set for 1 core.  The one-core maximum turbo P-state is thus the maximum
> +supported one overall.

The above segment was retained because it is relevant to footnote 1 below.

...[cut]...

> +For example, the default values of the PID controller parameters for the Sandy
> +Bridge generation of processors are
> +
> +| ``deadband`` = 0
> +| ``d_gain_pct`` = 0
> +| ``i_gain_pct`` = 0
> +| ``p_gain_pct`` = 20
> +| ``sample_rate_ms`` = 10
> +| ``setpoint`` = 97
> +
> +If the derivative and integral coefficients in the PID algorithm are both equal
> +to 0 (which is the case above), the next P-State value will be equal to:
> +
> +  ``current_pstate`` - ((``setpoint`` - ``current_load``) * ``p_gain_pct``)
> +
> +where ``current_pstate`` is the P-state currently set for the given CPU and
> +``current_load`` is the current load estimate for it based on the current values
> +of feedback registers.

While mentioned earlier, it should be emphasized again here that this
"current_load" might be, and very often is, very different from
the actual load on the CPU. It can be as high as the ratio of the maximum
P-state to the minimum P-state, i.e. for my older i7 processor it can be
38/16 * 100% = 237.5%. For more recent processors, that maximum can be much
higher. This is how this control algorithm, with these settings, can achieve
a very rapid ramp of pstate on a CPU that was previously idle when
other CPUs were already active and ramped up.
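
A minimal Python sketch of that upper bound. It assumes the "load" seen by
the controller is simply 100% scaled by the ratio of the effective pstate to
the target pstate; the helper name scaled_load_pct is hypothetical and is not
taken from the driver source.

```python
def scaled_load_pct(effective_pstate, target_pstate):
    """Hypothetical helper: "load" in percent when the CPU actually ran at
    effective_pstate while the driver had requested target_pstate."""
    return 100.0 * effective_pstate / target_pstate


MIN_PSTATE = 16  # minimum pstate of the older i7 mentioned above
MAX_PSTATE = 38  # maximum (turbo) pstate of the same processor

# Worst case: target still at minimum, another CPU has pulled the
# effective pstate all the way to the maximum.
worst_case = scaled_load_pct(MAX_PSTATE, MIN_PSTATE)
print(worst_case)  # 237.5
```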

> +
> +If ``current_pstate`` is 8 (in the internal representation used by
> +``intel_pstate``) and ``current_load`` is 100 (in percent), the next P-state
> +value will be:
> +
> +	8 - ((97 - 100) * 0.2) = 8.6
> +
> +which will be rounded up to 9, so the P-state value goes up by 1 in this case.
> +If the load does not change during the next interval between invocations of the
> +driver's utilization update callback for the CPU in question, the P-state value
> +will go up by 1 again and so on, as long as the load exceeds the ``setpoint``
> +value (or until the maximum P-state is reached).

No, only if the "load" is at least setpoint + 0.5/p_gain (the point at which
the result rounds to the next P-state), or 99.5 for these settings. The point
being that p_gain and setpoint affect each other in terms of system response.
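
The threshold can be checked with a short Python sketch. It assumes a
proportional-only controller and round-to-nearest on the result; the function
names next_pstate and step_up_threshold are hypothetical, chosen only for
illustration.

```python
import math


def next_pstate(current_pstate, load_pct, setpoint=97, p_gain_pct=20):
    """Proportional-only step; rounding to nearest is an assumption."""
    raw = current_pstate - (setpoint - load_pct) * p_gain_pct / 100
    return math.floor(raw + 0.5)


def step_up_threshold(setpoint=97, p_gain_pct=20):
    """Smallest "load" that moves the pstate up by one:
    setpoint + 0.5 / p_gain, with p_gain = p_gain_pct / 100."""
    return setpoint + 50 / p_gain_pct


print(step_up_threshold())        # 99.5
print(next_pstate(8, 99))         # 8, load above setpoint but no step
print(next_pstate(8, 100))        # 9
print(step_up_threshold(97, 10))  # 102.0, halving p_gain moves the threshold
```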

I suggest it would be worth adding a fast ramp-up example here. Something like:
Minimum pstate = 16; Maximum pstate = 38.

Current pstate = 16,
Effective pstate over the last interval, due to another CPU = 38
"load" = 237.5%

16 - ((97-237.5) * 0.2) = 44.1, which would be clamped to 38.

Footnote 1: Readers might argue that, due to multiple cores being active
at one time, we would never actually get a "load" of 237.5 in the above example.
That is true, but it can get very close. For simplicity of the example, the
suggestion is to ignore it.

A fast ramp-up example from real trace data:

mperf: 9806829 cycles
aperf: 10936506 cycles
tsc: 99803828 cycles
freq: 3.7916 GHz ; effective pstate 37.9
old target pstate: 16
duration: 29.26 milliseconds
load (actual): 9.83%
"load" (scaled_busy): 236
New target pstate: 38
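
The trace figures can be cross-checked with a few lines of Python. This is
only a sketch: it takes the effective pstate of 37.9 directly from the trace,
assumes the scaled "load" is 100 * effective pstate / old target, and assumes
round-to-nearest followed by clamping to the 16..38 range used in the earlier
example. All variable names are illustrative.

```python
import math

# Values copied from the trace sample
mperf = 9806829
tsc = 99803828
effective_pstate = 37.9
old_target = 16

# Assumed limits and Sandy Bridge default controller settings
min_pstate, max_pstate = 16, 38
setpoint, p_gain_pct = 97, 20

actual_load = 100 * mperf / tsc                      # time actually busy
scaled_busy = 100 * effective_pstate / old_target    # what the controller sees

raw = old_target - (setpoint - scaled_busy) * p_gain_pct / 100
new_target = min(max_pstate, max(min_pstate, math.floor(raw + 0.5)))

print(round(actual_load, 2))  # 9.83
print(int(scaled_busy))       # 236
print(new_target)             # 38
```

Under these assumptions, a CPU that was under 10% busy still jumps from the
minimum to the maximum pstate in a single step.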

... Doug

