Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state


From: Stratos Karafotis <stratosk@semaphore.gr>
To: Dirk Brandewie <dirk.brandewie@gmail.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Viresh Kumar <viresh.kumar@linaro.org>
Cc: dirk.j.brandewie@intel.com,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Subject: Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
Date: Wed, 21 May 2014 20:22:02 +0300
Message-ID: <537CE0BA.4020105@semaphore.gr>
In-Reply-To: <537BDEE4.7080107@intel.com>

On 21/05/2014 02:01 AM, Dirk Brandewie wrote:
> On 05/20/2014 02:59 PM, Stratos Karafotis wrote:
>> On 21/05/2014 12:31 AM, Dirk Brandewie wrote:
>>> On 05/20/2014 02:11 PM, Stratos Karafotis wrote:
>>>> Hi all,
>>>>
>>>> Currently, we use the current P state to calculate the busy_scaled factor
>>>> and then the next P state.
>>>>
>>>> We also read the MSR_TURBO_RATIO_LIMIT to get the turbo ratio limit as the
>>>> turbo_pstate. But, we always read bits 7:0 ("Maximum turbo ratio limit of 1
>>>> core active").
>>>>
>>>> So, in processor families that have different turbo ratio limit
>>>> depending on active cores the current P state as it's considered
>>>> by the driver might be different from the actual current P state.
>>>>
>>>> For example, I use an i7-3770 which reports as maximum turbo ratio limits
>>>> with 1/2/3/4 active cores the values 39/39/38/37. So, in some cases
>>>> we will calculate as the next P state the value 39. If the number of active
>>>> cores at that time was 3 or 4, the actual P state will be 38 or 37.
>>>> The current_pstate variable will have the value 39 and this will lead
>>>> to wrong calculation at the next sampling interval.
>>>>
>>>> Trying to find a solution to the above I couldn't find an MSR that
>>>> we could use to get the number of active cores and use the respective
>>>> turbo ratio limit.
>>>>
>>>> I also thought to use the IA32_PERF_STATUS to get the current P state
>>>> and use it in the calculations, but its scope is per core and not per
>>>> thread.
>>>>
>>>> Am I missing something? If the above is correct, any idea how this
>>>> could be resolved?
>>>
>>> The above is correct except the requested pstate is calculated using
>>> max_pstate and turbo_pstate is used as the upper limit when calling
>>> intel_pstate_set_pstate.
>>
>> Thanks for your prompt reply!
>>
>> But when we call intel_pstate_set_pstate we also set the current_pstate
>> to the requested pstate (which may be, for example, 39).
>>
>>>
>>> The value written to MSR_IA32_PERF_CTL is a request that is processed by the
>>> CPU and is clipped internally to the current state of the CPU.  Whether or
>>> not any turbo is available is decided by the CPU; asking for the top
>>> turbo bin says "give me all that is available".
>>>
>>> Asking for more than the CPU has to give ATM is harmless.
>>
>> Then the CPU clips internally, as you said, to the current actual state.
>> If all cores are active the current state (in CPU) will be 37.
>> Driver will still consider the current_pstate as 39.
>>
>> In next sampling interval, in intel_pstate_get_scaled_busy we calculate
>> the core_busy as core_busy * max_pstate / current_pstate.
>>
>> So, in the above example we have:
>> core_busy = core_busy * 34 / 39
>> and not
>> core_busy = core_busy * 34 / 37
>> as it should be.
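
The arithmetic above can be sketched in C as follows. This is a simplified
model rather than the driver's code: the fixed-point helpers are modelled on
the style used in intel_pstate, but the 8 fractional bits and the helper name
scaled_busy_pct() are assumptions made for illustration.

```c
#include <stdint.h>

#define FRAC_BITS 8 /* assumption: 8 fractional bits of fixed-point precision */

static int32_t int_tofp(int32_t x) { return x * (1 << FRAC_BITS); }
static int32_t fp_toint(int32_t x) { return x >> FRAC_BITS; }

/* Multiply two fixed-point values, widening to 64 bits to avoid overflow. */
static int32_t mul_fp(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> FRAC_BITS);
}

/* Divide two fixed-point values, keeping FRAC_BITS of precision. */
static int32_t div_fp(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * (1 << FRAC_BITS)) / y);
}

/*
 * Hypothetical helper: scale a busy percentage by max_pstate / current_pstate,
 * i.e. core_busy = core_busy * max_pstate / current_pstate.
 */
static int32_t scaled_busy_pct(int32_t busy_pct, int32_t max_pstate,
                               int32_t current_pstate)
{
    int32_t ratio = div_fp(int_tofp(max_pstate), int_tofp(current_pstate));

    return fp_toint(mul_fp(int_tofp(busy_pct), ratio));
}
```

With a hypothetical sample of 100% busy and max_pstate = 34, this gives 87
when the tracked value 39 is used and 91 when the clipped value 37 is used,
so the load is under-estimated by a few percent in this model.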
> 
> Yes and I know of no good way around it. In practice I haven't seen the
> oscillation in the upper turbo range (which is what this may cause).
> Do you have a workload where this is hurting performance?
> 
> Keep in mind you may have gotten 3.8385 GHz or any value in the turbo
> range based on the state of the processor.  The processor updates the
> effective frequency a lot faster than we sample.

I guess there will be no problem if all cores are fully busy, because the
CPU will internally go to P state 37.

But if the load later decreases and the CPU is, for example, 50% busy, we
will carry this error into the calculations because:

target = cpu->pstate.current_pstate +/- steps

The error will vanish after some intervals (if we ask for the minimum P
state, for example, and the load is actually minimal). But the
above error will be introduced again once in a while.
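
To make the drift concrete, here is a rough C model of the clipping. It
assumes that MSR_TURBO_RATIO_LIMIT packs one 8-bit ratio per active-core count
(bits 7:0 for 1 core, bits 15:8 for 2 cores, and so on) and that the only
thing limiting the request is the number of active cores. In reality the CPU
also considers power and thermal state, and the driver has no way to read the
active core count, so both function names and the model itself are
hypothetical.

```c
#include <stdint.h>

/*
 * Extract the turbo ratio limit for a given number of active cores (1..8)
 * from a raw MSR_TURBO_RATIO_LIMIT value. Returns 0 for an out-of-range count.
 */
static unsigned int turbo_ratio_for_cores(uint64_t msr,
                                          unsigned int active_cores)
{
    if (active_cores < 1 || active_cores > 8)
        return 0;

    return (unsigned int)((msr >> (8 * (active_cores - 1))) & 0xff);
}

/*
 * Simplified model of the internal clipping: the effective P state is the
 * requested one, capped at the limit for the current number of active cores.
 */
static unsigned int effective_pstate(uint64_t msr, unsigned int requested,
                                     unsigned int active_cores)
{
    unsigned int limit = turbo_ratio_for_cores(msr, active_cores);

    return requested < limit ? requested : limit;
}
```

For the i7-3770 limits 39/39/38/37, the raw value would be 0x25262727 under
this layout. A request of 39 with 4 active cores yields 37 in the model, while
current_pstate stays at 39, and that difference of 2 is what target then
inherits.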


Stratos
