* [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
From: Stratos Karafotis @ 2014-05-20 21:11 UTC (permalink / raw)
To: Dirk Brandewie, Rafael J. Wysocki, Viresh Kumar; +Cc: linux-pm@vger.kernel.org
Hi all,
Currently, we use the current P state to calculate the busy_scaled factor
and then the next P state.
We also read the MSR_TURBO_RATIO_LIMIT to get the turbo ratio limit as the
turbo_pstate. But, we always read bits 7:0 ("Maximum turbo ratio limit of 1
core active").
So, on processor families whose turbo ratio limit depends on the number
of active cores, the current P state as the driver sees it might differ
from the actual current P state.
For example, I use an i7-3770, which reports maximum turbo ratio limits of
39/39/38/37 with 1/2/3/4 active cores. So, in some cases we will calculate
39 as the next P state. If 3 or 4 cores are active at that time, the
actual P state will be 38 or 37. The current_pstate variable will still
hold 39, and this will lead to a wrong calculation at the next sampling
interval.
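To make the per-core-count limits concrete, here is a minimal sketch of how they could be unpacked from a raw MSR_TURBO_RATIO_LIMIT value. It assumes the layout described above (bits 7:0 for 1 active core, then one byte per additional active core). The function name and the raw value are hypothetical; the value is simply assembled from the i7-3770 figures quoted in this message, not read from hardware.

```python
# Sketch only: decode per-core-count turbo ratios from a raw
# MSR_TURBO_RATIO_LIMIT value.
# Assumption: one 8-bit ratio per active-core count, lowest byte first
# (bits 7:0 = 1 core active, bits 15:8 = 2 cores active, and so on).


def turbo_ratio(msr_value: int, active_cores: int) -> int:
    """Return the turbo ratio limit for the given number of active cores."""
    if not 1 <= active_cores <= 8:
        raise ValueError("active_cores must be between 1 and 8")
    shift = 8 * (active_cores - 1)
    return (msr_value >> shift) & 0xFF


# Hypothetical raw value, built from the 39/39/38/37 figures above.
I7_3770_LIMITS = 39 | (39 << 8) | (38 << 16) | (37 << 24)

# The driver as described reads only the first entry (1 core active).
ratios = [turbo_ratio(I7_3770_LIMITS, n) for n in (1, 2, 3, 4)]
```

With this value, `ratios` holds the four limits in order of active-core count, while a reader that only looks at bits 7:0 always sees the 1-core limit.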
Trying to find a solution to the above, I couldn't find an MSR that
we could use to get the number of active cores and select the respective
turbo ratio limit.
I also thought of using IA32_PERF_STATUS to get the current P state
and use it in the calculations, but its scope is per core and not per
thread.
Am I missing something? If the above is correct, any idea how this
could be resolved?
Thanks in advance,
Stratos
* Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
From: Dirk Brandewie @ 2014-05-20 21:31 UTC (permalink / raw)
To: Stratos Karafotis, Rafael J. Wysocki, Viresh Kumar
Cc: dirk.j.brandewie, linux-pm@vger.kernel.org
On 05/20/2014 02:11 PM, Stratos Karafotis wrote:
> Am I missing something? If the above is correct, any idea how this
> could be resolved?
The above is correct, except that the requested pstate is calculated using
max_pstate, and turbo_pstate is used as the upper limit when calling
intel_pstate_set_pstate.
The value written to MSR_IA32_PERF_CTL is a request that is processed by the
CPU and is clipped internally to the current state of the CPU. Whether or
not any turbo is available is decided by the CPU; asking for the top
turbo bin says "give me all that is available".
Asking for more than the CPU has to give at the moment is harmless.
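The two-stage limiting described here can be sketched as a simplified model: the driver bounds its request, and the CPU then clips the request to whatever it can currently deliver. Both function names are hypothetical, the minimum P state of 16 is an assumed value for illustration, and the hardware step is only a stand-in for behavior that is internal to the processor.

```python
# Simplified model, not the driver's code.
# Assumptions: MIN_PSTATE = 16 is illustrative; the hardware clip is
# modelled as a plain min() against the limit for the active cores.

MIN_PSTATE = 16      # assumed value, for illustration only
TURBO_PSTATE = 39    # 1-core turbo limit, as read by the driver


def driver_clamp(requested: int) -> int:
    """Bound the request to [MIN_PSTATE, TURBO_PSTATE] before writing it."""
    return max(MIN_PSTATE, min(requested, TURBO_PSTATE))


def hardware_clip(request: int, active_core_limit: int) -> int:
    """Model the CPU granting at most what the current state allows."""
    return min(request, active_core_limit)


request = driver_clamp(45)                 # driver asks for the top turbo bin
granted_4_cores = hardware_clip(request, 37)
granted_1_core = hardware_clip(request, 39)
```

In this model the request for 39 is granted in full with one core active and silently reduced to 37 with four cores active, which is why over-asking is harmless from the hardware's point of view.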
* Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
From: Stratos Karafotis @ 2014-05-20 21:59 UTC (permalink / raw)
To: Dirk Brandewie, Rafael J. Wysocki, Viresh Kumar
Cc: dirk.j.brandewie, linux-pm@vger.kernel.org
On 21/05/2014 12:31 AM, Dirk Brandewie wrote:
> The above is correct except the requested pstate is calculated using
> max_pstate and turbo_pstate is used as the upper limit when calling
> intel_pstate_set_pstate.
Thanks for your prompt reply!
But when we call intel_pstate_set_pstate, we also set current_pstate
to the requested pstate (which may be, for example, 39).
>
> The value written to MSR_IA32_PERF_CTL is a request that is processed by the
> CPU and is clipped internally to the current state of the CPU. Whether or
> not any turbo is available is decided by the CPU; asking for the top
> turbo bin says "give me all that is available".
>
> Asking for more than the CPU has to give at the moment is harmless.
Then the CPU clips internally, as you said, to the current actual state.
If all cores are active, the current state (in the CPU) will be 37.
The driver will still consider current_pstate to be 39.
In the next sampling interval, in intel_pstate_get_scaled_busy, we calculate
core_busy as core_busy * max_pstate / current_pstate.
So, in the above example we have:
core_busy = core_busy * 34 / 39
and not
core_busy = core_busy * 34 / 37
as it should be.
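The size of this error can be worked out directly from the numbers above. The sketch below uses plain floating point for readability, which is a simplification and not how the driver does its arithmetic; the function name and the sample core_busy value of 100 are chosen only for illustration.

```python
# Sketch of the scaling arithmetic with the numbers from this thread.
# Assumption: floating point is used here purely for readability.

MAX_PSTATE = 34


def scaled_busy(core_busy: float, max_pstate: int, current_pstate: int) -> float:
    """core_busy * max_pstate / current_pstate, as described above."""
    return core_busy * max_pstate / current_pstate


core_busy = 100.0                                   # illustrative input
assumed = scaled_busy(core_busy, MAX_PSTATE, 39)    # what the driver computes
actual = scaled_busy(core_busy, MAX_PSTATE, 37)     # what it should compute

# Relative underestimate: 1 - 37/39, i.e. 2/39, a little over 5%.
relative_error = (actual - assumed) / actual
```

The underestimate depends only on the ratio of the two P states, so it is the same for any core_busy value.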
Stratos
* Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
From: Dirk Brandewie @ 2014-05-20 23:01 UTC (permalink / raw)
To: Stratos Karafotis, Dirk Brandewie, Rafael J. Wysocki,
Viresh Kumar
Cc: dirk.j.brandewie, linux-pm@vger.kernel.org
On 05/20/2014 02:59 PM, Stratos Karafotis wrote:
> Then the CPU clips internally, as you said, to the current actual state.
> If all cores are active the current state (in CPU) will be 37.
> Driver will still consider the current_pstate as 39.
>
> In next sampling interval, in intel_pstate_get_scaled_busy we calculate
> the core_busy as core_busy * max_pstate / current_pstate.
>
> So, in the above example we have:
> core_busy = core_busy * 34 / 39
> and not
> core_busy = core_busy * 34 / 37
> as it should be.
Yes, and I know of no good way around it. In practice I haven't seen the
oscillation in the upper turbo range (which is what this may cause).
Do you have a workload where this is hurting performance?
Keep in mind you may have gotten 3.8385 GHz or any other value in the turbo
range, based on the state of the processor. The processor updates the
effective frequency a lot faster than we sample.
--Dirk
* Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
From: Stratos Karafotis @ 2014-05-21 17:22 UTC (permalink / raw)
To: Dirk Brandewie, Rafael J. Wysocki, Viresh Kumar
Cc: dirk.j.brandewie, linux-pm@vger.kernel.org
On 21/05/2014 02:01 AM, Dirk Brandewie wrote:
> Yes and I know of no good way around it. In practice I haven't seen the
> oscillation in the upper turbo range (which is what this may cause).
> Do you have a workload where this is hurting performance?
>
> Keep in mind you may have gotten 3.8385 GHz or any value in the turbo
> range based on the state of the processor. The processor updates the
> effective frequency a lot faster than we sample.
I guess there will be no problem if all cores are fully busy, because the
CPU will internally go to P state 37.
But if the load later decreases and the CPU is, for example, 50% busy, we
will carry this error into the calculations because:
target = cpu->pstate.current_pstate +/- steps
The error will vanish after some intervals (if we ask for the min P
state, for example, and the load is actually minimal). But the
above error will be introduced once in a while.
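A small sketch of how the stale value carries into the next target, and how it disappears once the request hits the floor. The step sizes and the minimum P state of 16 are assumed values for illustration, and next_target is a hypothetical helper that only mirrors the "current_pstate +/- steps" expression above.

```python
# Sketch only: a stale current_pstate shifts the next target.
# Assumptions: MIN_PSTATE = 16 and the step sizes are illustrative.

MIN_PSTATE = 16


def next_target(current_pstate: int, steps: int) -> int:
    """target = current_pstate +/- steps, bounded below by MIN_PSTATE."""
    return max(MIN_PSTATE, current_pstate + steps)


driver_view = 39      # what the driver recorded after its request
hardware_state = 37   # what the CPU actually ran at with 4 cores active

# Moderate load drop: the 2-step discrepancy is carried into the target.
carried_error = next_target(driver_view, -3) - next_target(hardware_state, -3)

# Large load drop: both views reach the minimum and the error vanishes.
error_at_floor = next_target(driver_view, -30) - next_target(hardware_state, -30)
```

Here `carried_error` is 2 and `error_at_floor` is 0, matching the description: the offset persists through relative steps and disappears once a bound is reached.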
Stratos