From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stratos Karafotis
Subject: Re: [query] cpufreq: intel_pstate: diverge of current_pstate and actual P state
Date: Wed, 21 May 2014 20:22:02 +0300
Message-ID: <537CE0BA.4020105@semaphore.gr>
References: <537BC4E8.60500@semaphore.gr> <537BC9C2.6030801@intel.com> <537BD058.9070609@semaphore.gr> <537BDEE4.7080107@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from sema.semaphore.gr ([78.46.194.137]:33303 "EHLO sema.semaphore.gr" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751944AbaEURWH (ORCPT ); Wed, 21 May 2014 13:22:07 -0400
In-Reply-To: <537BDEE4.7080107@intel.com>
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: Dirk Brandewie, "Rafael J. Wysocki", Viresh Kumar
Cc: dirk.j.brandewie@intel.com, "linux-pm@vger.kernel.org"

On 21/05/2014 02:01 πμ, Dirk Brandewie wrote:
> On 05/20/2014 02:59 PM, Stratos Karafotis wrote:
>> On 21/05/2014 12:31 πμ, Dirk Brandewie wrote:
>>> On 05/20/2014 02:11 PM, Stratos Karafotis wrote:
>>>> Hi all,
>>>>
>>>> Currently, we use the current P state to calculate the busy_scaled factor
>>>> and then the next P state.
>>>>
>>>> We also read the MSR_TURBO_RATIO_LIMIT to get the turbo ratio limit as the
>>>> turbo_pstate. But, we always read bits 7:0 ("Maximum turbo ratio limit of 1
>>>> core active").
>>>>
>>>> So, in processor families that have different turbo ratio limit
>>>> depending on active cores the current P state as it's considered
>>>> by the driver might be different from the actual current P state.
>>>>
>>>> For example, I use an i7-3770 which reports as maximum turbo ratio limits
>>>> with 1/2/3/4 actives cores the values 39/39/38/37. So, in some cases
>>>> we will calculate as the next P state the value 39. If the active cores
>>>> at that time was 3 or 4 the actual P state will be 38 or 37.
>>>> The current_pstate variable will have the value 39 and this will lead
>>>> to wrong calculation at the next sampling interval.
>>>>
>>>> Trying to find a solution to the above I couldn't find an MSR that
>>>> we could use to get the number of active cores and use the respective
>>>> turbo ratio limit.
>>>>
>>>> I also thought to use the IA32_PERF_STATUS to get the current P state
>>>> and use it in the calculations, but its scope is per core and not per
>>>> thread.
>>>>
>>>> Am I missing something? If the above is correct, any idea how this
>>>> could be resolved?
>>>
>>> The above is correct except the requested pstate is calculated using
>>> max_pstate and turbo_pstate is used as the upper limit when calling
>>> intel_pstate_set_pstate.
>>
>> Thanks for your prompt reply!
>>
>> But when we call intel_pstate_set_pstate we also set the current_pstate
>> to the requested pstate (which it may be, for example, 39).
>>
>>>
>>> The value written to (MSR_IA32_PPERF_CTL is a request that is processed by the
>>> CPU and is clipped internally to the current state of the CPU. Whether or
>>> not any turbo is available is decided by the CPU asking for the top
>>> turbo bin says give me all that is available.
>>>
>>> Asking for more than the CPU has to give ATM is harmless.
>>
>> Then the CPU clips internally, as you said, to the current actual state.
>> If all cores are active the current state (in CPU) will be 37.
>> Driver will still consider the current_pstate as 39.
>>
>> In next sampling interval, in intel_pstate_get_scaled_busy we calculate
>> the core_busy as core_busy * max_pstate / current_pstate.
>>
>> So, in the above example we have:
>> core_busy = core_busy * 34 / 39
>> and not
>> core_busy = core_busy * 34 / 37
>> as it should be.
>
> Yes and I know of no good way around it. In practice I haven't seen the
> oscillation in the upper turbo range (which is what this may cause).
> Do you have a workload where this is hurting performance?
>
> Keep in mind you may have gotten 3.8385 Ghz or any value in the turbo
> range based on the state of the processor. The processor updates the
> effective frequency a lot faster than we sample.

I guess there will be no problem if all cores are fully busy, because the
CPU will internally go to P state 37.

But if, later, the load decreases and the CPU is, for example, 50% busy, we
will carry this error into the calculations because:

target = cpu->pstate.current_pstate +/- steps

The error will vanish after some intervals (if we ask for a min
P state, for example, and the load is actually minimal).
But the above error will be introduced once in a while.


Stratos
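
For reference, the size of the error discussed in this thread can be
worked out with a short sketch. It is only an illustration of the
arithmetic, not the driver code: scaled_busy() is a hypothetical helper
that applies the quoted formula core_busy * max_pstate / current_pstate
in floating point, the 50% busy sample is an assumed input, and the
values 34, 39 and 37 are the ones from the i7-3770 example above.

```python
def scaled_busy(core_busy, max_pstate, current_pstate):
    """Hypothetical helper: the scaling formula quoted in the thread.

    The real driver works in fixed point; plain floats are used here
    only to show the size of the error.
    """
    return core_busy * max_pstate / current_pstate


MAX_PSTATE = 34        # max_pstate used in the thread's example
BELIEVED_PSTATE = 39   # what the driver stored as current_pstate
ACTUAL_PSTATE = 37     # 4-core turbo ratio limit the CPU clipped to
CORE_BUSY = 50.0       # assumed sample: core is 50% busy

believed = scaled_busy(CORE_BUSY, MAX_PSTATE, BELIEVED_PSTATE)
actual = scaled_busy(CORE_BUSY, MAX_PSTATE, ACTUAL_PSTATE)

# Fraction by which the driver underestimates the scaled busy value.
# Algebraically this is 1 - ACTUAL_PSTATE / BELIEVED_PSTATE, so it does
# not depend on CORE_BUSY or MAX_PSTATE.
relative_error = (actual - believed) / actual

print(f"scaled busy with current_pstate=39: {believed:.2f}")
print(f"scaled busy with current_pstate=37: {actual:.2f}")
print(f"relative underestimate: {relative_error:.1%}")
```

Under these assumptions the driver's value is low by 2/39 (about 5%) of
the correct one, and that fraction depends only on the ratio between the
believed and the actual P state, not on the load.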