* cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Magdalena Dobosz @ 2014-11-25 17:21 UTC
To: CPUFREQ; +Cc: Ross Lagerwall, rjw, viresh.kumar

Hi,

I was analyzing the code and behaviour of cpufrequtils (kernel 3.13).
I noticed that the value reported by cpufreq-info as the frequency
"asserted by call to hardware" (exported to sysfs as
"cpuinfo_cur_freq") is always the same as the value reported as the
frequency requested by a governor (exported to sysfs as
"scaling_cur_freq", stored in policy->cur).
This was not the case for kernel 3.2, where the two values tended to
differ.

I found the reason: with the acpi-cpufreq driver, the value reported
as the hardware frequency is retrieved by the get_cur_freq_on_cpu
function (defined in acpi-cpufreq). In kernel 3.2 this function
reads MSR_IA32_PERF_STATUS, while in kernel 3.13 it reads
MSR_IA32_PERF_CTL.
This is a result of the patch below:
https://lists.ubuntu.com/archives/kernel-team/2013-June/029493.html

It looks like a bug to me: we take a requested frequency (control
register) and report it as "cpuinfo_cur_freq - Current frequency of
the CPU as obtained from the hardware, in KHz. This is the frequency
the CPU actually runs at." (as described in the cpufreq user guide:
https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt).
I believe we should read the status register (MSR_IA32_PERF_STATUS)
instead.

Best regards,
Magdalena Dobosz
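The comparison described in this message can be reproduced from user space. The following is a minimal sketch and not part of the original post: it assumes the usual cpufreq sysfs layout under /sys/devices/system/cpu/cpuN/cpufreq/, and the helper names read_khz and compare_freqs are hypothetical. Reading cpuinfo_cur_freq typically requires root, so an unreadable file is reported as None rather than raising.

```python
from pathlib import Path

# Assumed default location of the per-CPU cpufreq directories.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_khz(path):
    """Return the integer kHz value stored in a sysfs file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def compare_freqs(cpu=0, root=CPUFREQ_ROOT):
    """Read both frequency attributes for one CPU and say whether they match."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    hw_khz = read_khz(base / "cpuinfo_cur_freq")    # "as obtained from the hardware"
    req_khz = read_khz(base / "scaling_cur_freq")   # what the governor requested
    return {
        "cpuinfo_cur_freq": hw_khz,
        "scaling_cur_freq": req_khz,
        "identical": hw_khz is not None and hw_khz == req_khz,
    }


if __name__ == "__main__":
    print(compare_freqs(0))
```

If the two attributes report the same value on every sample, as described for kernel 3.13, that is consistent with both being derived from the requested P-state; it does not by itself prove which register the driver reads.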
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Viresh Kumar @ 2014-11-26 6:07 UTC
To: Magdalena Dobosz
Cc: cpufreq@vger.kernel.org, Ross Lagerwall, Rafael J. Wysocki, linux-pm@vger.kernel.org, Dirk Brandewie, Len Brown

On 25 November 2014 at 22:51, Magdalena Dobosz <maj.dobosz@gmail.com> wrote:
> Hi,
> I was analyzing the code and behaviour of cpufrequtils (kernel 3.13).
> I noticed that the value reported by cpufreq-info as the frequency
> "asserted by call to hardware" (exported to sysfs as
> "cpuinfo_cur_freq") is always the same as the value reported as the
> frequency requested by a governor (exported to sysfs as
> "scaling_cur_freq", stored in policy->cur).
> This was not the case for kernel 3.2, where the two values tended to
> differ.
>
> I found the reason: with the acpi-cpufreq driver, the value reported
> as the hardware frequency is retrieved by the get_cur_freq_on_cpu
> function (defined in acpi-cpufreq). In kernel 3.2 this function
> reads MSR_IA32_PERF_STATUS, while in kernel 3.13 it reads
> MSR_IA32_PERF_CTL.
> This is a result of the patch below:
> https://lists.ubuntu.com/archives/kernel-team/2013-June/029493.html
>
> It looks like a bug to me: we take a requested frequency (control
> register) and report it as "cpuinfo_cur_freq - Current frequency of
> the CPU as obtained from the hardware, in KHz. This is the frequency
> the CPU actually runs at." (as described in the cpufreq user guide:
> https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt).
> I believe we should read the status register (MSR_IA32_PERF_STATUS)
> instead.

Fixed the list and cc'd a few more people.

This is what Len Brown said about that patch:

    Ack -- MSR_IA32_PERF_STATUS is ill-conceived. It is unreliable
    by its very definition. Any code that depends on it should be
    questioned...

@Dirk: Can you help?
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Dirk Brandewie @ 2014-11-26 15:29 UTC
To: Viresh Kumar, Magdalena Dobosz
Cc: dirk.j.brandewie, cpufreq@vger.kernel.org, Ross Lagerwall, Rafael J. Wysocki, linux-pm@vger.kernel.org, Len Brown

On 11/25/2014 10:07 PM, Viresh Kumar wrote:
> Fixed the list and cc'd a few more people.
>
> This is what Len Brown said about that patch:
>
>     Ack -- MSR_IA32_PERF_STATUS is ill-conceived. It is unreliable
>     by its very definition. Any code that depends on it should be
>     questioned...
>
> @Dirk: Can you help?

Not really. This issue is one of the reasons that intel_pstate returns
a measured/effective frequency. There is no way that I know of to get
the instantaneous frequency that the core is running at in the presence
of hardware coordination and idle. The best you can do is measure the
effective frequency over a sample time. This is what turbostat and
intel_pstate report.

--Dirk
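The "effective frequency over a sample time" can be sketched as arithmetic on two counter snapshots. This is an illustration under stated assumptions, not code taken from intel_pstate or turbostat: it assumes the IA32_APERF and IA32_MPERF counters (which this thread does not mention) are sampled at the start and end of an interval, and base_khz and the function names are hypothetical. Actually reading the counters needs privileged MSR access and is left out.

```python
MASK64 = (1 << 64) - 1  # both counters are 64-bit and may wrap around


def counter_delta(before, after):
    """Difference between two snapshots of a 64-bit counter, tolerating wraparound."""
    return (after - before) & MASK64


def effective_khz(base_khz, aperf0, mperf0, aperf1, mperf1):
    """Average frequency while the core was not idle during the interval.

    APERF advances at the actual clock rate and MPERF at a fixed reference
    rate, both only while the core is executing, so their ratio scales the
    reference frequency (base_khz, an assumed input) to the average rate.
    """
    d_aperf = counter_delta(aperf0, aperf1)
    d_mperf = counter_delta(mperf0, mperf1)
    if d_mperf == 0:
        return None  # the core never ran during the interval
    return base_khz * d_aperf // d_mperf


if __name__ == "__main__":
    # Hypothetical snapshots: the core ran 1.5x faster than the reference clock.
    print(effective_khz(2_400_000, 0, 0, 1_500_000, 1_000_000))
```

The result is an average over the sampling interval, so it answers a different question from a single register read: it says how fast the core ran on average while busy, not what P-state it is in right now.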
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Magdalena Dobosz @ 2014-11-26 20:41 UTC
To: Dirk Brandewie
Cc: Viresh Kumar, dirk.j.brandewie, cpufreq@vger.kernel.org, Ross Lagerwall, Rafael J. Wysocki, linux-pm@vger.kernel.org, Len Brown

Dirk: I understand that it does not give the current frequency (it
does not take into account such factors as C-states and turbo boost,
for instance). Yet if it does give the current P-state of a core,
then it is a valid and useful piece of information. For instance,
while testing cpufrequtils on Intel i7, Intel i5 and Core 2 Duo CPUs,
I observed that while the requested frequency (policy->cur) may
differ between cores, the frequency reported on the basis of the
current P-state (read from MSR_IA32_PERF_STATUS) is always the same
for all the cores. This suggests that all the cores are in the same
frequency domain (hardware coordinated).

Viresh, why is it ill-conceived? As I said above, I think the
information about the current P-state can be useful, as long as it
is accurate. Do you have any grounds to say that it is not accurate
in terms of the current P-state of a core?

Best regards,
Magdalena
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Rafael J. Wysocki @ 2014-11-26 22:37 UTC
To: Magdalena Dobosz
Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, cpufreq@vger.kernel.org, Ross Lagerwall, linux-pm@vger.kernel.org, Len Brown

On Wednesday, November 26, 2014 09:41:02 PM Magdalena Dobosz wrote:
> Dirk: I understand that it does not give the current frequency (it
> does not take into account such factors as C-states and turbo boost,
> for instance). Yet if it does give the current P-state of a core,
> then it is a valid and useful piece of information. For instance,

You can't trust it, though, because by the time you have read the
value and are about to use it, it may already be stale.

> while testing cpufrequtils on Intel i7, Intel i5 and Core 2 Duo CPUs,
> I observed that while the requested frequency (policy->cur) may
> differ between cores, the frequency reported on the basis of the
> current P-state (read from MSR_IA32_PERF_STATUS) is always the same
> for all the cores. This suggests that all the cores are in the same
> frequency domain (hardware coordinated).

Which is usually the case for (current) desktop processors, but may
not be the case for server ones.

> Viresh, why is it ill-conceived? As I said above, I think the
> information about the current P-state can be useful, as long as it
> is accurate. Do you have any grounds to say that it is not accurate
> in terms of the current P-state of a core?

Please see above.

Kind regards,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Magdalena Dobosz @ 2014-11-27 16:38 UTC
To: Rafael J. Wysocki
Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, cpufreq@vger.kernel.org, Ross Lagerwall, linux-pm@vger.kernel.org, Len Brown

It can already be stale? Why is that?

Anyway, the cpufreq documentation is misleading: it says
"cpuinfo_cur_freq : Current frequency of the CPU as obtained from the
hardware, in KHz. This is the frequency the CPU actually runs at.",
while the value is read from MSR_IA32_PERF_STATUS (or
MSR_IA32_PERF_CTL in newer kernels).

Best regards,
Magdalena
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
From: Len Brown @ 2015-03-14 17:18 UTC
To: Magdalena Dobosz
Cc: Rafael J. Wysocki, Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, cpufreq@vger.kernel.org, Ross Lagerwall, linux-pm@vger.kernel.org

> {PERF_STATUS} can be already stale? Why is that?

Hello Magdalena,

Say two cores share a voltage domain.
core0 makes a P-state request via PERF_CTL to run FAST.
core1 makes a P-state request via PERF_CTL to run SLOW.

The hardware is responsible for programming the shared voltage
regulator (VR) to support the most demanding request -- FAST.
It does this, and since it would be wasteful to have core1 run
SLOW when the VR supports FAST, core1 also runs FAST.

Say core0 then goes idle. The SLOW request on core1 may now be the
most demanding VR request, and so the VR is lowered, and core1 now
runs SLOW.

Say PERF_STATUS is read on core1 before and after this transition.
One read will return FAST, the other will return SLOW.
Which one is the real frequency?

The problem is that they are *both* the real frequency,
and *neither* is the real frequency.
Indeed, the concept of instantaneous frequency is extremely
misleading in many scenarios.

There are other scenarios related to turbo mode and thermal control
that make instantaneous frequency even more misleading, particularly
since the frequency can change many times per second.

Thus the OS itself can't depend on PERF_STATUS for anything.

Requested frequency, however, does actually have *some* meaning.

Average frequency is generally more useful. It requires an accurate
count of elapsed cycles divided by a reliable measurement of elapsed
time. The problem with this is who decides the interval, and what do
you do if somebody wants an instantaneous answer?

The turbostat utility computes average frequency this way in user
space, of course. intel_pstate can do it in-kernel because it is
actually a governor+driver, and so it runs periodically.

But if somebody wants to know the instantaneous frequency,
you have to ask why, and what they'll do with it...

> Anyway, cpufreq documentation is misleading:
> it says "cpuinfo_cur_freq: Current frequency of the CPU
> as obtained from the hardware, in KHz.
> This is the frequency the CPU actually runs at."

Yes, I see this in Documentation/cpu-freq/user-guide.txt

It was added by this commit:

da470db16c703d7f9617c366a36c6670f89a9830
[CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq

Maybe it would have been a better idea not to update that
documentation, and instead leave these vague?

> while the value is
> read from MSR_IA32_PERF_STATUS (or MSR_IA32_PERF_CTL
> in newer kernels).

Both of those are bugs.
The code that does this in acpi-cpufreq should be replaced with code
that reads no MSR and simply returns the last requested frequency
(from memory).

I think I had a patch to remove all that current-frequency cruft
from the driver a while back -- perhaps it is time to revive it,
and to fix the documentation too.

thanks,
Len Brown, Intel Open Source Technology Center
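The FAST/SLOW scenario in this message can be modelled in a few lines. This is a toy model built only from the description above (a shared domain runs every active core at the most demanding request among the non-idle cores); the frequency values and the function name domain_khz are hypothetical, and real hardware coordination is more involved.

```python
# Hypothetical frequencies standing in for the FAST and SLOW P-states.
FAST_KHZ = 3_000_000
SLOW_KHZ = 1_200_000


def domain_khz(requests, idle):
    """Frequency that every active core in a shared domain runs at.

    requests maps a core name to its requested frequency (its PERF_CTL value
    in the message's terms); idle is the set of cores currently idle, whose
    requests the hardware no longer needs to honour.
    """
    active = [khz for core, khz in requests.items() if core not in idle]
    return max(active) if active else None


# The requests never change during the scenario.
requests = {"core0": FAST_KHZ, "core1": SLOW_KHZ}

before = domain_khz(requests, idle=set())      # both cores busy
after = domain_khz(requests, idle={"core0"})   # core0 has gone idle

if __name__ == "__main__":
    print("core1 requested:", requests["core1"])
    print("core1 status before core0 idles:", before)
    print("core1 status after core0 idles: ", after)
```

In this model core1's request stays constant while the value a status read would return changes with core0's idle state, which is the sense in which a single status sample can be stale by the time it is used.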