* cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
@ 2014-11-25 17:21 Magdalena Dobosz
2014-11-26 6:07 ` Viresh Kumar
0 siblings, 1 reply; 7+ messages in thread
From: Magdalena Dobosz @ 2014-11-25 17:21 UTC (permalink / raw)
To: CPUFREQ; +Cc: Ross Lagerwall, rjw, viresh.kumar
Hi,
I was analyzing the code and behaviour of cpufrequtils (kernel 3.13).
I noticed that the value reported by cpufreq-info as the frequency
"asserted by call to hardware" (exported to sysfs as
"cpuinfo_cur_freq") is always the same as the value reported as the
frequency requested by a governor (exported to sysfs as
"scaling_cur_freq", stored in policy->cur).
This was not the case for kernel 3.2, where the two values tended to differ.
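A minimal sketch of how the two values can be compared from user space, assuming the standard cpufreq sysfs attributes cpuinfo_cur_freq and scaling_cur_freq. The helper names are hypothetical, and reading cpuinfo_cur_freq typically requires root.

```python
from pathlib import Path


def read_khz(policy_dir, attribute):
    """Return the integer kHz value stored in one cpufreq sysfs attribute."""
    return int((Path(policy_dir) / attribute).read_text().strip())


def compare_frequencies(policy_dir):
    """Return (hardware_khz, governor_khz, differ) for one cpufreq directory."""
    # Value the driver reports as coming from the hardware.
    hardware_khz = read_khz(policy_dir, "cpuinfo_cur_freq")
    # Value the governor last requested (policy->cur).
    governor_khz = read_khz(policy_dir, "scaling_cur_freq")
    return hardware_khz, governor_khz, hardware_khz != governor_khz
```

On a live system the directory would be something like /sys/devices/system/cpu/cpu0/cpufreq. With the behaviour described here, the third element of the result would always be False on kernel 3.13.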
I found the reason: with the acpi-cpufreq driver, the value reported
as the hardware frequency is retrieved by the get_cur_freq_on_cpu
function (defined in acpi-cpufreq). In kernel 3.2 this function
reads MSR_IA32_PERF_STATUS, while in kernel 3.13 it reads
MSR_IA32_PERF_CTL.
This is a result of the following patch:
https://lists.ubuntu.com/archives/kernel-team/2013-June/029493.html
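For readers who want to look at the two registers directly, here is a rough sketch assuming the Linux msr driver is loaded and exposes /dev/cpu/N/msr. The addresses 0x198 and 0x199 are the architectural ones for IA32_PERF_STATUS and IA32_PERF_CTL. The function names are hypothetical, and the encoding of the P-state field itself is model-specific.

```python
import os
import struct

IA32_PERF_STATUS = 0x198  # P-state currently reported by the hardware
IA32_PERF_CTL = 0x199     # P-state most recently requested by software


def read_msr(msr_path, address):
    """Read one 64-bit MSR through an msr device node such as /dev/cpu/0/msr."""
    fd = os.open(msr_path, os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address.
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def pstate_field(msr_value):
    """Keep bits 15:0, the performance-state field of either register."""
    return msr_value & 0xFFFF
```

Comparing pstate_field(read_msr(path, IA32_PERF_STATUS)) with pstate_field(read_msr(path, IA32_PERF_CTL)) on the same CPU shows whether the reported state matches the requested one at that instant.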
It looks like a bug to me: we take the requested frequency (from the
control register) and report it as "cpuinfo_cur_freq - Current frequency of
the CPU as obtained from the hardware, in KHz. This is the frequency
the CPU actually runs at." (as described in the cpufreq documentation:
https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt). I
believe we should read the status register (MSR_IA32_PERF_STATUS) instead.
Best regards,
Magdalena Dobosz
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
2014-11-25 17:21 cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency" Magdalena Dobosz
@ 2014-11-26 6:07 ` Viresh Kumar
2014-11-26 15:29 ` Dirk Brandewie
0 siblings, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2014-11-26 6:07 UTC (permalink / raw)
To: Magdalena Dobosz
Cc: cpufreq@vger.kernel.org, Ross Lagerwall, Rafael J. Wysocki,
linux-pm@vger.kernel.org, Dirk Brandewie, Len Brown
On 25 November 2014 at 22:51, Magdalena Dobosz <maj.dobosz@gmail.com> wrote:
> Hi,
> I was analyzing the code and behaviour of cpufrequtils (kernel 3.13).
> I noticed that the value reported by cpufreq-info as the frequency
> "asserted by call to hardware" (exported to sysfs as
> "cpuinfo_cur_freq") is always the same as the value reported as the
> frequency requested by a governor (exported to sysfs as
> "scaling_cur_freq", stored in policy->cur).
> This was not the case for kernel 3.2, where the two values tended to differ.
>
> I found the reason: with the acpi-cpufreq driver, the value reported
> as the hardware frequency is retrieved by the get_cur_freq_on_cpu
> function (defined in acpi-cpufreq). In kernel 3.2 this function
> reads MSR_IA32_PERF_STATUS, while in kernel 3.13 it reads
> MSR_IA32_PERF_CTL.
> This is a result of the following patch:
> https://lists.ubuntu.com/archives/kernel-team/2013-June/029493.html
>
> It looks like a bug to me: we take the requested frequency (from the
> control register) and report it as "cpuinfo_cur_freq - Current frequency of
> the CPU as obtained from the hardware, in KHz. This is the frequency
> the CPU actually runs at." (as described in the cpufreq documentation:
> https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt). I
> believe we should read the status register (MSR_IA32_PERF_STATUS) instead.
Fixed the list and cc'd a few more people.
This is what Len Brown said about that patch:
Ack -- MSR_IA32_PERF_STATUS is ill-conceived. It is un-reliable
by its very definition. Any code that depends on it should be questioned...
@Dirk: Can you help ?
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
2014-11-26 6:07 ` Viresh Kumar
@ 2014-11-26 15:29 ` Dirk Brandewie
2014-11-26 20:41 ` Magdalena Dobosz
0 siblings, 1 reply; 7+ messages in thread
From: Dirk Brandewie @ 2014-11-26 15:29 UTC (permalink / raw)
To: Viresh Kumar, Magdalena Dobosz
Cc: dirk.j.brandewie, cpufreq@vger.kernel.org, Ross Lagerwall,
Rafael J. Wysocki, linux-pm@vger.kernel.org, Len Brown
On 11/25/2014 10:07 PM, Viresh Kumar wrote:
> On 25 November 2014 at 22:51, Magdalena Dobosz <maj.dobosz@gmail.com> wrote:
>> Hi,
>> I was analyzing the code and behaviour of cpufrequtils (kernel 3.13).
>> I noticed that the value reported by cpufreq-info as the frequency
>> "asserted by call to hardware" (exported to sysfs as
>> "cpuinfo_cur_freq") is always the same as the value reported as the
>> frequency requested by a governor (exported to sysfs as
>> "scaling_cur_freq", stored in policy->cur).
>> This was not the case for kernel 3.2, where the two values tended to differ.
>>
>> I found the reason: with the acpi-cpufreq driver, the value reported
>> as the hardware frequency is retrieved by the get_cur_freq_on_cpu
>> function (defined in acpi-cpufreq). In kernel 3.2 this function
>> reads MSR_IA32_PERF_STATUS, while in kernel 3.13 it reads
>> MSR_IA32_PERF_CTL.
>> This is a result of the following patch:
>> https://lists.ubuntu.com/archives/kernel-team/2013-June/029493.html
>>
>> It looks like a bug to me: we take the requested frequency (from the
>> control register) and report it as "cpuinfo_cur_freq - Current frequency of
>> the CPU as obtained from the hardware, in KHz. This is the frequency
>> the CPU actually runs at." (as described in the cpufreq documentation:
>> https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt). I
>> believe we should read the status register (MSR_IA32_PERF_STATUS) instead.
>
> Fixed the list and cc'd a few more people.
>
> This is what Len Brown said about that patch:
>
> Ack -- MSR_IA32_PERF_STATUS is ill-conceived. It is un-reliable
> by its very definition. Any code that depends on it should be questioned...
>
> @Dirk: Can you help ?
Not really. This issue is one of the reasons that intel_pstate returns a
measured/effective frequency. There is no way that I know of to get the
instantaneous frequency that the core is running at in the presence of
hardware coordination and idle. The best you can do is measure the effective
frequency over a sample time. This is what turbostat and intel_pstate
report.
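As an illustration of measuring an effective frequency over a sample time, the sketch below assumes the IA32_APERF (0xE8) and IA32_MPERF (0xE7) counters are sampled at the start and end of the interval. It also assumes base_khz is the fixed rate at which MPERF counts; that parameter and the function name are hypothetical. The result is the average frequency while the core was not idle, since both counters stop in idle states.

```python
IA32_MPERF = 0xE7  # counts at a fixed reference rate while the core is in C0
IA32_APERF = 0xE8  # counts at the actual clock rate while the core is in C0


def effective_khz(base_khz, aperf_start, mperf_start, aperf_end, mperf_end):
    """Scale the reference frequency by the APERF/MPERF ratio over one sample."""
    # Both counters are 64 bits wide, so take the deltas modulo 2**64.
    delta_aperf = (aperf_end - aperf_start) % 2**64
    delta_mperf = (mperf_end - mperf_start) % 2**64
    if delta_mperf == 0:
        # The core never left idle during the sample, so there is no answer.
        return None
    return base_khz * delta_aperf // delta_mperf
```

The value is only meaningful for the interval that was sampled; it says nothing about the frequency at any single instant inside it.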
--Dirk
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
2014-11-26 15:29 ` Dirk Brandewie
@ 2014-11-26 20:41 ` Magdalena Dobosz
2014-11-26 22:37 ` Rafael J. Wysocki
0 siblings, 1 reply; 7+ messages in thread
From: Magdalena Dobosz @ 2014-11-26 20:41 UTC (permalink / raw)
To: Dirk Brandewie
Cc: Viresh Kumar, dirk.j.brandewie, cpufreq@vger.kernel.org,
Ross Lagerwall, Rafael J. Wysocki, linux-pm@vger.kernel.org,
Len Brown
Dirk: I understand that it does not tell us the current frequency (it
does not take into account such factors as C-states and turbo boost,
for instance). Yet, if it does report the current P-state of a core,
then it is a valid and useful piece of information. For instance,
while testing cpufrequtils on Intel i7, Intel i5 and Core 2 Duo CPUs,
I observed that while the requested frequency (policy->cur) may
differ between cores, the frequency reported on the basis
of the current P-state (read from MSR_IA32_PERF_STATUS) is always
the same for all the cores. This suggests that all the cores are
in the same frequency domain (hardware coordinated).
Viresh, why is it ill-conceived? As I said above, I think the
information about the current P-state can be useful, as long as it
is accurate. Do you have any grounds to say that it is not accurate
in terms of the current P-state of a core?
Best regards,
Magdalena
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
2014-11-26 20:41 ` Magdalena Dobosz
@ 2014-11-26 22:37 ` Rafael J. Wysocki
2014-11-27 16:38 ` Magdalena Dobosz
0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2014-11-26 22:37 UTC (permalink / raw)
To: Magdalena Dobosz
Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie,
cpufreq@vger.kernel.org, Ross Lagerwall, linux-pm@vger.kernel.org,
Len Brown
On Wednesday, November 26, 2014 09:41:02 PM Magdalena Dobosz wrote:
> Dirk: I understand that it does not tell us the current frequency (it
> does not take into account such factors as C-states and turbo boost,
> for instance). Yet, if it does report the current P-state of a core,
> then it is a valid and useful piece of information. For instance,
You can't trust it, though, because by the time you use the value
you have just read, it may already be stale.
> while testing cpufrequtils on Intel i7, Intel i5 and Core 2 Duo CPUs,
> I observed that while the requested frequency (policy->cur) may
> differ between cores, the frequency reported on the basis
> of the current P-state (read from MSR_IA32_PERF_STATUS) is always
> the same for all the cores. This suggests that all the cores are
> in the same frequency domain (hardware coordinated).
Which usually is the case for (current) desktop processors, but may not
be the case for server ones.
> Viresh, why is it ill-conceived? As I said above, I think the
> information about the current P-state can be useful, as long as it
> is accurate. Do you have any grounds to say that it is not accurate
> in terms of the current P-state of a core?
Please see above.
Kind regards,
Rafael
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
2014-11-26 22:37 ` Rafael J. Wysocki
@ 2014-11-27 16:38 ` Magdalena Dobosz
2015-03-14 17:18 ` Len Brown
0 siblings, 1 reply; 7+ messages in thread
From: Magdalena Dobosz @ 2014-11-27 16:38 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie,
cpufreq@vger.kernel.org, Ross Lagerwall, linux-pm@vger.kernel.org,
Len Brown
It can be already stale? Why is that?
Anyway, cpufreq documentation is misleading: it says "cpuinfo_cur_freq
: Current frequency of the CPU as obtained from the hardware, in KHz.
This is the frequency the CPU actually runs at.", while the value is
read from MSR_IA32_PERF_STATUS (or MSR_IA32_PERF_CTL in newer
kernels).
Best regards,
Magdalena
* Re: cpufreq-info and acpi-cpufreq: reporting MSR_IA32_PERF_CTL as "actual frequency"
2014-11-27 16:38 ` Magdalena Dobosz
@ 2015-03-14 17:18 ` Len Brown
0 siblings, 0 replies; 7+ messages in thread
From: Len Brown @ 2015-03-14 17:18 UTC (permalink / raw)
To: Magdalena Dobosz
Cc: Rafael J. Wysocki, Dirk Brandewie, Viresh Kumar, dirk.j.brandewie,
cpufreq@vger.kernel.org, Ross Lagerwall, linux-pm@vger.kernel.org
> {PERF_STATUS} can be already stale? Why is that?
Hello Magdalena,
Say two cores share a voltage domain.
core0 makes a P-state request via PERF_CTL to run FAST.
core1 makes a P-state request via PERF_CTL to run SLOW.
The hardware is responsible for programming the shared VR to
support the most demanding request -- FAST. It does this,
and since it would be wasteful to have core1 run SLOW when
the VR supports FAST, core1 also runs fast.
Say core0 then goes idle.
The SLOW request on core1 may now be the most demanding VR request,
and so the VR is lowered, and core1 now runs SLOW.
Say PERF_STATUS is read on core1 before and after this transition.
One will read FAST, the other will read SLOW. Which one is the
real frequency?
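The scenario can be modelled in a few lines. This is a deliberately simplified sketch, assuming the shared domain simply follows the highest request among cores that are not idle; the function name and the FAST and SLOW values are hypothetical, and real hardware applies further rules (turbo, thermal limits).

```python
def domain_frequency(requests_khz, idle_cores=()):
    """Frequency of a shared domain: the highest request among non-idle cores."""
    active = [khz for core, khz in requests_khz.items() if core not in idle_cores]
    return max(active) if active else None


# Illustrative values only.
FAST, SLOW = 3_000_000, 1_200_000
requests = {"core0": FAST, "core1": SLOW}

# What core1 would observe before and after core0 goes idle.
before = domain_frequency(requests)
after = domain_frequency(requests, idle_cores={"core0"})
```

In this model core1's own request never changes, yet the frequency it runs at drops from FAST to SLOW as soon as core0 goes idle, so a single read of the status register depends entirely on when it is taken.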
The problem is that they are *both* the real frequency, and *neither*
is the real frequency. Indeed, the concept of instantaneous frequency
is extremely misleading in many scenarios.
There are other scenarios related to turbo mode and thermal control
that make instantaneous frequency even more misleading,
particularly since the frequency can change many times per second.
Thus the OS itself can't depend on PERF_STATUS for anything.
Requested frequency, however, does actually have *some* meaning.
Average frequency is generally more useful.
It requires an accurate count of elapsed cycles
divided by a reliable measurement of elapsed time.
The problem with this is who decides the interval,
and what do you do if somebody wants an instantaneous answer?
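A small worked example of that definition, assuming the frequency history is known as a list of (duration, frequency) segments. The function name and the numbers are hypothetical; the point is only that the answer depends on which interval is chosen.

```python
def average_khz(segments):
    """Average frequency over an interval: elapsed cycles / elapsed time.

    segments is a list of (duration_seconds, frequency_khz) pairs.
    """
    segments = list(segments)
    total_time = sum(duration for duration, _ in segments)
    if total_time <= 0:
        raise ValueError("the interval must have a positive length")
    # duration * kHz gives thousands of cycles elapsed in that segment.
    total_kilocycles = sum(duration * khz for duration, khz in segments)
    return total_kilocycles / total_time
```

For a core that runs at 3,000,000 kHz for 0.5 s and then at 1,200,000 kHz for 0.5 s, average_khz([(0.5, 3_000_000), (0.5, 1_200_000)]) gives 2,100,000 kHz. If the interval instead covers 0.25 s at the higher frequency and 0.75 s at the lower one, the same core averages 1,650,000 kHz.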
The turbostat utility computes this average frequency in user-space,
of course. intel_pstate can do it in-kernel because it is actually a
governor+driver, and so it runs periodically.
But if somebody wants to know the instantaneous frequency,
you have to ask why, and what they'll do with it...
> Anyway, cpufreq documentation is misleading:
> it says "cpuinfo_cur_freq: Current frequency of the CPU
> as obtained from the hardware, in KHz.
> This is the frequency the CPU actually runs at."
Yes, I see this in Documentation/cpu-freq/user-guide.txt
It was added by this commit: da470db16c703d7f9617c366a36c6670f89a9830
[CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq
Maybe it would have been a better idea to not update
that documentation, and instead leave these as vague?
> while the value is
> read from MSR_IA32_PERF_STATUS (or MSR_IA32_PERF_CTL
> in newer kernels).
Both of those are bugs. The code that does this in acpi-cpufreq
should be replaced with code that reads no MSR, and simply
returns the last requested frequency (from memory).
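The idea can be illustrated with a toy model. This is not kernel code and not the patch mentioned in this message; the class and method names are hypothetical and only show the shape of "remember the request, never read an MSR".

```python
class CachedFrequencyPolicy:
    """Toy model of a driver that reports the last requested frequency."""

    def __init__(self, initial_khz):
        # Plays the role of policy->cur: the last frequency that was requested.
        self.cur = initial_khz

    def set_target(self, khz):
        """Record a new request (a real driver would also program the hardware)."""
        self.cur = khz

    def get(self):
        """Return the last requested frequency from memory, with no MSR read."""
        return self.cur
```

With this shape the reported value is by construction the requested frequency, and it makes no claim about what the hardware is doing at that moment.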
I think I had a patch to remove all that current-frequency
cruft from the driver a while back -- perhaps time to revive it,
and to fix the documentation too.
thanks,
Len Brown, Intel Open Source Technology Center