linux-doc.vger.kernel.org archive mirror
* [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
@ 2025-05-22  8:05 Shashank Balaji
  2025-05-22  8:50 ` Russell Haley
  0 siblings, 1 reply; 14+ messages in thread
From: Shashank Balaji @ 2025-05-22  8:05 UTC
  To: Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet
  Cc: linux-pm, linux-doc, linux-kernel, Shinya Takumi, Shashank Balaji

The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
means the requested frequency may not strictly be followed. This is true in the
case of the intel_pstate driver with HWP enabled. When programming the
HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
is set to the policy's max. So, the hardware is free to increase the frequency
beyond the requested frequency.

This behaviour can be slightly surprising, given the current wording "allows
userspace to set the CPU frequency". Hence, document this.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
 Documentation/admin-guide/pm/cpufreq.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 3950583f2b1549b27f568632547e22e9ef8bc167..066fe74f856699c8dd6aaf5e135162ce70686333 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -397,8 +397,15 @@ policy limits change after that.
 -------------
 
 This governor does not do anything by itself.  Instead, it allows user space
-to set the CPU frequency for the policy it is attached to by writing to the
-``scaling_setspeed`` attribute of that policy.
+to set a target CPU frequency for the policy it is attached to by writing to the
+``scaling_setspeed`` attribute of that policy. The actual frequency will be
+greater than or equal to ``scaling_setspeed``, depending on the cpufreq driver.
+For example, if hardware-managed P-states are enabled, then the ``intel_pstate``
+driver will set the minimum frequency to the value of ``scaling_setspeed`` and
+the maximum frequency to the value of ``scaling_max_freq``.  The hardware is
+free to select any frequency between those two values. If this behavior is not
+desired, then ``scaling_max_freq`` should be set to the same value as
+``scaling_setspeed``.
 
 ``schedutil``
 -------------

---
base-commit: d608703fcdd9e9538f6c7a0fcf98bf79b1375b60
change-id: 20250522-userspace-governor-doc-86380dbab3d5

Best regards,
-- 
Shashank Balaji <shashank.mahadasyam@sony.com>



* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22  8:05 [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed Shashank Balaji
@ 2025-05-22  8:50 ` Russell Haley
  2025-05-22  9:46   ` Shashank Balaji
  2025-05-22  9:47   ` Rafael J. Wysocki
  0 siblings, 2 replies; 14+ messages in thread
From: Russell Haley @ 2025-05-22  8:50 UTC
  To: Shashank Balaji, Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet
  Cc: linux-pm, linux-doc, linux-kernel, Shinya Takumi


On 5/22/25 3:05 AM, Shashank Balaji wrote:
> The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
> means the requested frequency may not strictly be followed. This is true in the
> case of the intel_pstate driver with HWP enabled. When programming the
> HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
> is set to the policy's max. So, the hardware is free to increase the frequency
> beyond the requested frequency.
> 
> This behaviour can be slightly surprising, given the current wording "allows
> userspace to set the CPU frequency". Hence, document this.
> 

In my opinion, the documentation is correct, and it is the
implementation in intel_pstate that is wrong. If the user wanted two
separate knobs that control the minimum and maximum frequencies, they
could leave intel_pstate in "active" mode and change scaling_min_freq
and scaling_max_freq.

If the user asks for the frequency to be set from userspace, the
frequency had damn well better be set from userspace.

- Russell


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22  8:50 ` Russell Haley
@ 2025-05-22  9:46   ` Shashank Balaji
  2025-05-22  9:56     ` Rafael J. Wysocki
  2025-05-22 11:15     ` Russell Haley
  2025-05-22  9:47   ` Rafael J. Wysocki
  1 sibling, 2 replies; 14+ messages in thread
From: Shashank Balaji @ 2025-05-22  9:46 UTC
  To: Russell Haley
  Cc: Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet, linux-pm,
	linux-doc, linux-kernel, Shinya Takumi

Hi Russell,

On Thu, May 22, 2025 at 03:50:55AM -0500, Russell Haley wrote:
> If the user asks for the frequency to be set from userspace, the
> frequency had damn well better be set from userspace.

First of all, I agree with you. In fact, before sending this patch, I
was considering adding CPUFREQ_GOV_STRICT_TARGET to the userspace
governor. intel_pstate should handle the rest of it.

> In my opinion, the documentation is correct, and it is the
> implementation in intel_pstate that is wrong. If the user wanted two
> separate knobs that control the minimum and maximum frequencies, they
> could leave intel_pstate in "active" mode and change scaling_min_freq
> and scaling_max_freq.

If intel_pstate is left in "active" mode, then userspace can't use any
of the other governors. Moreover, intel_pstate's min and max frequencies
apply to all the cpus. Whereas, the userspace governor can be set on a
per-cpu basis.

Let's say this is "fixed" by adding CPUFREQ_GOV_STRICT_TARGET flag to
the userspace governor. Then userspace has no way to get back the
current behavior where the hardware automagically increases frequency
beyond the target frequency. At least not without a new interface.

With the current behaviour, userspace can have it both ways:
    - actual frequency = target frequency
    - actual frequency >= target frequency

And the occasional higher frequency shouldn't hurt performance, right?
But if they still want exact equality, with the current interface, they
can do that too.

This consideration is what led me to document the "actual freq >= target
freq" rather than patch it so that "actual freq = target freq".

Thanks

Regards,
Shashank


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22  8:50 ` Russell Haley
  2025-05-22  9:46   ` Shashank Balaji
@ 2025-05-22  9:47   ` Rafael J. Wysocki
  2025-05-22 11:15     ` Russell Haley
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-05-22  9:47 UTC
  To: Russell Haley
  Cc: Shashank Balaji, Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet,
	linux-pm, linux-doc, linux-kernel, Shinya Takumi

On Thu, May 22, 2025 at 10:51 AM Russell Haley <yumpusamongus@gmail.com> wrote:
>
>
> On 5/22/25 3:05 AM, Shashank Balaji wrote:
> > The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
> > means the requested frequency may not strictly be followed. This is true in the
> > case of the intel_pstate driver with HWP enabled. When programming the
> > HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
> > is set to the policy's max. So, the hardware is free to increase the frequency
> > beyond the requested frequency.
> >
> > This behaviour can be slightly surprising, given the current wording "allows
> > userspace to set the CPU frequency". Hence, document this.
> >
>
> In my opinion, the documentation is correct, and it is the
> implementation in intel_pstate that is wrong. If the user wanted two
> separate knobs that control the minimum and maximum frequencies, they
> could leave intel_pstate in "active" mode and change scaling_min_freq
> and scaling_max_freq.
>
> If the user asks for the frequency to be set from userspace, the
> frequency had damn well better be set from userspace.

The userspace governor requests a frequency between policy->min and
policy->max on behalf of user space.  In intel_pstate this translates
to setting DESIRED_PERF to the requested value which is also the case
for the other governors.

There is no guarantee that the request will be granted by the
hardware, either way.


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22  9:46   ` Shashank Balaji
@ 2025-05-22  9:56     ` Rafael J. Wysocki
  2025-05-22 11:15     ` Russell Haley
  1 sibling, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-05-22  9:56 UTC
  To: Shashank Balaji
  Cc: Russell Haley, Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet,
	linux-pm, linux-doc, linux-kernel, Shinya Takumi

On Thu, May 22, 2025 at 11:46 AM Shashank Balaji
<shashank.mahadasyam@sony.com> wrote:
>
> Hi Russell,
>
> On Thu, May 22, 2025 at 03:50:55AM -0500, Russell Haley wrote:
> > If the user asks for the frequency to be set from userspace, the
> > frequency had damn well better be set from userspace.
>
> First of all, I agree with you. In fact, before sending this patch, I
> was considering adding CPUFREQ_GOV_STRICT_TARGET to the userspace
> governor. intel_pstate should handle the rest of it.

This wouldn't work the way you expect, though.  It would cause the
driver to always set the frequency to policy->max.

> > In my opinion, the documentation is correct, and it is the
> > implementation in intel_pstate that is wrong. If the user wanted two
> > separate knobs that control the minimum and maximum frequencies, they
> > could leave intel_pstate in "active" mode and change scaling_min_freq
> > and scaling_max_freq.
>
> If intel_pstate is left in "active" mode, then userspace can't use any
> of the other governors. Moreover, intel_pstate's min and max frequencies
> apply to all the cpus.

That's not true.

scaling_min_freq and scaling_max_freq are per CPU, but the values from
there are subject to hardware coordination.

> Whereas, the userspace governor can be set on a per-cpu basis.

This is also subject to hardware coordination.

> Let's say this is "fixed" by adding CPUFREQ_GOV_STRICT_TARGET flag to
> the userspace governor. Then userspace has no way to get back the
> current behavior where the hardware automagically increases frequency
> beyond the target frequency. At least not without a new interface.
>
> With the current behaviour, userspace can have it both ways:
>     - actual frequency = target frequency
>     - actual frequency >= target frequency
>
> And the occasional higher frequency shouldn't hurt performance, right?
> But if they still want exact equality, with the current interface, they
> can do that too.
>
> This consideration is what led me to document the "actual freq >= target
> freq" rather than patch it so that "actual freq = target freq".

The documentation can be adjusted by replacing "set" with "request" in
the userspace governor description and adding a clarification to it
that the requested frequency is between the policy min and max levels.

With HWP enabled, the closest to setting the frequency to a specific
value one can get is by setting scaling_min_freq and scaling_max_freq
to that value.
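
A minimal sketch of that approach, assuming the usual cpufreq sysfs layout.
The function name `pin_policy_freq` and its argument order are made up for
illustration. It takes the policy directory as a parameter so it can be
tried against a scratch directory first, and it orders the two writes so
that the minimum never exceeds the maximum along the way. Writes to the
real sysfs files need root, and the hardware may still pick a different
frequency.

```shell
# Sketch: pin a policy's limits to one frequency (kHz) by writing both
# scaling_min_freq and scaling_max_freq. pin_policy_freq is a hypothetical
# helper name, not an existing tool.
pin_policy_freq() {
    policy_dir=$1   # e.g. /sys/devices/system/cpu/cpufreq/policy0
    khz=$2          # target frequency in kHz, e.g. 2000000

    cur_max=$(cat "$policy_dir/scaling_max_freq") || return 1

    if [ "$khz" -gt "$cur_max" ]; then
        # Target is above the current max: raise the upper limit first.
        echo "$khz" > "$policy_dir/scaling_max_freq" || return 1
        echo "$khz" > "$policy_dir/scaling_min_freq" || return 1
    else
        # Target is at or below the current max: move the lower limit first.
        echo "$khz" > "$policy_dir/scaling_min_freq" || return 1
        echo "$khz" > "$policy_dir/scaling_max_freq" || return 1
    fi
}
```

For example, `pin_policy_freq /sys/devices/system/cpu/cpufreq/policy0 2000000`
run as root would request 2 GHz as both limits for policy0.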


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22  9:47   ` Rafael J. Wysocki
@ 2025-05-22 11:15     ` Russell Haley
  2025-05-22 11:54       ` Rafael J. Wysocki
  2025-05-23  4:25       ` Shashank Balaji
  0 siblings, 2 replies; 14+ messages in thread
From: Russell Haley @ 2025-05-22 11:15 UTC
  To: Rafael J. Wysocki
  Cc: Shashank Balaji, Viresh Kumar, Jonathan Corbet, linux-pm,
	linux-doc, linux-kernel, Shinya Takumi



On 5/22/25 4:47 AM, Rafael J. Wysocki wrote:
> On Thu, May 22, 2025 at 10:51 AM Russell Haley <yumpusamongus@gmail.com> wrote:
>>
>>
>> On 5/22/25 3:05 AM, Shashank Balaji wrote:
>>> The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
>>> means the requested frequency may not strictly be followed. This is true in the
>>> case of the intel_pstate driver with HWP enabled. When programming the
>>> HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
>>> is set to the policy's max. So, the hardware is free to increase the frequency
>>> beyond the requested frequency.
>>>
>>> This behaviour can be slightly surprising, given the current wording "allows
>>> userspace to set the CPU frequency". Hence, document this.
>>>
>>
>> In my opinion, the documentation is correct, and it is the
>> implementation in intel_pstate that is wrong. If the user wanted two
>> separate knobs that control the minimum and maximum frequencies, they
>> could leave intel_pstate in "active" mode and change scaling_min_freq
>> and scaling_max_freq.
>>
>> If the user asks for the frequency to be set from userspace, the
>> frequency had damn well better be set from userspace.
> 
> The userspace governor requests a frequency between policy->min and
> policy->max on behalf of user space.  In intel_pstate this translates
> to setting DESIRED_PERF to the requested value which is also the case
> for the other governors.

Huh.  On this Skylake box with kernel 6.14.6, it seems to be setting
Minimum_Performance, and leaving desired at 0.

> echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
userspace
> echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
1400000
> sudo x86_energy_perf_policy &| grep REQ
cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu1: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu2: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu3: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu4: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu5: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu6: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
cpu7: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
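
To compare the min, max and desired fields across CPUs without reading the
full lines, the output above can be filtered. This is only a sketch that
assumes the line format shown here ("name value" pairs after the HWP_REQ:
tag); `parse_hwp_req` is a made-up helper name, and other versions of
x86_energy_perf_policy may print a different layout.

```shell
# Sketch: extract min/max/des from HWP_REQ lines in the format shown in this
# thread. Reads stdin, prints "cpu min max des" per matching line.
# parse_hwp_req is a hypothetical helper, not part of any tool.
parse_hwp_req() {
    awk '$2 == "HWP_REQ:" {
        cpu = $1
        sub(/:$/, "", cpu)            # "cpu0:" -> "cpu0"
        min = max = des = ""
        for (i = 3; i < NF; i++) {    # scan the "name value" pairs
            if ($i == "min") min = $(i + 1)
            else if ($i == "max") max = $(i + 1)
            else if ($i == "des") des = $(i + 1)
        }
        print cpu, min, max, des
    }'
}
```

In a POSIX shell this could be used as
`sudo x86_energy_perf_policy 2>&1 | parse_hwp_req`; package-level
HWP_REQ_PKG lines are skipped.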

Cheers,
Russell


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22  9:46   ` Shashank Balaji
  2025-05-22  9:56     ` Rafael J. Wysocki
@ 2025-05-22 11:15     ` Russell Haley
  1 sibling, 0 replies; 14+ messages in thread
From: Russell Haley @ 2025-05-22 11:15 UTC
  To: Shashank Balaji
  Cc: Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet, linux-pm,
	linux-doc, linux-kernel, Shinya Takumi



On 5/22/25 4:46 AM, Shashank Balaji wrote:
> Hi Russell,
> 
> If intel_pstate is left in "active" mode, then userspace can't use any
> of the other governors. Moreover, intel_pstate's min and max frequencies
> apply to all the cpus. Whereas, the userspace governor can be set on a
> per-cpu basis.

If setting frequencies on a per-CPU basis is how you discovered this,
you may find it to be a source of more automagic. There are a lot of
client processors that cannot (usefully) have different frequency
targets for each CPU, because there is only one voltage regulator. In
that case, slowing any CPU down would only harm its performance (and
efficiency, because race-to-sleep). So, the global frequency target is
taken as the maximum of the per-CPU targets.
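
As a toy illustration of that rule (not a model of any particular
processor, and `effective_target_khz` is a made-up name), the effective
target for a shared domain is simply the largest of the per-CPU requests:

```shell
# Illustration only: with one shared voltage/frequency domain, the effective
# target is the highest of the per-CPU targets (all values in kHz).
effective_target_khz() {
    best=0
    for khz in "$@"; do
        if [ "$khz" -gt "$best" ]; then
            best=$khz
        fi
    done
    echo "$best"
}
```

So requesting 1400000 on one CPU while another asks for 2000000 would, under
this rule, leave both running against a 2000000 target.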

Cheers,
Russell



* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22 11:15     ` Russell Haley
@ 2025-05-22 11:54       ` Rafael J. Wysocki
  2025-05-23 18:57         ` Rafael J. Wysocki
  2025-05-23  4:25       ` Shashank Balaji
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-05-22 11:54 UTC
  To: Russell Haley
  Cc: Rafael J. Wysocki, Shashank Balaji, Viresh Kumar, Jonathan Corbet,
	linux-pm, linux-doc, linux-kernel, Shinya Takumi

On Thu, May 22, 2025 at 1:15 PM Russell Haley <yumpusamongus@gmail.com> wrote:
>
>
>
> On 5/22/25 4:47 AM, Rafael J. Wysocki wrote:
> > On Thu, May 22, 2025 at 10:51 AM Russell Haley <yumpusamongus@gmail.com> wrote:
> >>
> >>
> >> On 5/22/25 3:05 AM, Shashank Balaji wrote:
> >>> The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
> >>> means the requested frequency may not strictly be followed. This is true in the
> >>> case of the intel_pstate driver with HWP enabled. When programming the
> >>> HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
> >>> is set to the policy's max. So, the hardware is free to increase the frequency
> >>> beyond the requested frequency.
> >>>
> >>> This behaviour can be slightly surprising, given the current wording "allows
> >>> userspace to set the CPU frequency". Hence, document this.
> >>>
> >>
> >> In my opinion, the documentation is correct, and it is the
> >> implementation in intel_pstate that is wrong. If the user wanted two
> >> separate knobs that control the minimum and maximum frequencies, they
> >> could leave intel_pstate in "active" mode and change scaling_min_freq
> >> and scaling_max_freq.
> >>
> >> If the user asks for the frequency to be set from userspace, the
> >> frequency had damn well better be set from userspace.
> >
> > The userspace governor requests a frequency between policy->min and
> > policy->max on behalf of user space.  In intel_pstate this translates
> > to setting DESIRED_PERF to the requested value which is also the case
> > for the other governors.
>
> Huh.  On this Skylake box with kernel 6.14.6, it seems to be setting
> Minimum_Performance, and leaving desired at 0.
>
> > echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> userspace
> > echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
> 1400000
> > sudo x86_energy_perf_policy &| grep REQ
> cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu1: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu2: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu3: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu4: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu5: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu6: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> cpu7: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0

OK, let me double check the code.


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22 11:15     ` Russell Haley
  2025-05-22 11:54       ` Rafael J. Wysocki
@ 2025-05-23  4:25       ` Shashank Balaji
  2025-05-23 19:06         ` Rafael J. Wysocki
  1 sibling, 1 reply; 14+ messages in thread
From: Shashank Balaji @ 2025-05-23  4:25 UTC
  To: Russell Haley
  Cc: Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet, linux-pm,
	linux-doc, linux-kernel, Shinya Takumi

Hi Russell,

On Thu, May 22, 2025 at 06:15:24AM -0500, Russell Haley wrote:
> > The userspace governor requests a frequency between policy->min and
> > policy->max on behalf of user space.  In intel_pstate this translates
> > to setting DESIRED_PERF to the requested value which is also the case
> > for the other governors.
> 
> Huh.  On this Skylake box with kernel 6.14.6, it seems to be setting
> Minimum_Performance, and leaving desired at 0.
> 
> > echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> userspace
> > echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
> 1400000
> > sudo x86_energy_perf_policy &| grep REQ
> cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0

Oh cool, I didn't know about x86_energy_perf_policy.

Consider the following on a Raptor Lake machine:

1. HWP_REQUEST MSR set by intel_pstate in active mode:

	# echo active > intel_pstate/status
	# x86_energy_perf_policy -c 0 2>&1 | grep REQ
	cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
	pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
	# echo 2000000 > cpufreq/policy0/scaling_min_freq 
	# echo 3000000 > cpufreq/policy0/scaling_max_freq 
	# x86_energy_perf_policy -c 0 2>&1 | grep REQ
	cpu0: HWP_REQ: min 26 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
	pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)

	scaling_{min,max}_freq just affect the min and max frequencies
	set in HWP_REQUEST. desired_perf is left at 0.

2. HWP_REQUEST MSR set by intel_pstate in passive mode with userspace
governor:

	# echo passive > intel_pstate/status
	# echo userspace > cpufreq/policy0/scaling_governor 
	# cat cpufreq/policy0/scaling_setspeed 
	866151
	# x86_energy_perf_policy -c 0 2>&1 | grep REQ
	cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
	pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
	# echo 2000000 > cpufreq/policy0/scaling_setspeed 
	# x86_energy_perf_policy -c 0 2>&1 | grep REQ
	cpu0: HWP_REQ: min 26 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
	pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)

	scaling_setspeed only changes the min frequency in HWP_REQUEST.
	Meaning, software is explicitly allowing the hardware to choose
	higher frequencies.

3. Same as above, except with strictuserspace governor, which is a
custom kernel module which is exactly the same as the userspace
governor, except it has the CPUFREQ_GOV_STRICT_TARGET flag set:

	# echo strictuserspace > cpufreq/policy0/scaling_governor 
	# x86_energy_perf_policy -c 0 2>&1 | grep REQ
	cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
	pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
	# echo 3000000 > cpufreq/policy0/scaling_setspeed 
	# x86_energy_perf_policy -c 0 2>&1 | grep REQ
	cpu0: HWP_REQ: min 39 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
	pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)

	With the strict flag set, intel_pstate honours this by setting
	the min and max freq to the same value.

desired_perf is always 0 in the above cases. The strict flag check is done in
intel_cpufreq_update_pstate, which sets max_pstate to target_pstate if policy
has strict target, and cpu->max_perf_ratio otherwise.
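
The selection described above can be restated as a small model. This is not
kernel code, only a sketch of the rule as observed in the three experiments;
`hwp_limits` and its arguments are names invented for illustration.

```shell
# Model of the limit selection described above, NOT kernel code: prints the
# HWP min and max perf values that would be requested for a given target.
hwp_limits() {
    target_pstate=$1     # perf level derived from scaling_setspeed
    max_perf_ratio=$2    # policy maximum (cpu->max_perf_ratio above)
    strict=$3            # 1 if the governor has CPUFREQ_GOV_STRICT_TARGET

    if [ "$strict" -eq 1 ]; then
        max_pstate=$target_pstate
    else
        max_pstate=$max_perf_ratio
    fi
    echo "min $target_pstate max $max_pstate"
}
```

With the Raptor Lake numbers above, `hwp_limits 26 68 0` corresponds to
case 2 (min 26 max 68) and `hwp_limits 26 68 1` to case 3 (min 26 max 26).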

As Russell and Rafael have noted, CPU frequency is subject to hardware
coordination and optimizations. While I get that, shouldn't software try
its best with whatever interface it has available? If a user sets the
userspace governor, that's because they want to have manual control over
CPU frequency, for whatever reason. The kernel should honor this by
setting the min and max freq in HWP_REQUEST equal. The current behaviour
explicitly lets the hardware choose higher frequencies.

Since Russell pointed out that the "actual freq >= target freq" can be
achieved by leaving intel_pstate active and setting scaling_{min,max}_freq
instead (for some reason this slipped my mind), I now think the strict target
flag should be added to the userspace governor, leaving the documentation as
is. Maybe a warning like "you may want to set this exact frequency, but it's
subject to hardware coordination, so beware" can be added.

Thanks

Regards,
Shashank


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-22 11:54       ` Rafael J. Wysocki
@ 2025-05-23 18:57         ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-05-23 18:57 UTC
  To: Russell Haley
  Cc: Shashank Balaji, Viresh Kumar, Jonathan Corbet, linux-pm,
	linux-doc, linux-kernel, Shinya Takumi

On Thu, May 22, 2025 at 1:54 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, May 22, 2025 at 1:15 PM Russell Haley <yumpusamongus@gmail.com> wrote:
> >
> >
> >
> > On 5/22/25 4:47 AM, Rafael J. Wysocki wrote:
> > > On Thu, May 22, 2025 at 10:51 AM Russell Haley <yumpusamongus@gmail.com> wrote:
> > >>
> > >>
> > >> On 5/22/25 3:05 AM, Shashank Balaji wrote:
> > >>> The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
> > >>> means the requested frequency may not strictly be followed. This is true in the
> > >>> case of the intel_pstate driver with HWP enabled. When programming the
> > >>> HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
> > >>> is set to the policy's max. So, the hardware is free to increase the frequency
> > >>> beyond the requested frequency.
> > >>>
> > >>> This behaviour can be slightly surprising, given the current wording "allows
> > >>> userspace to set the CPU frequency". Hence, document this.
> > >>>
> > >>
> > >> In my opinion, the documentation is correct, and it is the
> > >> implementation in intel_pstate that is wrong. If the user wanted two
> > >> separate knobs that control the minimum and maximum frequencies, they
> > >> could leave intel_pstate in "active" mode and change scaling_min_freq
> > >> and scaling_max_freq.
> > >>
> > >> If the user asks for the frequency to be set from userspace, the
> > >> frequency had damn well better be set from userspace.
> > >
> > > The userspace governor requests a frequency between policy->min and
> > > policy->max on behalf of user space.  In intel_pstate this translates
> > > to setting DESIRED_PERF to the requested value which is also the case
> > > for the other governors.
> >
> > Huh.  On this Skylake box with kernel 6.14.6, it seems to be setting
> > Minimum_Performance, and leaving desired at 0.
> >
> > > echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > userspace
> > > echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
> > 1400000
> > > sudo x86_energy_perf_policy &| grep REQ
> > cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu1: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu2: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu3: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu4: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu5: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu6: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu7: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>
> OK, let me double check the code.

I stand corrected, HWP_MIN_PERF is indeed set in accordance with the
target frequency, not HWP_DESIRED_PERF.

The reason is that running at a frequency below the target might
deliver insufficient performance, which would break the assumptions
of the schedutil governor.

However, setting HWP_DESIRED_PERF to 0 may be a mistake because it may
cause the CPU to always run above the target frequency which is not
desirable from the power perspective.

What can be done is to set HWP_MIN_PERF and HWP_DESIRED_PERF to the same value.
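
To make the proposal concrete, here is a sketch of how such a request value
would be laid out, assuming the IA32_HWP_REQUEST field layout from the Intel
SDM (minimum in bits 7:0, maximum in bits 15:8, desired in bits 23:16, EPP
in bits 31:24). The helper name `hwp_request_value` is made up; it only
computes a number, leaves the activity window and package control fields at
zero, and does not touch any MSR.

```shell
# Sketch: pack min/max/desired/EPP into the low 32 bits of an HWP request
# value, using the field positions assumed in the lead-in above.
hwp_request_value() {
    min=$1; max=$2; des=$3; epp=$4
    printf '0x%08x\n' $(( (min & 0xff) | ((max & 0xff) << 8) | ((des & 0xff) << 16) | ((epp & 0xff) << 24) ))
}
```

For the Skylake output earlier in the thread (min 14, max 40, epp 128),
setting desired equal to min would be `hwp_request_value 14 40 14 128`,
whereas the current behaviour corresponds to `hwp_request_value 14 40 0 128`.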

[Note that intel_cpufreq_adjust_perf() used by the schedutil governor
actually sets HWP_DESIRED_PERF in accordance with the target
frequency, but it also sets HWP_MIN_PERF to the minimum sufficient
perf value supplied by schedutil.  Since intel_cpufreq_fast_switch()
and intel_cpufreq_target() only get one target frequency, they cannot
really say if any frequency below the target will be sufficient.]


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-23  4:25       ` Shashank Balaji
@ 2025-05-23 19:06         ` Rafael J. Wysocki
  2025-05-23 21:47           ` Russell Haley
  2025-05-27  8:21           ` Shashank Balaji
  0 siblings, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-05-23 19:06 UTC
  To: Shashank Balaji
  Cc: Russell Haley, Rafael J. Wysocki, Viresh Kumar, Jonathan Corbet,
	linux-pm, linux-doc, linux-kernel, Shinya Takumi

[-- Attachment #1: Type: text/plain, Size: 5454 bytes --]

On Fri, May 23, 2025 at 6:25 AM Shashank Balaji
<shashank.mahadasyam@sony.com> wrote:
>
> Hi Russell,
>
> On Thu, May 22, 2025 at 06:15:24AM -0500, Russell Haley wrote:
> > > The userspace governor requests a frequency between policy->min and
> > > policy->max on behalf of user space.  In intel_pstate this translates
> > > to setting DESIRED_PERF to the requested value which is also the case
> > > for the other governors.
> >
> > Huh.  On this Skylake box with kernel 6.14.6, it seems to be setting
> > Minimum_Performance, and leaving desired at 0.
> >
> > > echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > userspace
> > > echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
> > 1400000
> > > sudo x86_energy_perf_policy &| grep REQ
> > cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>
> Oh cool, I didn't know about x86_energy_perf_policy.
>
> Consider the following on a Raptor Lake machine:
>
> 1. HWP_REQUEST MSR set by intel_pstate in active mode:
>
>         # echo active > intel_pstate/status
>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>         cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>         # echo 2000000 > cpufreq/policy0/scaling_min_freq
>         # echo 3000000 > cpufreq/policy0/scaling_max_freq
>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>         cpu0: HWP_REQ: min 26 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>
>         scaling_{min,max}_freq just affect the min and max frequencies
>         set in HWP_REQUEST. desired_perf is left at 0.
>
> 2. HWP_REQUEST MSR set by intel_pstate in passive mode with userspace
> governor:
>
>         # echo passive > intel_pstate/status
>         # echo userspace > cpufreq/policy0/scaling_governor
>         # cat cpufreq/policy0/scaling_setspeed
>         866151
>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>         cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>         # echo 2000000 > cpufreq/policy0/scaling_setspeed
>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>         cpu0: HWP_REQ: min 26 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>
>         scaling_setspeed only changes the min frequency in HWP_REQUEST.
>         Meaning, software is explicitly allowing the hardware to choose
>         higher frequencies.
>
> 3. Same as above, except with strictuserspace governor, which is a
> custom kernel module which is exactly the same as the userspace
> governor, except it has the CPUFREQ_GOV_STRICT_TARGET flag set:
>
>         # echo strictuserspace > cpufreq/policy0/scaling_governor
>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>         cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>         # echo 3000000 > cpufreq/policy0/scaling_setspeed
>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>         cpu0: HWP_REQ: min 39 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>
>         With the strict flag set, intel_pstate honours this by setting
>         the min and max freq to the same value.
>
> desired_perf is always 0 in the above cases. The strict flag check is done in
> intel_cpufreq_update_pstate, which sets max_pstate to target_pstate if policy
> has strict target, and cpu->max_perf_ratio otherwise.
>
> As Russell and Rafael have noted, CPU frequency is subject to hardware
> coordination and optimizations. While I get that, shouldn't software try
> its best with whatever interface it has available? If a user sets the
> userspace governor, that's because they want to have manual control over
> CPU frequency, for whatever reason. The kernel should honor this by
> setting the min and max freq in HWP_REQUEST equal. The current behaviour
> explicitly lets the hardware choose higher frequencies.

Well, the userspace governor ends up calling the same function,
intel_cpufreq_target(), as other cpufreq governors except for
schedutil.  This function needs to work for all of them and for some
of them setting HWP_MIN_PERF to the same value as HWP_MAX_PERF would
be too strict.  HWP_DESIRED_PERF can be set to the same value as
HWP_MIN_PERF, though (please see the attached patch).

> Since Russell pointed out that the "actual freq >= target freq" can be
> achieved by leaving intel_pstate active and setting scaling_{min,max}_freq
> instead (for some reason this slipped my mind), I now think the strict target
> flag should be added to the userspace governor, leaving the documentation as
> is. Maybe a warning like "you may want to set this exact frequency, but it's
> subject to hardware coordination, so beware" can be added.

If you expect the userspace governor to set the frequency exactly
(modulo HW coordination), that's the only way to make it do so without
potentially affecting the other governors.
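
For illustration only, the way the three HWP_REQUEST fields come out in
the cases discussed above can be modelled in a few lines of Python. This
is a rough sketch, not the driver code: the function name and the
desired_follows_target parameter are made up here, and the latter stands
in for the behaviour with the attached patch applied.

```python
def hwp_request_fields(target_pstate, max_perf_ratio, strict_target,
                       desired_follows_target=False):
    """Return (min, max, desired) as a simplified model of the discussion.

    strict_target mirrors policy->strict_target; desired_follows_target is
    a hypothetical switch modelling the attached patch.
    """
    # With a strict target the ceiling collapses onto the target;
    # otherwise the hardware may pick anything up to max_perf_ratio.
    max_pstate = target_pstate if strict_target else max_perf_ratio
    # Unpatched behaviour leaves desired at 0 (hardware decides).
    desired = target_pstate if desired_follows_target else 0
    return target_pstate, max_pstate, desired
```

With the Raptor Lake numbers quoted above, hwp_request_fields(26, 68,
False) gives (26, 68, 0), matching case 2, and hwp_request_fields(26, 68,
True) gives (26, 26, 0), matching case 3.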

[-- Attachment #2: intel_pstate-use-desired-hwp.patch --]
[-- Type: text/x-patch, Size: 578 bytes --]

---
 drivers/cpufreq/intel_pstate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -3249,8 +3249,8 @@
 		int max_pstate = policy->strict_target ?
 					target_pstate : cpu->max_perf_ratio;
 
-		intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate, 0,
-					 fast_switch);
+		intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate,
+					 target_pstate, fast_switch);
 	} else if (target_pstate != old_pstate) {
 		intel_cpufreq_perf_ctl_update(cpu, target_pstate, fast_switch);
 	}


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-23 19:06         ` Rafael J. Wysocki
@ 2025-05-23 21:47           ` Russell Haley
  2025-05-27  8:21           ` Shashank Balaji
  1 sibling, 0 replies; 14+ messages in thread
From: Russell Haley @ 2025-05-23 21:47 UTC (permalink / raw)
  To: Rafael J. Wysocki, Shashank Balaji
  Cc: Viresh Kumar, Jonathan Corbet, linux-pm, linux-doc, linux-kernel,
	Shinya Takumi



On 5/23/25 2:06 PM, Rafael J. Wysocki wrote:
> On Fri, May 23, 2025 at 6:25 AM Shashank Balaji
> <shashank.mahadasyam@sony.com> wrote:
>>
>> Hi Russell,
>>
>> On Thu, May 22, 2025 at 06:15:24AM -0500, Russell Haley wrote:
>>>> The userspace governor requests a frequency between policy->min and
>>>> policy->max on behalf of user space.  In intel_pstate this translates
>>>> to setting DESIRED_PERF to the requested value which is also the case
>>>> for the other governors.
>>>
>>> Huh.  On this Skylake box with kernel 6.14.6, it seems to be setting
>>> Minimum_Performance, and leaving desired at 0.
>>>
>>>> echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>>> userspace
>>>> echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
>>> 1400000
>>>> sudo x86_energy_perf_policy &| grep REQ
>>> cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>
>> Oh cool, I didn't know about x86_energy_perf_policy.
>>
>> Consider the following on a Raptor Lake machine:
>>
>> 1. HWP_REQUEST MSR set by intel_pstate in active mode:
>>
>>         # echo active > intel_pstate/status
>>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>>         cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>>         # echo 2000000 > cpufreq/policy0/scaling_min_freq
>>         # echo 3000000 > cpufreq/policy0/scaling_max_freq
>>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>>         cpu0: HWP_REQ: min 26 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>>
>>         scaling_{min,max}_freq just affect the min and max frequencies
>>         set in HWP_REQUEST. desired_freq is left at 0.
>>
>> 2. HWP_REQUEST MSR set by intel_pstate in passive mode with userspace
>> governor:
>>
>>         # echo passive > intel_pstate/status
>>         # echo userspace > cpufreq/policy0/scaling_governor
>>         # cat cpufreq/policy0/scaling_setspeed
>>         866151
>>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>>         cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>>         # echo 2000000 > cpufreq/policy0/scaling_setspeed
>>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>>         cpu0: HWP_REQ: min 26 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>>
>>         scaling_setspeed only changes the min frequency in HWP_REQUEST.
>>         Meaning, software is explicitly allowing the hardware to choose
>>         higher frequencies.
>>
>> 3. Same as above, except with strictuserspace governor, which is a
>> custom kernel module which is exactly the same as the userspace
>> governor, except it has the CPUFREQ_GOV_STRICT_TARGET flag set:
>>
>>         # echo strictuserspace > cpufreq/policy0/scaling_governor
>>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>>         cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>>         # echo 3000000 > cpufreq/policy0/scaling_setspeed
>>         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
>>         cpu0: HWP_REQ: min 39 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>>         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
>>
>>         With the strict flag set, intel_pstate honours this by setting
>>         the min and max freq to the same value.
>>
>> desired_perf is always 0 in the above cases. The strict flag check is done in
>> intel_cpufreq_update_pstate, which sets max_pstate to target_pstate if policy
>> has strict target, and cpu->max_perf_ratio otherwise.
>>
>> As Russell and Rafael have noted, CPU frequency is subject to hardware
>> coordination and optimizations. While I get that, shouldn't software try
>> its best with whatever interface it has available? If a user sets the
>> userspace governor, that's because they want to have manual control over
>> CPU frequency, for whatever reason. The kernel should honor this by
>> setting the min and max freq in HWP_REQUEST equal. The current behaviour
>> explicitly lets the hardware choose higher frequencies.
> 
> Well, the userspace governor ends up calling the same function,
> intel_cpufreq_target(), as other cpufreq governors except for
> schedutil.  This function needs to work for all of them and for some
> of them setting HWP_MIN_PERF to the same value as HWP_MAX_PERF would
> be too strict.  HWP_DESIRED_PERF can be set to the same value as
> HWP_MIN_PERF, though (please see the attached patch).

The other governors have been around a lot longer than HWP, though, and
are used on non-Intel hardware, which may not have a "this frequency or
higher, subject to firmware heuristics" interface.

I tried this on a non-HWP Haswell machine, and there it works like
DESIRED=MIN. Or maybe DESIRED=MAX=MIN; I don't understand when or why
hardware would choose frequencies between DESIRED and MAX (before module
coordination).

IMO, intel_cpufreq_target() being wired up to HWP_MIN_PERF is actually
*more* strange for the other governors than for userspace, because at
least with userspace governor, the userspace program is free to write to
scaling_{min,max}_freq instead of scaling_setspeed if it wants.
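
As a sketch of that alternative, a userspace program could write the
same value to both limits. The helper below is hypothetical (its name
and the write ordering are assumptions, chosen so that min <= max holds
after every write); it needs root on a real system, and the result is
still subject to hardware coordination.

```python
import os

def pin_frequency(policy_dir, khz):
    """Write khz to scaling_min_freq and scaling_max_freq in policy_dir.

    policy_dir is assumed to be a cpufreq policy directory such as
    /sys/devices/system/cpu/cpufreq/policy0.
    """
    min_path = os.path.join(policy_dir, "scaling_min_freq")
    max_path = os.path.join(policy_dir, "scaling_max_freq")
    with open(max_path) as f:
        current_max = int(f.read().strip())
    # Raise the ceiling before the floor, or lower the floor before the
    # ceiling, so the limits never cross between the two writes.
    if khz > current_max:
        order = [max_path, min_path]
    else:
        order = [min_path, max_path]
    for path in order:
        with open(path, "w") as f:
            f.write(f"{khz}\n")
    return order
```

Calling pin_frequency("/sys/devices/system/cpu/cpufreq/policy0", 2000000)
would then request 2 GHz as both the floor and the ceiling for that
policy.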

The conservative governor on HWP hardware, for example, will cause
strictly higher frequencies (and typically, higher energy consumption)
than HWP powersave. But on non-HWP hardware, conservative is an
efficient, slow-ramping governor.

Changing the behavior of the old-style cpufreq governors is fraught,
because the defaults are schedutil and HWP-powersave, so users of the
other governors likely made an intentional choice, presumably after
tests on a specific platform. A change would invalidate those tests.

But on the other hand, they might *already* be invalid because of an
upgrade from non-HWP hardware. In that case, changing to DES=MIN would
move closer to the tested behavior.

And then there's churn coming from other parts of the stack. For
example, until recently [1] tuned would select conservative for its
"balanced" profile and ondemand for its "powersave" profile, based on
very old data. But that didn't matter until Red Hat stopped funding work
on power-profiles-daemon, and the desktop environments' power-profile
selectors got wired up to tuned in Fedora. Hector Martin fixed that,
switching both to schedutil (unless CONFIG_CPU_FREQ_GOV_SCHEDUTIL=n,
which is rare I think). That is at least not terrible on non-HWP
hardware, but given what he was working on at the time, it might not
have been tested on x86.

[1]
https://github.com/redhat-performance/tuned/commit/e24bfef651aa7f4da95727815b2cacbf571b59af

Cheers,
Russell



* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-23 19:06         ` Rafael J. Wysocki
  2025-05-23 21:47           ` Russell Haley
@ 2025-05-27  8:21           ` Shashank Balaji
  2025-05-27 12:00             ` Rafael J. Wysocki
  1 sibling, 1 reply; 14+ messages in thread
From: Shashank Balaji @ 2025-05-27  8:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell Haley, Viresh Kumar, Jonathan Corbet, linux-pm, linux-doc,
	linux-kernel, Shinya Takumi

Hi Rafael,

On Fri, May 23, 2025 at 09:06:04PM +0200, Rafael J. Wysocki wrote:
> On Fri, May 23, 2025 at 6:25 AM Shashank Balaji
> <shashank.mahadasyam@sony.com> wrote:
> > ...
> > Consider the following on a Raptor Lake machine:
> > ...
> >
> > 3. Same as above, except with strictuserspace governor, which is a
> > custom kernel module which is exactly the same as the userspace
> > governor, except it has the CPUFREQ_GOV_STRICT_TARGET flag set:
> >
> >         # echo strictuserspace > cpufreq/policy0/scaling_governor
> >         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
> >         cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> >         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
> >         # echo 3000000 > cpufreq/policy0/scaling_setspeed
> >         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
> >         cpu0: HWP_REQ: min 39 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> >         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
> >
> >         With the strict flag set, intel_pstate honours this by setting
> >         the min and max freq to the same value.
> >
> > desired_perf is always 0 in the above cases. The strict flag check is done in
> > intel_cpufreq_update_pstate, which sets max_pstate to target_pstate if policy
> > has strict target, and cpu->max_perf_ratio otherwise.
> >
> > As Russell and Rafael have noted, CPU frequency is subject to hardware
> > coordination and optimizations. While I get that, shouldn't software try
> > its best with whatever interface it has available? If a user sets the
> > userspace governor, that's because they want to have manual control over
> > CPU frequency, for whatever reason. The kernel should honor this by
> > setting the min and max freq in HWP_REQUEST equal. The current behaviour
> > explicitly lets the hardware choose higher frequencies.
> 
> Well, the userspace governor ends up calling the same function,
> intel_cpufreq_target(), as other cpufreq governors except for
> schedutil.  This function needs to work for all of them and for some
> of them setting HWP_MIN_PERF to the same value as HWP_MAX_PERF would
> be too strict.  HWP_DESIRED_PERF can be set to the same value as
> HWP_MIN_PERF, though (please see the attached patch).
> 
> > Since Russell pointed out that the "actual freq >= target freq" can be
> > achieved by leaving intel_pstate active and setting scaling_{min,max}_freq
> > instead (for some reason this slipped my mind), I now think the strict target
> > flag should be added to the userspace governor, leaving the documentation as
> > is. Maybe a warning like "you may want to set this exact frequency, but it's
> > subject to hardware coordination, so beware" can be added.
> 
> If you expect the userspace governor to set the frequency exactly
> (modulo HW coordination), that's the only way to make it do so without
> potentially affecting the other governors.

I don't mean to say that intel_cpufreq_target() should be modified. I'm
suggesting that the CPUFREQ_GOV_STRICT_TARGET flag be added to the
userspace governor. That'll ensure that HWP_MIN_PERF and
HWP_MAX_PERF are set to the target frequency. intel_cpufreq_target()
already correctly deals with the strict target flag. To test this, I
registered a custom governor, same as the userspace governor, except
with the strict target flag set. Please see case 3 above.

If this flag is added to the userspace governor, then whatever the
documentation says right now will actually be true. No need to modify
the documentation then.
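
A check like the one in case 3 can be scripted. The following is only a
sketch: the regular expression is written against the HWP_REQ lines
shown earlier in this thread, and both function names are made up for
illustration.

```python
import re

# Matches per-CPU lines such as:
#   cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
HWP_REQ_RE = re.compile(
    r"^cpu(\d+): HWP_REQ: min (\d+) max (\d+) des (\d+) epp (\d+)")

def parse_hwp_req(line):
    """Return the HWP_REQ fields as a dict, or None for any other line."""
    m = HWP_REQ_RE.match(line.strip())
    if m is None:
        return None
    cpu, lo, hi, des, epp = (int(g) for g in m.groups())
    return {"cpu": cpu, "min": lo, "max": hi, "des": des, "epp": epp}

def is_pinned(fields):
    """True when min and max are equal, as seen with the strict flag."""
    return fields["min"] == fields["max"]
```

Fed the case 3 output, is_pinned() returns True (min 26 max 26); fed the
case 2 output, it returns False (min 26 max 68). The pkg0 lines are
ignored because they do not match the per-CPU pattern.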

Regards,
Shashank


* Re: [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed
  2025-05-27  8:21           ` Shashank Balaji
@ 2025-05-27 12:00             ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-05-27 12:00 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: Rafael J. Wysocki, Russell Haley, Viresh Kumar, Jonathan Corbet,
	linux-pm, linux-doc, linux-kernel, Shinya Takumi

Hi,

On Tue, May 27, 2025 at 10:22 AM Shashank Balaji
<shashank.mahadasyam@sony.com> wrote:
>
> Hi Rafael,
>
> On Fri, May 23, 2025 at 09:06:04PM +0200, Rafael J. Wysocki wrote:
> > On Fri, May 23, 2025 at 6:25 AM Shashank Balaji
> > <shashank.mahadasyam@sony.com> wrote:
> > > ...
> > > Consider the following on a Raptor Lake machine:
> > > ...
> > >
> > > 3. Same as above, except with strictuserspace governor, which is a
> > > custom kernel module which is exactly the same as the userspace
> > > governor, except it has the CPUFREQ_GOV_STRICT_TARGET flag set:
> > >
> > >         # echo strictuserspace > cpufreq/policy0/scaling_governor
> > >         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
> > >         cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > >         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
> > >         # echo 3000000 > cpufreq/policy0/scaling_setspeed
> > >         # x86_energy_perf_policy -c 0 2>&1 | grep REQ
> > >         cpu0: HWP_REQ: min 39 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > >         pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
> > >
> > >         With the strict flag set, intel_pstate honours this by setting
> > >         the min and max freq to the same value.
> > >
> > > desired_perf is always 0 in the above cases. The strict flag check is done in
> > > intel_cpufreq_update_pstate, which sets max_pstate to target_pstate if policy
> > > has strict target, and cpu->max_perf_ratio otherwise.
> > >
> > > As Russell and Rafael have noted, CPU frequency is subject to hardware
> > > coordination and optimizations. While I get that, shouldn't software try
> > > its best with whatever interface it has available? If a user sets the
> > > userspace governor, that's because they want to have manual control over
> > > CPU frequency, for whatever reason. The kernel should honor this by
> > > setting the min and max freq in HWP_REQUEST equal. The current behaviour
> > > explicitly lets the hardware choose higher frequencies.
> >
> > Well, the userspace governor ends up calling the same function,
> > intel_cpufreq_target(), as other cpufreq governors except for
> > schedutil.  This function needs to work for all of them and for some
> > of them setting HWP_MIN_PERF to the same value as HWP_MAX_PERF would
> > be too strict.  HWP_DESIRED_PERF can be set to the same value as
> > HWP_MIN_PERF, though (please see the attached patch).
> >
> > > Since Russell pointed out that the "actual freq >= target freq" can be
> > > achieved by leaving intel_pstate active and setting scaling_{min,max}_freq
> > > instead (for some reason this slipped my mind), I now think the strict target
> > > flag should be added to the userspace governor, leaving the documentation as
> > > is. Maybe a warning like "you may want to set this exact frequency, but it's
> > > subject to hardware coordination, so beware" can be added.
> >
> > If you expect the userspace governor to set the frequency exactly
> > (modulo HW coordination), that's the only way to make it do so without
> > potentially affecting the other governors.
>
> I don't mean to say that intel_cpufreq_target() should be modified. I'm
> suggesting that the CPUFREQ_GOV_STRICT_TARGET flag be added to the
> userspace governor. That'll ensure that HWP_MIN_PERF and
> HWP_MAX_PERF are set to the target frequency. intel_cpufreq_target()
> already correctly deals with the strict target flag. To test this, I
> registered a custom governor, same as the userspace governor, except
> with the strict target flag set. Please see case 3 above.
>
> If this flag is added to the userspace governor, then whatever the
> documentation says right now will actually be true. No need to modify
> the documentation then.

So please submit a patch to set CPUFREQ_GOV_STRICT_TARGET in the
userspace governor.

Thanks!


end of thread, other threads:[~2025-05-27 12:00 UTC | newest]

Thread overview: 14+ messages
2025-05-22  8:05 [PATCH] cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed Shashank Balaji
2025-05-22  8:50 ` Russell Haley
2025-05-22  9:46   ` Shashank Balaji
2025-05-22  9:56     ` Rafael J. Wysocki
2025-05-22 11:15     ` Russell Haley
2025-05-22  9:47   ` Rafael J. Wysocki
2025-05-22 11:15     ` Russell Haley
2025-05-22 11:54       ` Rafael J. Wysocki
2025-05-23 18:57         ` Rafael J. Wysocki
2025-05-23  4:25       ` Shashank Balaji
2025-05-23 19:06         ` Rafael J. Wysocki
2025-05-23 21:47           ` Russell Haley
2025-05-27  8:21           ` Shashank Balaji
2025-05-27 12:00             ` Rafael J. Wysocki
