public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
@ 2024-02-09 21:38 Doug Smythies
  2024-02-09 22:10 ` Vincent Guittot
  2024-02-14 13:42 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 15+ messages in thread
From: Doug Smythies @ 2024-02-09 21:38 UTC (permalink / raw)
  To: 'Vincent Guittot', 'Ingo Molnar',
	'Rafael J. Wysocki'
  Cc: linux-pm, linux-kernel, Doug Smythies

Hi,

I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:

# first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
sched/cpufreq: Rework schedutil governor performance estimation

There was previous bisection and suggestion of reversion,
but I guess it wasn't done in the end. [1]

The regression: reduced maximum CPU frequency is ignored.

Conditions:
CPU frequency scaling driver: intel_cpufreq (a.k.a intel_pstate in passive mode)
CPU frequency scaling governor: schedutil
HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz

I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.

Example: A 100% load on CPU 5.

sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt CorWatt GFXWatt RAMWatt
8.42    4636    21823   67      28.40   27.56   0.00    2.59
8.40    4577    17724   66      27.57   26.73   0.00    2.59
8.35    4637    19535   66      28.65   27.81   0.00    2.60
8.41    4578    20723   66      27.73   26.89   0.00    2.59
8.40    4558    19156   67      27.39   26.55   0.00    2.58
8.34    4502    18127   67      26.79   25.96   0.00    2.57

grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu10/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu11/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu6/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu7/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu8/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu9/cpufreq/scaling_max_freq:2400000

grep . /sys/devices/system/cpu/cpu5/cpufreq/*
/sys/devices/system/cpu/cpu5/cpufreq/affected_cpus:5
/sys/devices/system/cpu/cpu5/cpufreq/base_frequency:4100000
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:4800000
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpu5/cpufreq/energy_performance_available_preferences:default performance balance_performance balance_power
power
/sys/devices/system/cpu/cpu5/cpufreq/energy_performance_preference:balance_performance
/sys/devices/system/cpu/cpu5/cpufreq/related_cpus:5
/sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
/sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4799998
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:schedutil
/sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
/sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu5/cpufreq/scaling_setspeed:<unsupported>

[1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@mail.gmail.com/



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-09 21:38 sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected Doug Smythies
@ 2024-02-09 22:10 ` Vincent Guittot
  2024-02-09 23:16   ` Doug Smythies
  2024-02-14 13:42 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-02-09 22:10 UTC (permalink / raw)
  To: Doug Smythies; +Cc: Ingo Molnar, Rafael J. Wysocki, linux-pm, linux-kernel

On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi,
>
> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>
> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> sched/cpufreq: Rework schedutil governor performance estimation
>
> There was previous bisection and suggestion of reversion,
> but I guess it wasn't done in the end. [1]

This has been fixed with
https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/

>
> The regression: reduced maximum CPU frequency is ignored.

This seems to be something new.
schedutil doesn't impact the max_freq and it's up to cpufreq driver
select the final freq which should stay within the limits

>
> Conditions:
> CPU frequency scaling driver: intel_cpufreq (a.k.a intel_pstate in passive mode)
> CPU frequency scaling governor: schedutil
> HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
>
> I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.
>
> Example: A 100% load on CPU 5.
>
> sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt CorWatt GFXWatt RAMWatt
> 8.42    4636    21823   67      28.40   27.56   0.00    2.59
> 8.40    4577    17724   66      27.57   26.73   0.00    2.59
> 8.35    4637    19535   66      28.65   27.81   0.00    2.60
> 8.41    4578    20723   66      27.73   26.89   0.00    2.59
> 8.40    4558    19156   67      27.39   26.55   0.00    2.58
> 8.34    4502    18127   67      26.79   25.96   0.00    2.57
>
> grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu10/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu11/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu6/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu7/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu8/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu9/cpufreq/scaling_max_freq:2400000
>
> grep . /sys/devices/system/cpu/cpu5/cpufreq/*
> /sys/devices/system/cpu/cpu5/cpufreq/affected_cpus:5
> /sys/devices/system/cpu/cpu5/cpufreq/base_frequency:4100000
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:4800000
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_min_freq:800000
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_transition_latency:20000
> /sys/devices/system/cpu/cpu5/cpufreq/energy_performance_available_preferences:default performance balance_performance balance_power
> power
> /sys/devices/system/cpu/cpu5/cpufreq/energy_performance_preference:balance_performance
> /sys/devices/system/cpu/cpu5/cpufreq/related_cpus:5
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4799998
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:schedutil
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:800000
> /sys/devices/system/cpu/cpu5/cpufreq/scaling_setspeed:<unsupported>
>
> [1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@mail.gmail.com/
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-09 22:10 ` Vincent Guittot
@ 2024-02-09 23:16   ` Doug Smythies
  2024-02-11 13:36     ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Smythies @ 2024-02-09 23:16 UTC (permalink / raw)
  To: 'Vincent Guittot'
  Cc: 'Ingo Molnar', 'Rafael J. Wysocki', linux-pm,
	linux-kernel, Doug Smythies

Hi Vincent,
Thank you for your quick reply.

On 2024.02.09.14:11 Vincent wrote:
On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
>>
>> Hi,
>>
>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>
>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>> sched/cpufreq: Rework schedutil governor performance estimation
>>
>> There was previous bisection and suggestion of reversion,
>> but I guess it wasn't done in the end. [1]
>
> This has been fixed with
> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/

Okay, thanks. I didn't find that one.

>> The regression: reduced maximum CPU frequency is ignored.

> This seems to be something new.
> schedutil doesn't impact the max_freq and it's up to cpufreq driver
> select the final freq which should stay within the limits

Okay. All I know is this is the commit that caused the regression.
I do not know why, but I do wonder if there could any relationship with
the old, never fixed, problem of incorrect stale frequencies reported
under the same operating conditions. See the V2 note:
https://lore.kernel.org/all/001d01d9d3a7$71736f50$545a4df0$@telus.net/

where I haven't been able to figure out a solution.

>> Conditions:
>> CPU frequency scaling driver: intel_cpufreq (a.k.a intel_pstate in passive mode)
>> CPU frequency scaling governor: schedutil
>> HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
>>
>> I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.
>>
>> Example: A 100% load on CPU 5.
>>
>> sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15
>> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt CorWatt GFXWatt RAMWatt
>> 8.42    4636    21823   67      28.40   27.56   0.00    2.59
>> 8.40    4577    17724   66      27.57   26.73   0.00    2.59
>> 8.35    4637    19535   66      28.65   27.81   0.00    2.60
>> 8.41    4578    20723   66      27.73   26.89   0.00    2.59
>> 8.40    4558    19156   67      27.39   26.55   0.00    2.58
>> 8.34    4502    18127   67      26.79   25.96   0.00    2.57
>>
>> grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu10/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu11/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu6/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu7/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu8/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu9/cpufreq/scaling_max_freq:2400000
>>
>> grep . /sys/devices/system/cpu/cpu5/cpufreq/*
>> /sys/devices/system/cpu/cpu5/cpufreq/affected_cpus:5
>> /sys/devices/system/cpu/cpu5/cpufreq/base_frequency:4100000
>> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:4800000
>> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_min_freq:800000
>> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_transition_latency:20000
>> /sys/devices/system/cpu/cpu5/cpufreq/energy_performance_available_preferences:default performance balance_performance balance_power
>> power
>> /sys/devices/system/cpu/cpu5/cpufreq/energy_performance_preference:balance_performance
>> /sys/devices/system/cpu/cpu5/cpufreq/related_cpus:5
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4799998
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:schedutil
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:800000
>> /sys/devices/system/cpu/cpu5/cpufreq/scaling_setspeed:<unsupported>
>>
>> [1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@mail.gmail.com/



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-09 23:16   ` Doug Smythies
@ 2024-02-11 13:36     ` Vincent Guittot
  2024-02-11 16:43       ` Doug Smythies
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-02-11 13:36 UTC (permalink / raw)
  To: Doug Smythies; +Cc: Ingo Molnar, Rafael J. Wysocki, linux-pm, linux-kernel

On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Vincent,
> Thank you for your quick reply.
>
> On 2024.02.09.14:11 Vincent wrote:
> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
> >>
> >> Hi,
> >>
> >> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> >>
> >> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> >> sched/cpufreq: Rework schedutil governor performance estimation
> >>
> >> There was previous bisection and suggestion of reversion,
> >> but I guess it wasn't done in the end. [1]
> >
> > This has been fixed with
> > https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
>
> Okay, thanks. I didn't find that one.
>
> >> The regression: reduced maximum CPU frequency is ignored.
>
> > This seems to be something new.
> > schedutil doesn't impact the max_freq and it's up to cpufreq driver
> > select the final freq which should stay within the limits
>
> Okay. All I know is this is the commit that caused the regression.

Could you check if the fix solved your problem ?

> I do not know why, but I do wonder if there could any relationship with
> the old, never fixed, problem of incorrect stale frequencies reported
> under the same operating conditions. See the V2 note:
> https://lore.kernel.org/all/001d01d9d3a7$71736f50$545a4df0$@telus.net/

IIUC the problem is that policy->cur is not used by intel_cpufreq and
stays set to the last old/init value.
Do I get it right that this is only informative ?

Normally cpufreq governor checks the new limits and updates current
freq if necessary except when fast switch is enabled.

>
> where I haven't been able to figure out a solution.
>
> >> Conditions:
> >> CPU frequency scaling driver: intel_cpufreq (a.k.a intel_pstate in passive mode)
> >> CPU frequency scaling governor: schedutil
> >> HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
> >> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> >>
> >> I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.
> >>
> >> Example: A 100% load on CPU 5.
> >>
> >> sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15
> >> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt CorWatt GFXWatt RAMWatt
> >> 8.42    4636    21823   67      28.40   27.56   0.00    2.59
> >> 8.40    4577    17724   66      27.57   26.73   0.00    2.59
> >> 8.35    4637    19535   66      28.65   27.81   0.00    2.60
> >> 8.41    4578    20723   66      27.73   26.89   0.00    2.59
> >> 8.40    4558    19156   67      27.39   26.55   0.00    2.58
> >> 8.34    4502    18127   67      26.79   25.96   0.00    2.57
> >>
> >> grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
> >> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu10/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu11/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu6/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu7/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu8/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu9/cpufreq/scaling_max_freq:2400000
> >>
> >> grep . /sys/devices/system/cpu/cpu5/cpufreq/*
> >> /sys/devices/system/cpu/cpu5/cpufreq/affected_cpus:5
> >> /sys/devices/system/cpu/cpu5/cpufreq/base_frequency:4100000
> >> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:4800000
> >> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_min_freq:800000
> >> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_transition_latency:20000
> >> /sys/devices/system/cpu/cpu5/cpufreq/energy_performance_available_preferences:default performance balance_performance balance_power
> >> power
> >> /sys/devices/system/cpu/cpu5/cpufreq/energy_performance_preference:balance_performance
> >> /sys/devices/system/cpu/cpu5/cpufreq/related_cpus:5
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4799998
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:schedutil
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:2400000
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq:800000
> >> /sys/devices/system/cpu/cpu5/cpufreq/scaling_setspeed:<unsupported>
> >>
> >> [1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@mail.gmail.com/
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-11 13:36     ` Vincent Guittot
@ 2024-02-11 16:43       ` Doug Smythies
  2024-02-13 11:27         ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Smythies @ 2024-02-11 16:43 UTC (permalink / raw)
  To: 'Vincent Guittot'
  Cc: 'Ingo Molnar', 'Rafael J. Wysocki', linux-pm,
	linux-kernel, Doug Smythies

On 2024.02.11 05:36 Vincent wrote:
> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2024.02.09.14:11 Vincent wrote:
>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
>>>>
>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>>>
>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>>>> sched/cpufreq: Rework schedutil governor performance estimation
>>>>
>>>> There was previous bisection and suggestion of reversion,
>>>> but I guess it wasn't done in the end. [1]
>>>
>>> This has been fixed with
>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
>>
>> Okay, thanks. I didn't find that one.
>>
>>>> The regression: reduced maximum CPU frequency is ignored.

Perhaps I should have said "sometimes ignored".
With a maximum CPU frequency for all CPUs set to 2.4 GHz and
a 100% load on CPU 5, its frequency was sampled 1000 times:
28.6% of samples were 2.4 GHz.
71.4% of samples were 4.8 GHz (the max turbo frequency)
The results are highly non-repeatable, for example another sample:
32.8% of samples were 2.4 GHz.
76.2% of samples were 4.8 GHz

Another interesting side note: If load is added to the other CPUs,
the set maximum CPU frequency is enforced.

>>
>>> This seems to be something new.
>>> schedutil doesn't impact the max_freq and it's up to cpufreq driver
>>> select the final freq which should stay within the limits
>>
>> Okay. All I know is this is the commit that caused the regression.
>
> Could you check if the fix solved your problem ?

Given the tags for that commit:

$ git tag --contains e37617c8e53a
v6.8-rc1
v6.8-rc2
v6.8-rc3

It does not solve issue I have raised herein, as it exists in v6.8-rc1 but not v6.7

>> I do not know why, but I do wonder if there could any relationship with
>> the old, never fixed, problem of incorrect stale frequencies reported
>> under the same operating conditions. See the V2 note:
>> https://lore.kernel.org/all/001d01d9d3a7$71736f50$545a4df0$@telus.net/
>
> IIUC the problem is that policy->cur is not used by intel_cpufreq and
> stays set to the last old/init value.

Yes, exactly.

> Do I get it right that this is only informative ?

I don't know, that is what I was wondering. I do not know if the two issues
are related or not.

> Normally cpufreq governor checks the new limits and updates current
> freq if necessary except when fast switch is enabled.

>> where I haven't been able to figure out a solution.

>>>> Conditions:
>>>> CPU frequency scaling driver: intel_cpufreq (a.k.a intel_pstate in passive mode)
>>>> CPU frequency scaling governor: schedutil
>>>> HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
>>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
>>>>
>>>> I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.

Changing from HWP enabled to HWP disabled, it works properly.

...

>>>> [1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@mail.gmail.com/



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-11 16:43       ` Doug Smythies
@ 2024-02-13 11:27         ` Vincent Guittot
  2024-02-13 18:07           ` Doug Smythies
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-02-13 11:27 UTC (permalink / raw)
  To: Doug Smythies; +Cc: Ingo Molnar, Rafael J. Wysocki, linux-pm, linux-kernel

On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2024.02.11 05:36 Vincent wrote:
> > On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
> >> On 2024.02.09.14:11 Vincent wrote:
> >>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
> >>>>
> >>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> >>>>
> >>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> >>>> sched/cpufreq: Rework schedutil governor performance estimation
> >>>>
> >>>> There was previous bisection and suggestion of reversion,
> >>>> but I guess it wasn't done in the end. [1]
> >>>
> >>> This has been fixed with
> >>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
> >>
> >> Okay, thanks. I didn't find that one.
> >>
> >>>> The regression: reduced maximum CPU frequency is ignored.
>
> Perhaps I should have said "sometimes ignored".
> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
> a 100% load on CPU 5, its frequency was sampled 1000 times:
> 28.6% of samples were 2.4 GHz.
> 71.4% of samples were 4.8 GHz (the max turbo frequency)
> The results are highly non-repeatable, for example another sample:
> 32.8% of samples were 2.4 GHz.
> 76.2% of samples were 4.8 GHz
>
> Another interesting side note: If load is added to the other CPUs,
> the set maximum CPU frequency is enforced.

Could you trace cpufreq and pstate ? I'd like to understand how
policy->cur can be changed
whereas there is this comment in intel_pstate_set_policy():
        /*
         * policy->cur is never updated with the intel_pstate driver, but it
         * is used as a stale frequency value. So, keep it within limits.
         */

but cpufreq_driver_fast_switch() updates it with the freq returned by
intel_cpufreq_fast_switch()

>
> >>
> >>> This seems to be something new.
> >>> schedutil doesn't impact the max_freq and it's up to cpufreq driver
> >>> select the final freq which should stay within the limits
> >>
> >> Okay. All I know is this is the commit that caused the regression.
> >
> > Could you check if the fix solved your problem ?
>
> Given the tags for that commit:
>
> $ git tag --contains e37617c8e53a
> v6.8-rc1
> v6.8-rc2
> v6.8-rc3
>
> It does not solve issue I have raised herein, as it exists in v6.8-rc1 but not v6.7
>
> >> I do not know why, but I do wonder if there could any relationship with
> >> the old, never fixed, problem of incorrect stale frequencies reported
> >> under the same operating conditions. See the V2 note:
> >> https://lore.kernel.org/all/001d01d9d3a7$71736f50$545a4df0$@telus.net/
> >
> > IIUC the problem is that policy->cur is not used by intel_cpufreq and
> > stays set to the last old/init value.
>
> Yes, exactly.
>
> > Do I get it right that this is only informative ?
>
> I don't know, that is what I was wondering. I do not know if the two issues
> are related or not.
>
> > Normally cpufreq governor checks the new limits and updates current
> > freq if necessary except when fast switch is enabled.
>
> >> where I haven't been able to figure out a solution.
>
> >>>> Conditions:
> >>>> CPU frequency scaling driver: intel_cpufreq (a.k.a intel_pstate in passive mode)
> >>>> CPU frequency scaling governor: schedutil
> >>>> HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
> >>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> >>>>
> >>>> I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.
>
> Changing from HWP enabled to HWP disabled, it works properly.
>
> ...
>
> >>>> [1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@mail.gmail.com/
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-13 11:27         ` Vincent Guittot
@ 2024-02-13 18:07           ` Doug Smythies
  2024-02-14 15:37             ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Smythies @ 2024-02-13 18:07 UTC (permalink / raw)
  To: 'Vincent Guittot'
  Cc: 'Ingo Molnar', 'Rafael J. Wysocki', linux-pm,
	linux-kernel, Doug Smythies

[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]

On 2024.02.13 03:27 Vincent wrote:
> On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2024.02.11 05:36 Vincent wrote:
>>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2024.02.09.14:11 Vincent wrote:
>>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>
>>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>>>>>
>>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>>>>>> sched/cpufreq: Rework schedutil governor performance estimation
>>>>>>
>>>>>> There was previous bisection and suggestion of reversion,
>>>>>> but I guess it wasn't done in the end. [1]
>>>>>
>>>>> This has been fixed with
>>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
>>>>
>>>> Okay, thanks. I didn't find that one.
>>>>
>>>>>> The regression: reduced maximum CPU frequency is ignored.
>>
>> Perhaps I should have said "sometimes ignored".
>> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
>> a 100% load on CPU 5, its frequency was sampled 1000 times:
>> 28.6% of samples were 2.4 GHz.
>> 71.4% of samples were 4.8 GHz (the max turbo frequency)
>> The results are highly non-repeatable, for example another sample:
>> 32.8% of samples were 2.4 GHz.
>> 76.2% of samples were 4.8 GHz
>>
>> Another interesting side note: If load is added to the other CPUs,
>> the set maximum CPU frequency is enforced.
>
> Could you trace cpufreq and pstate ? I'd like to understand how
> policy->cur can be changed
> whereas there is this comment in intel_pstate_set_policy():
>        /*
>         * policy->cur is never updated with the intel_pstate driver, but it
>         * is used as a stale frequency value. So, keep it within limits.
>         */
>
> but cpufreq_driver_fast_switch() updates it with the freq returned by
> intel_cpufreq_fast_switch()

Perhaps I should submit a patch clarifying that comment.
It is true for the "intel_pstate" CPU frequency scaling driver but not for the
"intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
driver in passive mode. Sorry for any confusion.

I ran the intel_pstate_tracer.py during the test and do observe many, but
not all, CPUs requesting pstate 48 when the max is set to 24.
The calling request seems to always be via "fast_switch" path.
The root issue here appears to be a limit clamping problem for that path.
I'll try to attach a couple of graphs and screen shots from the tracer data.

I do not know how to trace cpufreq at the same time.

... Doug


[-- Attachment #2: all_cpu_pstates.png --]
[-- Type: image/png, Size: 23829 bytes --]

[-- Attachment #3: cpu5-example.png --]
[-- Type: image/png, Size: 66432 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-09 21:38 sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected Doug Smythies
  2024-02-09 22:10 ` Vincent Guittot
@ 2024-02-14 13:42 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 0 replies; 15+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2024-02-14 13:42 UTC (permalink / raw)
  To: Linux kernel regressions list; +Cc: linux-pm, linux-kernel

On 09.02.24 22:38, Doug Smythies wrote:
> 
> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> 
> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> sched/cpufreq: Rework schedutil governor performance estimation
> 
> There was previous bisection and suggestion of reversion,
> but I guess it wasn't done in the end. [1]
> 
> The regression: reduced maximum CPU frequency is ignored.
> 

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d
#regzbot title sched/cpufreq: reduced maximum CPU frequency is ignored.
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-13 18:07           ` Doug Smythies
@ 2024-02-14 15:37             ` Vincent Guittot
  2024-02-15 22:53               ` Doug Smythies
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-02-14 15:37 UTC (permalink / raw)
  To: Doug Smythies; +Cc: Ingo Molnar, Rafael J. Wysocki, linux-pm, linux-kernel

On Tue, 13 Feb 2024 at 19:07, Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2024.02.13 03:27 Vincent wrote:
> > On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@telus.net> wrote:
> >> On 2024.02.11 05:36 Vincent wrote:
> >>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
> >>>> On 2024.02.09.14:11 Vincent wrote:
> >>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
> >>>>>>
> >>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> >>>>>>
> >>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> >>>>>> sched/cpufreq: Rework schedutil governor performance estimation
> >>>>>>
> >>>>>> There was previous bisection and suggestion of reversion,
> >>>>>> but I guess it wasn't done in the end. [1]
> >>>>>
> >>>>> This has been fixed with
> >>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
> >>>>
> >>>> Okay, thanks. I didn't find that one.
> >>>>
> >>>>>> The regression: reduced maximum CPU frequency is ignored.
> >>
> >> Perhaps I should have said "sometimes ignored".
> >> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
> >> a 100% load on CPU 5, its frequency was sampled 1000 times:
> >> 28.6% of samples were 2.4 GHz.
> >> 71.4% of samples were 4.8 GHz (the max turbo frequency)
> >> The results are highly non-repeatable, for example another sample:
> >> 32.8% of samples were 2.4 GHz.
> >> 76.2% of samples were 4.8 GHz
> >>
> >> Another interesting side note: If load is added to the other CPUs,
> >> the set maximum CPU frequency is enforced.
> >
> > Could you trace cpufreq and pstate ? I'd like to understand how
> > policy->cur can be changed
> > whereas there is this comment in intel_pstate_set_policy():
> >        /*
> >         * policy->cur is never updated with the intel_pstate driver, but it
> >         * is used as a stale frequency value. So, keep it within limits.
> >         */
> >
> > but cpufreq_driver_fast_switch() updates it with the freq returned by
> > intel_cpufreq_fast_switch()
>
> Perhaps I should submit a patch clarifying that comment.
> It is true for the "intel_pstate" CPU frequency scaling driver but not for the
> "intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
> driver in passive mode. Sorry for any confusion.
>
> I ran the intel_pstate_tracer.py during the test and do observe many, but
> not all, CPUs requesting pstate 48 when the max is set to 24.
> The calling request seems to always be via "fast_switch" path.
> The root issue here appears to be a limit clamping problem for that path.

Yes, I came to a similar conclusion as well. Whatever does schedutil
ask for, it should be clamped by  cpu->max|min_perf_ratio.

Do you know if you use fast_switch or adjust_perf call back ?

> I'll try to attach a couple of graphs and screen shots from the tracer data.
>
> I do not know how to trace cpufreq at the same time.

I was thinking of enabling cpufreq traces in ftrace in addition to
pstate ones that intel_pstate_tracer.py is enabling

Vincent
>
> ... Doug
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-14 15:37             ` Vincent Guittot
@ 2024-02-15 22:53               ` Doug Smythies
  2024-02-16 13:17                 ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Smythies @ 2024-02-15 22:53 UTC (permalink / raw)
  To: 'Vincent Guittot', 'Rafael J. Wysocki',
	'Srinivas Pandruvada'
  Cc: 'Ingo Molnar', linux-pm, linux-kernel, Doug Smythies

Hi Vincent,

This email thread appears as if it might be moving away from a regression
caused by your commit towards a conclusion that your commit exposed
a pre-existing bug in the intel_psate.c code.

Therefore, I have moved Rafael from the C.C. line to the "to" line and
added Srinivas.

On 2024.02.14 07:38 Vincent wrote:
> On Tue, 13 Feb 2024 at 19:07, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2024.02.13 03:27 Vincent wrote:
>>> On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2024.02.11 05:36 Vincent wrote:
>>>>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>> On 2024.02.09.14:11 Vincent wrote:
>>>>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>>>
>>>>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>>>>>>>
>>>>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>>>>>>>> sched/cpufreq: Rework schedutil governor performance estimation
>>>>>>>>
>>>>>>>> There was previous bisection and suggestion of reversion,
>>>>>>>> but I guess it wasn't done in the end. [1]
>>>>>>>
>>>>>>> This has been fixed with
>>>>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
>>>>>>
>>>>>> Okay, thanks. I didn't find that one.
>>>>>>
>>>>>>>> The regression: reduced maximum CPU frequency is ignored.
>>>>
>>>> Perhaps I should have said "sometimes ignored".
>>>> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
>>>> a 100% load on CPU 5, its frequency was sampled 1000 times:
>>>> 28.6% of samples were 2.4 GHz.
>>>> 71.4% of samples were 4.8 GHz (the max turbo frequency)
>>>> The results are highly non-repeatable, for example another sample:
>>>> 32.8% of samples were 2.4 GHz.
>>>> 76.2% of samples were 4.8 GHz
>>>>
>>>> Another interesting side note: If load is added to the other CPUs,
>>>> the set maximum CPU frequency is enforced.
>>>
>>> Could you trace cpufreq and pstate ? I'd like to understand how
>>> policy->cur can be changed
>>> whereas there is this comment in intel_pstate_set_policy():
>>>        /*
>>>         * policy->cur is never updated with the intel_pstate driver, but it
>>>         * is used as a stale frequency value. So, keep it within limits.
>>>         */
>>>
>>> but cpufreq_driver_fast_switch() updates it with the freq returned by
>>> intel_cpufreq_fast_switch()
>>
>> Perhaps I should submit a patch clarifying that comment.
>> It is true for the "intel_pstate" CPU frequency scaling driver but not for the
>> "intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
>> driver in passive mode. Sorry for any confusion.
>>
>> I ran the intel_pstate_tracer.py during the test and do observe many, but
>> not all, CPUs requesting pstate 48 when the max is set to 24.
>> The calling request seems to always be via "fast_switch" path.
>> The root issue here appears to be a limit clamping problem for that path.
>
> Yes, I came to a similar conclusion as well. Whatever does schedutil
> ask for, it should be clamped by  cpu->max|min_perf_ratio.

Agreed. And it is not clamping properly under specific conditions.

> Do you know if you use fast_switch or adjust_perf call back ?

I am not certain, but I think it uses "adjust_perf" call back.
I do know for certain that it never takes the
"intel_cpufreq_update_pstate" path
and always takes the
"intel_cpu_freq_adjust_perf" path.

The problem seems to occur when that function is called with:
min_perf = 1024
target_perf = 1024
capacity = 1024

Even though cpu->max_perf_ratio is 24, the related HWP MSR,
0x774: IA32_HWP_REQUEST, ends up as 48, 48, 48 for min, max, des.

This patch appears to fix the issue (still has my debug code and
includes a question):

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index ca94e60e705a..8f88a04a494b 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2987,12 +2987,22 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum,
        if (min_pstate < cpu->min_perf_ratio)
                min_pstate = cpu->min_perf_ratio;

+//     if (min_pstate > cpu->pstate.max_pstate)   /* needed? I don't know */
+//             min_pstate = cpu->pstate.max_pstate;
+
+       if (min_pstate > cpu->max_perf_ratio)
+               min_pstate = cpu->max_perf_ratio;
+
        max_pstate = min(cap_pstate, cpu->max_perf_ratio);
        if (max_pstate < min_pstate)
                max_pstate = min_pstate;

        target_pstate = clamp_t(int, target_pstate, min_pstate, max_pstate);

+       if((max_pstate > 40) || (max_pstate < 7) || (min_pstate < 7) || min_pstate > 40 || target_pstate > 40){
+               pr_debug("Doug: t: %d : min %d : max %d : minp %d : maxp %d : mnperf %lu : tgperf %lu : capacity %lu\n", target_pstate, min_pstate, max_pstate, cpu->min_perf_ratio, cpu->max_perf_ratio, min_perf, target_perf, capacity);
+       }
+
        intel_cpufreq_hwp_update(cpu, min_pstate, max_pstate, target_pstate, true);

        cpu->pstate.current_pstate = target_pstate;

With the patch, I never hit the debug statement if the max CPU frequency is limited to 2.4 GHz,
whereas it used to get triggered often.
More importantly, the system seems to now behave properly and obey set CPU frequency limits.



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-15 22:53               ` Doug Smythies
@ 2024-02-16 13:17                 ` Vincent Guittot
  2024-02-24 13:43                   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-02-16 13:17 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Ingo Molnar, linux-pm,
	linux-kernel

Hi Doug,

On Thu, 15 Feb 2024 at 23:53, Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Vincent,
>
> This email thread appears as if it might be moving away from a regression
> caused by your commit towards a conclusion that your commit exposed
> a pre-existing bug in the intel_psate.c code.

Ok

>
> Therefore, I have moved Rafael from the C.C. line to the "to" line and
> added Srinivas.
>
> On 2024.02.14 07:38 Vincent wrote:
> > On Tue, 13 Feb 2024 at 19:07, Doug Smythies <dsmythies@telus.net> wrote:
> >> On 2024.02.13 03:27 Vincent wrote:
> >>> On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@telus.net> wrote:
> >>>> On 2024.02.11 05:36 Vincent wrote:
> >>>>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
> >>>>>> On 2024.02.09.14:11 Vincent wrote:
> >>>>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
> >>>>>>>>
> >>>>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> >>>>>>>>
> >>>>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> >>>>>>>> sched/cpufreq: Rework schedutil governor performance estimation
> >>>>>>>>
> >>>>>>>> There was previous bisection and suggestion of reversion,
> >>>>>>>> but I guess it wasn't done in the end. [1]
> >>>>>>>
> >>>>>>> This has been fixed with
> >>>>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
> >>>>>>
> >>>>>> Okay, thanks. I didn't find that one.
> >>>>>>
> >>>>>>>> The regression: reduced maximum CPU frequency is ignored.
> >>>>
> >>>> Perhaps I should have said "sometimes ignored".
> >>>> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
> >>>> a 100% load on CPU 5, its frequency was sampled 1000 times:
> >>>> 28.6% of samples were 2.4 GHz.
> >>>> 71.4% of samples were 4.8 GHz (the max turbo frequency)
> >>>> The results are highly non-repeatable, for example another sample:
> >>>> 32.8% of samples were 2.4 GHz.
> >>>> 76.2% of samples were 4.8 GHz
> >>>>
> >>>> Another interesting side note: If load is added to the other CPUs,
> >>>> the set maximum CPU frequency is enforced.
> >>>
> >>> Could you trace cpufreq and pstate ? I'd like to understand how
> >>> policy->cur can be changed
> >>> whereas there is this comment in intel_pstate_set_policy():
> >>>        /*
> >>>         * policy->cur is never updated with the intel_pstate driver, but it
> >>>         * is used as a stale frequency value. So, keep it within limits.
> >>>         */
> >>>
> >>> but cpufreq_driver_fast_switch() updates it with the freq returned by
> >>> intel_cpufreq_fast_switch()
> >>
> >> Perhaps I should submit a patch clarifying that comment.
> >> It is true for the "intel_pstate" CPU frequency scaling driver but not for the
> >> "intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
> >> driver in passive mode. Sorry for any confusion.
> >>
> >> I ran the intel_pstate_tracer.py during the test and do observe many, but
> >> not all, CPUs requesting pstate 48 when the max is set to 24.
> >> The calling request seems to always be via "fast_switch" path.
> >> The root issue here appears to be a limit clamping problem for that path.
> >
> > Yes, I came to a similar conclusion as well. Whatever does schedutil
> > ask for, it should be clamped by  cpu->max|min_perf_ratio.
>
> Agreed. And it is not clamping properly under specific conditions.
>
> > Do you know if you use fast_switch or adjust_perf call back ?
>
> I am not certain, but I think it uses "adjust_perf" call back.
> I do know for certain that it never takes the
> "intel_cpufreq_update_pstate" path
> and always takes the
> "intel_cpu_freq_adjust_perf" path.

intel_cpu_freq_adjust_perf is registered as the callback for
cpufreq->adjust_perf

>
> The problem seems to occur when that function is called with:
> min_perf = 1024
> target_perf = 1024
> capacity = 1024
>
> Even though cpu->max_perf_ratio is 24, the related HWP MSR,
> 0x774: IA32_HWP_REQUEST, ends up as 48, 48, 48 for min, max, des.
>
> This patch appears to fix the issue (still has my debug code and
> includes a question):
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index ca94e60e705a..8f88a04a494b 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2987,12 +2987,22 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum,
>         if (min_pstate < cpu->min_perf_ratio)
>                 min_pstate = cpu->min_perf_ratio;
>
> +//     if (min_pstate > cpu->pstate.max_pstate)   /* needed? I don't know */
> +//             min_pstate = cpu->pstate.max_pstate;
> +
> +       if (min_pstate > cpu->max_perf_ratio)
> +               min_pstate = cpu->max_perf_ratio;
> +
>         max_pstate = min(cap_pstate, cpu->max_perf_ratio);
>         if (max_pstate < min_pstate)
>                 max_pstate = min_pstate;
>
>         target_pstate = clamp_t(int, target_pstate, min_pstate, max_pstate);
>
> +       if((max_pstate > 40) || (max_pstate < 7) || (min_pstate < 7) || min_pstate > 40 || target_pstate > 40){
> +               pr_debug("Doug: t: %d : min %d : max %d : minp %d : maxp %d : mnperf %lu : tgperf %lu : capacity %lu\n", target_pstate, min_pstate, max_pstate, cpu->min_perf_ratio, cpu->max_perf_ratio, min_perf, target_perf, capacity);
> +       }
> +
>         intel_cpufreq_hwp_update(cpu, min_pstate, max_pstate, target_pstate, true);
>
>         cpu->pstate.current_pstate = target_pstate;
>
> With the patch, I never hit the debug statement if the max CPU frequency is limited to 2.4 GHz,
> whereas it used to get triggered often.
> More importantly, the system seems to now behave properly and obey set CPU frequency limits.
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-16 13:17                 ` Vincent Guittot
@ 2024-02-24 13:43                   ` Linux regression tracking (Thorsten Leemhuis)
  2024-02-24 14:11                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-02-24 13:43 UTC (permalink / raw)
  To: Vincent Guittot, Doug Smythies, Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Ingo Molnar, linux-pm, linux-kernel,
	Linux kernel regressions list

On 16.02.24 14:17, Vincent Guittot wrote:
> On Thu, 15 Feb 2024 at 23:53, Doug Smythies <dsmythies@telus.net> wrote:
>>
>> This email thread appears as if it might be moving away from a regression
>> caused by your commit towards a conclusion that your commit exposed
>> a pre-existing bug in the intel_psate.c code.
> Ok

Well, even in that case it's a regression that must be fixed -- ideally
before 6.8. Did anything happen towards that?

I noticed that Doug send the fix "cpufreq: intel_pstate: fix pstate
limits enforcement for adjust_perf call back":
https://lore.kernel.org/all/20240217213010.2466-1-dsmythies@telus.net/

Is that supposed to fix the problem? Looks a bit like it, but I'm not
totally sure. In that case I'd say it likely should be applied to 6.8,
but Rafael apparently applied it to 6.9.

I'd also say that a Fixes: would be good as well (to ensure that fix is
also backported in case anyone backports 9c0b4bb7f630), but I know that
subsystems handle this differently.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>> Therefore, I have moved Rafael from the C.C. line to the "to" line and
>> added Srinivas.
>>
>> On 2024.02.14 07:38 Vincent wrote:
>>> On Tue, 13 Feb 2024 at 19:07, Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2024.02.13 03:27 Vincent wrote:
>>>>> On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>> On 2024.02.11 05:36 Vincent wrote:
>>>>>>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>>> On 2024.02.09.14:11 Vincent wrote:
>>>>>>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>>>>>
>>>>>>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>>>>>>>>>
>>>>>>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>>>>>>>>>> sched/cpufreq: Rework schedutil governor performance estimation
>>>>>>>>>>
>>>>>>>>>> There was previous bisection and suggestion of reversion,
>>>>>>>>>> but I guess it wasn't done in the end. [1]
>>>>>>>>>
>>>>>>>>> This has been fixed with
>>>>>>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
>>>>>>>>
>>>>>>>> Okay, thanks. I didn't find that one.
>>>>>>>>
>>>>>>>>>> The regression: reduced maximum CPU frequency is ignored.
>>>>>>
>>>>>> Perhaps I should have said "sometimes ignored".
>>>>>> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
>>>>>> a 100% load on CPU 5, its frequency was sampled 1000 times:
>>>>>> 28.6% of samples were 2.4 GHz.
>>>>>> 71.4% of samples were 4.8 GHz (the max turbo frequency)
>>>>>> The results are highly non-repeatable, for example another sample:
>>>>>> 32.8% of samples were 2.4 GHz.
>>>>>> 76.2% of samples were 4.8 GHz
>>>>>>
>>>>>> Another interesting side note: If load is added to the other CPUs,
>>>>>> the set maximum CPU frequency is enforced.
>>>>>
>>>>> Could you trace cpufreq and pstate ? I'd like to understand how
>>>>> policy->cur can be changed
>>>>> whereas there is this comment in intel_pstate_set_policy():
>>>>>        /*
>>>>>         * policy->cur is never updated with the intel_pstate driver, but it
>>>>>         * is used as a stale frequency value. So, keep it within limits.
>>>>>         */
>>>>>
>>>>> but cpufreq_driver_fast_switch() updates it with the freq returned by
>>>>> intel_cpufreq_fast_switch()
>>>>
>>>> Perhaps I should submit a patch clarifying that comment.
>>>> It is true for the "intel_pstate" CPU frequency scaling driver but not for the
>>>> "intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
>>>> driver in passive mode. Sorry for any confusion.
>>>>
>>>> I ran the intel_pstate_tracer.py during the test and do observe many, but
>>>> not all, CPUs requesting pstate 48 when the max is set to 24.
>>>> The calling request seems to always be via "fast_switch" path.
>>>> The root issue here appears to be a limit clamping problem for that path.
>>>
>>> Yes, I came to a similar conclusion as well. Whatever does schedutil
>>> ask for, it should be clamped by  cpu->max|min_perf_ratio.
>>
>> Agreed. And it is not clamping properly under specific conditions.
>>
>>> Do you know if you use fast_switch or adjust_perf call back ?
>>
>> I am not certain, but I think it uses "adjust_perf" call back.
>> I do know for certain that it never takes the
>> "intel_cpufreq_update_pstate" path
>> and always takes the
>> "intel_cpu_freq_adjust_perf" path.
> 
> intel_cpu_freq_adjust_perf is registered as the callback for
> cpufreq->adjust_perf
> 
>>
>> The problem seems to occur when that function is called with:
>> min_perf = 1024
>> target_perf = 1024
>> capacity = 1024
>>
>> Even though cpu->max_perf_ratio is 24, the related HWP MSR,
>> 0x774: IA32_HWP_REQUEST, ends up as 48, 48, 48 for min, max, des.
>>
>> This patch appears to fix the issue (still has my debug code and
>> includes a question):
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index ca94e60e705a..8f88a04a494b 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -2987,12 +2987,22 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum,
>>         if (min_pstate < cpu->min_perf_ratio)
>>                 min_pstate = cpu->min_perf_ratio;
>>
>> +//     if (min_pstate > cpu->pstate.max_pstate)   /* needed? I don't know */
>> +//             min_pstate = cpu->pstate.max_pstate;
>> +
>> +       if (min_pstate > cpu->max_perf_ratio)
>> +               min_pstate = cpu->max_perf_ratio;
>> +
>>         max_pstate = min(cap_pstate, cpu->max_perf_ratio);
>>         if (max_pstate < min_pstate)
>>                 max_pstate = min_pstate;
>>
>>         target_pstate = clamp_t(int, target_pstate, min_pstate, max_pstate);
>>
>> +       if((max_pstate > 40) || (max_pstate < 7) || (min_pstate < 7) || min_pstate > 40 || target_pstate > 40){
>> +               pr_debug("Doug: t: %d : min %d : max %d : minp %d : maxp %d : mnperf %lu : tgperf %lu : capacity %lu\n", target_pstate, min_pstate, max_pstate, cpu->min_perf_ratio, cpu->max_perf_ratio, min_perf, target_perf, capacity);
>> +       }
>> +
>>         intel_cpufreq_hwp_update(cpu, min_pstate, max_pstate, target_pstate, true);
>>
>>         cpu->pstate.current_pstate = target_pstate;
>>
>> With the patch, I never hit the debug statement if the max CPU frequency is limited to 2.4 GHz,
>> whereas it used to get triggered often.
>> More importantly, the system seems to now behave properly and obey set CPU frequency limits.
>>
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-24 13:43                   ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-02-24 14:11                     ` Rafael J. Wysocki
  2024-02-24 14:31                       ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2024-02-24 14:11 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Vincent Guittot, Doug Smythies, Rafael J. Wysocki,
	Srinivas Pandruvada, Ingo Molnar, linux-pm, linux-kernel

On Sat, Feb 24, 2024 at 2:44 PM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 16.02.24 14:17, Vincent Guittot wrote:
> > On Thu, 15 Feb 2024 at 23:53, Doug Smythies <dsmythies@telus.net> wrote:
> >>
> >> This email thread appears as if it might be moving away from a regression
> >> caused by your commit towards a conclusion that your commit exposed
> >> a pre-existing bug in the intel_psate.c code.
> > Ok
>
> Well, even in that case it's a regression that must be fixed -- ideally
> before 6.8. Did anything happen towards that?
>
> I noticed that Doug send the fix "cpufreq: intel_pstate: fix pstate
> limits enforcement for adjust_perf call back":
> https://lore.kernel.org/all/20240217213010.2466-1-dsmythies@telus.net/
>
> Is that supposed to fix the problem? Looks a bit like it, but I'm not
> totally sure. In that case I'd say it likely should be applied to 6.8,
> but Rafael apparently applied it to 6.9.

This hasn't reached linux-next yet, so I rebased it on top of -rc5 in
order to push it as a 6.8 fix.

> I'd also say that a Fixes: would be good as well (to ensure that fix is
> also backported in case anyone backports 9c0b4bb7f630), but I know that
> subsystems handle this differently.

So I added a Fixes: tag to it, but it points to the original change
that missed the check.

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-24 14:11                     ` Rafael J. Wysocki
@ 2024-02-24 14:31                       ` Linux regression tracking (Thorsten Leemhuis)
  2024-02-24 14:57                         ` Doug Smythies
  0 siblings, 1 reply; 15+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-02-24 14:31 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux regressions mailing list
  Cc: Vincent Guittot, Doug Smythies, Srinivas Pandruvada, Ingo Molnar,
	linux-pm, linux-kernel

On 24.02.24 15:11, Rafael J. Wysocki wrote:
> On Sat, Feb 24, 2024 at 2:44 PM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> On 16.02.24 14:17, Vincent Guittot wrote:
>>> On Thu, 15 Feb 2024 at 23:53, Doug Smythies <dsmythies@telus.net> wrote:
>>>>
>>>> This email thread appears as if it might be moving away from a regression
>>>> caused by your commit towards a conclusion that your commit exposed
>>>> a pre-existing bug in the intel_psate.c code.
>>> Ok
>>
>> Well, even in that case it's a regression that must be fixed -- ideally
>> before 6.8. Did anything happen towards that?
>>
>> I noticed that Doug send the fix "cpufreq: intel_pstate: fix pstate
>> limits enforcement for adjust_perf call back":
>> https://lore.kernel.org/all/20240217213010.2466-1-dsmythies@telus.net/
>>
>> Is that supposed to fix the problem? Looks a bit like it, but I'm not
>> totally sure. In that case I'd say it likely should be applied to 6.8,
>> but Rafael apparently applied it to 6.9.
> 
> This hasn't reached linux-next yet, so I rebased it on top of -rc5 in
> order to push it as a 6.8 fix.

Ahh, great, many thx!

>> I'd also say that a Fixes: would be good as well (to ensure that fix is
>> also backported in case anyone backports 9c0b4bb7f630), but I know that
>> subsystems handle this differently.
> 
> So I added a Fixes: tag to it, but it points to the original change
> that missed the check.

Yeah, that totally works for me as well. Again: many thx!

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected
  2024-02-24 14:31                       ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-02-24 14:57                         ` Doug Smythies
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Smythies @ 2024-02-24 14:57 UTC (permalink / raw)
  To: 'Linux regressions mailing list',
	'Rafael J. Wysocki'
  Cc: 'Vincent Guittot', 'Srinivas Pandruvada',
	'Ingo Molnar', linux-pm, linux-kernel, Doug Smythies

On 2024.02.24 06:31 Thorsten wrote:
> On 24.02.24 15:11, Rafael J. Wysocki wrote:
>> On Sat, Feb 24, 2024 at 2:44 PM Linux regression tracking (Thorsten
>>> Leemhuis) <regressions@leemhuis.info> wrote:
>>> On 16.02.24 14:17, Vincent Guittot wrote:
>>>> On Thu, 15 Feb 2024 at 23:53, Doug Smythies <dsmythies@telus.net> wrote:
>>>>>
>>>>> This email thread appears as if it might be moving away from a regression
>>>>> caused by your commit towards a conclusion that your commit exposed
>>>>> a pre-existing bug in the intel_psate.c code.
>>>> Ok
>>>
>>> Well, even in that case it's a regression that must be fixed -- ideally
>>> before 6.8. Did anything happen towards that?
>>>
>>> I noticed that Doug send the fix "cpufreq: intel_pstate: fix pstate
>>> limits enforcement for adjust_perf call back":
>>> https://lore.kernel.org/all/20240217213010.2466-1-dsmythies@telus.net/
>>>
>>> Is that supposed to fix the problem? 

Yes it fixes the preexisting issue exposed by 9c0b4bb7f630.

>>>Looks a bit like it, but I'm not
>>> totally sure. In that case I'd say it likely should be applied to 6.8,
>>> but Rafael apparently applied it to 6.9.
>> 
>> This hasn't reached linux-next yet, so I rebased it on top of -rc5 in
>> order to push it as a 6.8 fix.
>
> Ahh, great, many thx!

Yes, thanks.

>>> I'd also say that a Fixes: would be good as well (to ensure that fix is
>>> also backported in case anyone backports 9c0b4bb7f630), but I know that
>>> subsystems handle this differently.

I left the fixes tag off on purpose, because there was never anything wrong
with 9c0b4bb7f630. Apologies to Vincent for wasting his time, but thanks
for the help finding the actual issue.
 
>> So I added a Fixes: tag to it, but it points to the original change
>> that missed the check.
> Yeah, that totally works for me as well. Again: many thx!

Yes, thanks. A fix tag pointing to the original commit makes sense.

... Doug



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-02-24 14:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-09 21:38 sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected Doug Smythies
2024-02-09 22:10 ` Vincent Guittot
2024-02-09 23:16   ` Doug Smythies
2024-02-11 13:36     ` Vincent Guittot
2024-02-11 16:43       ` Doug Smythies
2024-02-13 11:27         ` Vincent Guittot
2024-02-13 18:07           ` Doug Smythies
2024-02-14 15:37             ` Vincent Guittot
2024-02-15 22:53               ` Doug Smythies
2024-02-16 13:17                 ` Vincent Guittot
2024-02-24 13:43                   ` Linux regression tracking (Thorsten Leemhuis)
2024-02-24 14:11                     ` Rafael J. Wysocki
2024-02-24 14:31                       ` Linux regression tracking (Thorsten Leemhuis)
2024-02-24 14:57                         ` Doug Smythies
2024-02-14 13:42 ` Linux regression tracking #adding (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox