* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
@ 2025-12-03 16:18 Harshvardhan Jha
2025-12-03 16:44 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2025-12-03 16:18 UTC (permalink / raw)
To: Rafael J. Wysocki, Daniel Lezcano
Cc: Sasha Levin, Christian Loehle, Greg Kroah-Hartman,
linux-pm@vger.kernel.org, stable@vger.kernel.org
Hi there,
While running performance benchmarks for the 5.15.196 LTS tag, we observed
several regressions across different benchmarks when compared to the
previous 5.15.193 kernel tag. An automated bisect between the two tags
narrowed the culprit commit down to:
- 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
information" for 5.15
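The automated-bisect workflow amounts to marking the two tags and letting git
drive a pass/fail script. A self-contained toy illustration follows (the real
run used the kernel tree and a benchmark harness; the throwaway repo and
check.sh below are stand-ins):

```shell
# Toy demo of `git bisect run`: a 5-commit repo where the "regression"
# lands in commit 4, and check.sh plays the role of the benchmark script.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5; do
    echo "$i" > f
    git add f
    git commit -qm "commit $i"
done
# The check succeeds (exit 0 = good) only before the regression landed.
printf '#!/bin/sh\ntest "$(cat f)" -lt 4\n' > check.sh
chmod +x check.sh
git bisect start HEAD HEAD~4 >/dev/null 2>&1   # bad = HEAD, good = first commit
result=$(git bisect run ./check.sh 2>/dev/null)
echo "$result" | grep "is the first bad commit"
```

With the real tree the start command would be `git bisect start v5.15.196
v5.15.193` and the run script would exit non-zero whenever the benchmark shows
the regression.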
Regressions on 5.15.196 include:
-9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
-6.3% : Phoronix system/sqlite on OnPrem X6-2
-18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
thread & 1M buffer size on OnPrem X6-2
-4% to -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
1 thread & 1M buffer size on OnPrem X6-2
Up to -30% : Some Netpipe metrics on OnPrem X5-2
The culprit commits' messages mention that these reverts were done due to
performance regressions on Intel Jasper Lake systems, but unfortunately
the revert is causing issues on other systems. I would like the
maintainers' opinion on how we should proceed to fix this: reapplying the
patch will bring back the previous regressions on Jasper Lake systems,
while keeping the revert leaves us stuck with the current regressions. If
this problem has been reported before and a fix is in the works, please
let me know and I shall follow developments on that mail thread.
Thanks & Regards,
Harshvardhan
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-03 16:18 Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS Harshvardhan Jha
@ 2025-12-03 16:44 ` Christian Loehle
2025-12-03 22:30 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2025-12-03 16:44 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Daniel Lezcano
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm@vger.kernel.org,
stable@vger.kernel.org
On 12/3/25 16:18, Harshvardhan Jha wrote:
> Hi there,
>
> While running performance benchmarks for the 5.15.196 LTS tags , it was
> observed that several regressions across different benchmarks is being
> introduced when compared to the previous 5.15.193 kernel tag. Running an
> automated bisect on both of them narrowed down the culprit commit to:
> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
> information" for 5.15
>
> Regressions on 5.15.196 include:
> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
> -6.3% : Phoronix system/sqlite on OnPrem X6-2
> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
> thread & 1M buffer size on OnPrem X6-2
> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
> 1 thread & 1M buffer size on OnPrem X6-2
> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>
> The culprit commits' messages mention that these reverts were done due
> to performance regressions introduced in Intel Jasper Lake systems but
> this revert is causing issues in other systems unfortunately. I wanted
> to know the maintainers' opinion on how we should proceed in order to
> fix this. If we reapply it'll bring back the previous regressions on
> Jasper Lake systems and if we don't revert it then it's stuck with
> current regressions. If this problem has been reported before and a fix
> is in the works then please let me know I shall follow developments to
> that mail thread.
The discussion regarding this can be found here:
https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
we explored an alternative to the full revert here:
https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
unfortunately that didn't lead anywhere useful, so Rafael went with the
full revert you're seeing now.
Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
will highly depend on your platform, but also your workload. Jasper Lake
in particular seems to favor deep idle states even when they don't seem
to be a 'good' choice from a purely cpuidle (governor) perspective, so
we're kind of stuck with that.
For teo we've discussed a tunable knob in the past, which comes naturally
with the logic; for menu there's nothing obvious that would be comparable.
But for teo such a knob didn't generate any further interest (so far).
That's the status, unless I missed anything?
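For anyone reproducing on their own platform, which governor (menu or teo) is
active can be checked via the standard cpuidle sysfs interface; a small sketch
(the fallback string is illustrative, for machines where cpuidle sysfs is
absent, e.g. some VMs and containers):

```shell
# Report the active cpuidle governor; fall back gracefully when the
# cpuidle sysfs directory does not exist on this system.
base=/sys/devices/system/cpu/cpuidle
if [ -r "$base/current_governor" ]; then
    gov=$(cat "$base/current_governor")        # e.g. "menu" or "teo"
elif [ -r "$base/current_governor_ro" ]; then
    gov=$(cat "$base/current_governor_ro")     # read-only variant on older kernels
else
    gov="cpuidle sysfs not available"
fi
echo "$gov"
# Switching at runtime (root, both governors built in) would be:
#   echo teo > /sys/devices/system/cpu/cpuidle/current_governor
```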
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-03 16:44 ` Christian Loehle
@ 2025-12-03 22:30 ` Doug Smythies
2025-12-08 11:33 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2025-12-03 22:30 UTC (permalink / raw)
To: 'Harshvardhan Jha'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, Doug Smythies, 'Christian Loehle',
'Rafael J. Wysocki', 'Daniel Lezcano'
[-- Attachment #1: Type: text/plain, Size: 3917 bytes --]
On 2025.12.03 08:45 Christian Loehle wrote:
> On 12/3/25 16:18, Harshvardhan Jha wrote:
>> Hi there,
>>
>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>> observed that several regressions across different benchmarks is being
>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>> automated bisect on both of them narrowed down the culprit commit to:
>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>> information" for 5.15
>>
>> Regressions on 5.15.196 include:
>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>> thread & 1M buffer size on OnPrem X6-2
>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>> 1 thread & 1M buffer size on OnPrem X6-2
>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>
>> The culprit commits' messages mention that these reverts were done due
>> to performance regressions introduced in Intel Jasper Lake systems but
>> this revert is causing issues in other systems unfortunately. I wanted
>> to know the maintainers' opinion on how we should proceed in order to
>> fix this. If we reapply it'll bring back the previous regressions on
>> Jasper Lake systems and if we don't revert it then it's stuck with
>> current regressions. If this problem has been reported before and a fix
>> is in the works then please let me know I shall follow developments to
>> that mail thread.
>
> The discussion regarding this can be found here:
> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> we explored an alternative to the full revert here:
> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
> unfortunately that didn't lead anywhere useful, so Rafael went with the
> full revert you're seeing now.
>
> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> will highly depend on your platform, but also your workload, Jasper Lake
> in particular seems to favor deep idle states even when they don't seem
> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> we're kind of stuck with that.
>
> For teo we've discussed a tunable knob in the past, which comes naturally with
> the logic, for menu there's nothing obvious that would be comparable.
> But for teo such a knob didn't generate any further interest (so far).
>
> That's the status, unless I missed anything?
By reading everything in the links Christian provided, you can see
that we had difficulty repeating test results on other platforms.
Of the tests listed herein, the only one that was easy to repeat on my
test server was the "Phoronix pts/sqlite" one. I got (summary: no difference):
                     Kernel 6.18                    Reverted
pts/sqlite-2.3.0     menu rc4       menu rc1        menu rc1       menu rc3
                     performance    performance     performance    performance
test  what           ave            ave             ave            ave
1     T/C  1         2.147 -0.2%    2.143  0.0%     2.16  -0.8%    2.156 -0.6%
2     T/C  2         3.468  0.1%    3.473  0.0%     3.486 -0.4%    3.478 -0.1%
3     T/C  4         4.336  0.3%    4.35   0.0%     4.355 -0.1%    4.354 -0.1%
4     T/C  8         5.438 -0.1%    5.434  0.0%     5.456 -0.4%    5.45  -0.3%
5     T/C 12         6.314 -0.2%    6.299  0.0%     6.307 -0.1%    6.29   0.1%
Where:
T/C means: Threads / Copies
performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
Data points are in Seconds.
Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
The reversion was manually applied to kernel 6.18-rc1 for that test.
The reversion was included in kernel 6.18-rc3.
Kernel 6.18-rc4 had another code change to menu.c
In case the formatting gets messed up, the table is also attached.
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
HWP: Enabled.
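The percentage columns appear to be relative to the "menu rc1" averages (the
0.0% column); the arithmetic can be checked with a couple of lines, using the
first row of the table:

```python
# Percent difference of each run vs. the menu rc1 baseline, matching the
# table's sign convention: negative means slower (longer) than baseline.
def pct_vs_baseline(baseline, value):
    return round((baseline - value) / baseline * 100, 1)

baseline = 2.143                            # menu rc1, T/C 1
print(pct_vs_baseline(baseline, 2.147))     # menu rc4
print(pct_vs_baseline(baseline, 2.160))     # reverted rc1
print(pct_vs_baseline(baseline, 2.156))     # reverted rc3
```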
... Doug
[-- Attachment #2: sqlite-test-table.png --]
[-- Type: image/png, Size: 21984 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-03 22:30 ` Doug Smythies
@ 2025-12-08 11:33 ` Harshvardhan Jha
2025-12-08 12:47 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2025-12-08 11:33 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Christian Loehle', 'Rafael J. Wysocki',
'Daniel Lezcano'
Hi Doug,
On 04/12/25 4:00 AM, Doug Smythies wrote:
> On 2025.12.03 08:45 Christian Loehle wrote:
>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>> Hi there,
>>>
>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>> observed that several regressions across different benchmarks is being
>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>> automated bisect on both of them narrowed down the culprit commit to:
>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>> information" for 5.15
>>>
>>> Regressions on 5.15.196 include:
>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>> thread & 1M buffer size on OnPrem X6-2
>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>> 1 thread & 1M buffer size on OnPrem X6-2
>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>
>>> The culprit commits' messages mention that these reverts were done due
>>> to performance regressions introduced in Intel Jasper Lake systems but
>>> this revert is causing issues in other systems unfortunately. I wanted
>>> to know the maintainers' opinion on how we should proceed in order to
>>> fix this. If we reapply it'll bring back the previous regressions on
>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>> current regressions. If this problem has been reported before and a fix
>>> is in the works then please let me know I shall follow developments to
>>> that mail thread.
>> The discussion regarding this can be found here:
>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>> we explored an alternative to the full revert here:
>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>> full revert you're seeing now.
>>
>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>> will highly depend on your platform, but also your workload, Jasper Lake
>> in particular seems to favor deep idle states even when they don't seem
>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>> we're kind of stuck with that.
>>
>> For teo we've discussed a tunable knob in the past, which comes naturally with
>> the logic, for menu there's nothing obvious that would be comparable.
>> But for teo such a knob didn't generate any further interest (so far).
>>
>> That's the status, unless I missed anything?
> By reading everything in the links Chrsitian provided, you can see
> that we had difficulties repeating test results on other platforms.
>
> Of the tests listed herein, the only one that was easy to repeat on my
> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>
> Kernel 6.18 Reverted
> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
> performance performance performance performance
> test what ave ave ave ave
> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>
> Where:
> T/C means: Threads / Copies
> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor.
> Data points are in Seconds.
> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> The reversion was manually applied to kernel 6.18-rc1 for that test.
> The reversion was included in kernel 6.18-rc3.
> Kernel 6.18-rc4 had another code change to menu.c
>
> In case the formatting gets messed up, the table is also attached.
>
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> HWP: Enabled.
I was able to recover performance on 5.15 and 5.4 LTS based kernels
after reapplying the reverted patch on X6-2 systems.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
Stepping: 1
CPU(s) scaling MHz: 98%
CPU max MHz: 2600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5188.26
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb
stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle
avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt
xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln
pts vnmi md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 896 KiB (28 instances)
L1i: 896 KiB (28 instances)
L2: 7 MiB (28 instances)
L3: 70 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Vulnerabilities:
Gather data sampling: Not affected
Indirect target selection: Not affected
Itlb multihit: KVM: Mitigation: Split huge pages
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes,
SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer
sanitization
Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW;
STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsa: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Vmscape: Mitigation; IBPB before exit to userspace
Thanks & Regards,
Harshvardhan
>
> ... Doug
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-08 11:33 ` Harshvardhan Jha
@ 2025-12-08 12:47 ` Christian Loehle
2026-01-13 7:06 ` Harshvardhan Jha
2026-01-27 15:45 ` Harshvardhan Jha
0 siblings, 2 replies; 44+ messages in thread
From: Christian Loehle @ 2025-12-08 12:47 UTC (permalink / raw)
To: Harshvardhan Jha, Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano'
On 12/8/25 11:33, Harshvardhan Jha wrote:
> Hi Doug,
>
> On 04/12/25 4:00 AM, Doug Smythies wrote:
>> On 2025.12.03 08:45 Christian Loehle wrote:
>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>> Hi there,
>>>>
>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>>> observed that several regressions across different benchmarks is being
>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>> information" for 5.15
>>>>
>>>> Regressions on 5.15.196 include:
>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>> thread & 1M buffer size on OnPrem X6-2
>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>
>>>> The culprit commits' messages mention that these reverts were done due
>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>> to know the maintainers' opinion on how we should proceed in order to
>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>> current regressions. If this problem has been reported before and a fix
>>>> is in the works then please let me know I shall follow developments to
>>>> that mail thread.
>>> The discussion regarding this can be found here:
>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>> we explored an alternative to the full revert here:
>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>> full revert you're seeing now.
>>>
>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>> will highly depend on your platform, but also your workload, Jasper Lake
>>> in particular seems to favor deep idle states even when they don't seem
>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>> we're kind of stuck with that.
>>>
>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>> the logic, for menu there's nothing obvious that would be comparable.
>>> But for teo such a knob didn't generate any further interest (so far).
>>>
>>> That's the status, unless I missed anything?
>> By reading everything in the links Chrsitian provided, you can see
>> that we had difficulties repeating test results on other platforms.
>>
>> Of the tests listed herein, the only one that was easy to repeat on my
>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>
>> Kernel 6.18 Reverted
>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>> performance performance performance performance
>> test what ave ave ave ave
>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>
>> Where:
>> T/C means: Threads / Copies
>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor.
>> Data points are in Seconds.
>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>> The reversion was included in kernel 6.18-rc3.
>> Kernel 6.18-rc4 had another code change to menu.c
>>
>> In case the formatting gets messed up, the table is also attached.
>>
>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>> HWP: Enabled.
>
> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> after reapplying the revert on X6-2 systems.
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 56
> On-line CPU(s) list: 0-55
> Vendor ID: GenuineIntel
> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> CPU family: 6
> Model: 79
> Thread(s) per core: 2
> Core(s) per socket: 14
> Socket(s): 2
> Stepping: 1
> CPU(s) scaling MHz: 98%
> CPU max MHz: 2600.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 5188.26
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr pg
> e mca cmov pat pse36 clflush dts acpi mmx
> fxsr sse
> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> lm cons
> tant_tsc arch_perfmon pebs bts rep_good
> nopl xtopol
> ogy nonstop_tsc cpuid aperfmperf pni
> pclmulqdq dtes
> 64 monitor ds_cpl vmx smx est tm2 ssse3
> sdbg fma cx
> 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> movbe po
> pcnt tsc_deadline_timer aes xsave avx f16c
> rdrand l
> ahf_lm abm 3dnowprefetch cpuid_fault epb
> cat_l3 cdp
> _l3 pti intel_ppin ssbd ibrs ibpb stibp
> tpr_shadow
> flexpriority ept vpid ept_ad fsgsbase
> tsc_adjust bm
> i1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdt_a rd
> seed adx smap intel_pt xsaveopt cqm_llc
> cqm_occup_l
> lc cqm_mbm_total cqm_mbm_local dtherm arat
> pln pts
> vnmi md_clear flush_l1d
> Virtualization features:
> Virtualization: VT-x
> Caches (sum of all):
> L1d: 896 KiB (28 instances)
> L1i: 896 KiB (28 instances)
> L2: 7 MiB (28 instances)
> L3: 70 MiB (2 instances)
> NUMA:
> NUMA node(s): 2
> NUMA node0 CPU(s): 0-13,28-41
> NUMA node1 CPU(s): 14-27,42-55
> Vulnerabilities:
> Gather data sampling: Not affected
> Indirect target selection: Not affected
> Itlb multihit: KVM: Mitigation: Split huge pages
> L1tf: Mitigation; PTE Inversion; VMX conditional
> cache fl
> ushes, SMT vulnerable
> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> Meltdown: Mitigation; PTI
> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
> Reg file data sampling: Not affected
> Retbleed: Not affected
> Spec rstack overflow: Not affected
> Spec store bypass: Mitigation; Speculative Store Bypass
> disabled via p
> rctl
> Spectre v1: Mitigation; usercopy/swapgs barriers and
> __user poi
> nter sanitization
> Spectre v2: Mitigation; Retpolines; IBPB conditional;
> IBRS_FW;
> STIBP conditional; RSB filling; PBRSB-eIBRS
> Not aff
> ected; BHI Not affected
> Srbds: Not affected
> Tsa: Not affected
> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> Vmscape: Mitigation; IBPB before exit to userspace
>
It would be nice to get the idle states here, ideally how the states' usage changed
from base to revert.
The mentioned thread did this and should show how it can be done, but a dump of
cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
before and after the workload is usually fine to work with:
https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
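A before/after pair of such dumps can be reduced to per-state deltas with a
short script; a sketch follows, assuming the standard `path:value` output of
grepping the cpuidle sysfs files (the sample counter values below are made up
for illustration):

```python
# Reduce two dumps of cpuidle counters to per-state deltas. The dump
# format assumed is "path:value" lines, as produced by e.g.:
#   grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/{usage,time}
from collections import defaultdict

def parse_dump(text):
    """Sum 'usage' and 'time' counters per state across all CPUs."""
    totals = defaultdict(lambda: {"usage": 0, "time": 0})
    for line in text.strip().splitlines():
        path, _, value = line.partition(":")
        parts = path.split("/")
        state, field = parts[-2], parts[-1]   # e.g. "state2", "usage"
        if field in ("usage", "time"):
            totals[state][field] += int(value)
    return totals

def deltas(before, after):
    """Per-state increase in entries (usage) and residency (time, usec)."""
    return {s: {f: after[s][f] - before[s][f] for f in ("usage", "time")}
            for s in after}

before = parse_dump("""
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:100
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:5000
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:40
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:90000
""")
after = parse_dump("""
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:180
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:8000
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:45
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:95000
""")
for state, d in sorted(deltas(before, after).items()):
    print(state, d["usage"], d["time"])
```

Comparing these deltas between the base and reverted kernels shows which
states gained or lost residency under the workload.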
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-08 12:47 ` Christian Loehle
@ 2026-01-13 7:06 ` Harshvardhan Jha
2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-27 15:45 ` Harshvardhan Jha
1 sibling, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-01-13 7:06 UTC (permalink / raw)
To: Christian Loehle, Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano'
Hi Christian,
On 08/12/25 6:17 PM, Christian Loehle wrote:
> On 12/8/25 11:33, Harshvardhan Jha wrote:
>> Hi Doug,
>>
>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>> Hi there,
>>>>>
>>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>>>> observed that several regressions across different benchmarks is being
>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>> information" for 5.15
>>>>>
>>>>> Regressions on 5.15.196 include:
>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>
>>>>> The culprit commits' messages mention that these reverts were done due
>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>>> to know the maintainers' opinion on how we should proceed in order to
>>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>>> current regressions. If this problem has been reported before and a fix
>>>>> is in the works then please let me know I shall follow developments to
>>>>> that mail thread.
>>>> The discussion regarding this can be found here:
>>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>> we explored an alternative to the full revert here:
>>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>> full revert you're seeing now.
>>>>
>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>> in particular seems to favor deep idle states even when they don't seem
>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>> we're kind of stuck with that.
>>>>
>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>
>>>> That's the status, unless I missed anything?
>>> By reading everything in the links Chrsitian provided, you can see
>>> that we had difficulties repeating test results on other platforms.
>>>
>>> Of the tests listed herein, the only one that was easy to repeat on my
>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>>
>>> Kernel 6.18 Reverted
>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>>> performance performance performance performance
>>> test what ave ave ave ave
>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>>
>>> Where:
>>> T/C means: Threads / Copies
>>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor.
>>> Data points are in Seconds.
>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>> The reversion was included in kernel 6.18-rc3.
>>> Kernel 6.18-rc4 had another code change to menu.c
>>>
>>> In case the formatting gets messed up, the table is also attached.
>>>
>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>> HWP: Enabled.
>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>> after reapplying the revert on X6-2 systems.
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Address sizes: 46 bits physical, 48 bits virtual
>> Byte Order: Little Endian
>> CPU(s): 56
>> On-line CPU(s) list: 0-55
>> Vendor ID: GenuineIntel
>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> CPU family: 6
>> Model: 79
>> Thread(s) per core: 2
>> Core(s) per socket: 14
>> Socket(s): 2
>> Stepping: 1
>> CPU(s) scaling MHz: 98%
>> CPU max MHz: 2600.0000
>> CPU min MHz: 1200.0000
>> BogoMIPS: 5188.26
>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
>> mtrr pg
>> e mca cmov pat pse36 clflush dts acpi mmx
>> fxsr sse
>> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
>> lm cons
>> tant_tsc arch_perfmon pebs bts rep_good
>> nopl xtopol
>> ogy nonstop_tsc cpuid aperfmperf pni
>> pclmulqdq dtes
>> 64 monitor ds_cpl vmx smx est tm2 ssse3
>> sdbg fma cx
>> 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
>> movbe po
>> pcnt tsc_deadline_timer aes xsave avx f16c
>> rdrand l
>> ahf_lm abm 3dnowprefetch cpuid_fault epb
>> cat_l3 cdp
>> _l3 pti intel_ppin ssbd ibrs ibpb stibp
>> tpr_shadow
>> flexpriority ept vpid ept_ad fsgsbase
>> tsc_adjust bm
>> i1 hle avx2 smep bmi2 erms invpcid rtm cqm
>> rdt_a rd
>> seed adx smap intel_pt xsaveopt cqm_llc
>> cqm_occup_l
>> lc cqm_mbm_total cqm_mbm_local dtherm arat
>> pln pts
>> vnmi md_clear flush_l1d
>> Virtualization features:
>> Virtualization: VT-x
>> Caches (sum of all):
>> L1d: 896 KiB (28 instances)
>> L1i: 896 KiB (28 instances)
>> L2: 7 MiB (28 instances)
>> L3: 70 MiB (2 instances)
>> NUMA:
>> NUMA node(s): 2
>> NUMA node0 CPU(s): 0-13,28-41
>> NUMA node1 CPU(s): 14-27,42-55
>> Vulnerabilities:
>> Gather data sampling: Not affected
>> Indirect target selection: Not affected
>> Itlb multihit: KVM: Mitigation: Split huge pages
>> L1tf: Mitigation; PTE Inversion; VMX conditional
>> cache fl
>> ushes, SMT vulnerable
>> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
>> Meltdown: Mitigation; PTI
>> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
>> Reg file data sampling: Not affected
>> Retbleed: Not affected
>> Spec rstack overflow: Not affected
>> Spec store bypass: Mitigation; Speculative Store Bypass
>> disabled via prctl
>> Spectre v1: Mitigation; usercopy/swapgs barriers and
>> __user pointer sanitization
>> Spectre v2: Mitigation; Retpolines; IBPB conditional;
>> IBRS_FW; STIBP conditional; RSB filling;
>> PBRSB-eIBRS Not affected; BHI Not affected
>> Srbds: Not affected
>> Tsa: Not affected
>> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
>> Vmscape: Mitigation; IBPB before exit to userspace
>>
> It would be nice to get the idle states here, ideally how the states' usage changed
> from base to revert.
> The mentioned thread did this and should show how it can be done, but a dump of
> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> before and after the workload is usually fine to work with:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$
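The dump Christian asks for above can be scripted. The sketch below is illustrative (my addition, not from the thread): it reads the per-state `usage` and `time` counters from the standard cpuidle sysfs layout and reports how much each state grew across a workload run; `snapshot()` and `delta()` are made-up helper names, the sysfs paths are the stock kernel ones.

```python
# Sketch: diff cpuidle state usage across a workload run.
# Sysfs layout is the standard /sys/devices/system/cpu/cpu*/cpuidle/state*/
# tree mentioned above; snapshot()/delta() are illustrative helpers.
import glob
import os

def snapshot(root="/sys/devices/system/cpu"):
    """Sum per-state entry counts and residency time (us) over all CPUs."""
    stats = {}
    for usage_f in glob.glob(os.path.join(root, "cpu[0-9]*/cpuidle/state[0-9]*/usage")):
        sdir = os.path.dirname(usage_f)
        with open(os.path.join(sdir, "name")) as f:
            name = f.read().strip()
        with open(usage_f) as f:
            usage = int(f.read())
        with open(os.path.join(sdir, "time")) as f:
            time_us = int(f.read())
        u, t = stats.get(name, (0, 0))
        stats[name] = (u + usage, t + time_us)
    return stats

def delta(before, after):
    """Per-state (entries, time_us) growth between two snapshots."""
    return {name: (after[name][0] - before.get(name, (0, 0))[0],
                   after[name][1] - before.get(name, (0, 0))[1])
            for name in after}
```

Usage would be: take `snapshot()` before the benchmark, again after it, and print `delta()` of the two, once on the base kernel and once with the revert reapplied.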
Bumping this, as I have discovered this issue on the 6.12 stable branch as well. Reapplying the commit seems inevitable. I shall get back to you with those details too.
Thanks & Regards,
Harshvardhan
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-13 7:06 ` Harshvardhan Jha
@ 2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-13 14:18 ` Rafael J. Wysocki
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-13 14:13 UTC (permalink / raw)
To: Harshvardhan Jha
Cc: Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman,
linux-pm, stable, Rafael J. Wysocki, Daniel Lezcano
On Tue, Jan 13, 2026 at 8:06 AM Harshvardhan Jha
<harshvardhan.j.jha@oracle.com> wrote:
>
> Hi Christian,
>
> On 08/12/25 6:17 PM, Christian Loehle wrote:
> > On 12/8/25 11:33, Harshvardhan Jha wrote:
> >> Hi Doug,
> >>
> >> On 04/12/25 4:00 AM, Doug Smythies wrote:
> >>> On 2025.12.03 08:45 Christian Loehle wrote:
> >>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
> >>>>> observed that several regressions across different benchmarks is being
> >>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
> >>>>> automated bisect on both of them narrowed down the culprit commit to:
> >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
> >>>>> information" for 5.15
> >>>>>
> >>>>> Regressions on 5.15.196 include:
> >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
> >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
> >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
> >>>>> thread & 1M buffer size on OnPrem X6-2
> >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
> >>>>> 1 thread & 1M buffer size on OnPrem X6-2
> >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
> >>>>>
> >>>>> The culprit commits' messages mention that these reverts were done due
> >>>>> to performance regressions introduced in Intel Jasper Lake systems but
> >>>>> this revert is causing issues in other systems unfortunately. I wanted
> >>>>> to know the maintainers' opinion on how we should proceed in order to
> >>>>> fix this. If we reapply it'll bring back the previous regressions on
> >>>>> Jasper Lake systems and if we don't revert it then it's stuck with
> >>>>> current regressions. If this problem has been reported before and a fix
> >>>>> is in the works then please let me know I shall follow developments to
> >>>>> that mail thread.
> >>>> The discussion regarding this can be found here:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$
> >>>> we explored an alternative to the full revert here:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$
> >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
> >>>> full revert you're seeing now.
> >>>>
> >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> >>>> will highly depend on your platform, but also your workload, Jasper Lake
> >>>> in particular seems to favor deep idle states even when they don't seem
> >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> >>>> we're kind of stuck with that.
> >>>>
> >>>> For teo we've discussed a tunable knob in the past, which comes naturally with
> >>>> the logic, for menu there's nothing obvious that would be comparable.
> >>>> But for teo such a knob didn't generate any further interest (so far).
> >>>>
> >>>> That's the status, unless I missed anything?
> >>> By reading everything in the links Christian provided, you can see
> >>> that we had difficulties repeating test results on other platforms.
> >>>
> >>> Of the tests listed herein, the only one that was easy to repeat on my
> >>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
> >>>
> >>> Kernel 6.18 Reverted
> >>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
> >>> performance performance performance performance
> >>> test what ave ave ave ave
> >>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
> >>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
> >>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
> >>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
> >>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
> >>>
> >>> Where:
> >>> T/C means: Threads / Copies
> >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
> >>> Data points are in Seconds.
> >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> >>> The reversion was manually applied to kernel 6.18-rc1 for that test.
> >>> The reversion was included in kernel 6.18-rc3.
> >>> Kernel 6.18-rc4 had another code change to menu.c
> >>>
> >>> In case the formatting gets messed up, the table is also attached.
> >>>
> >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> >>> HWP: Enabled.
> >> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> >> after reapplying the revert on X6-2 systems.
> >>
> >> Architecture: x86_64
> >> CPU op-mode(s): 32-bit, 64-bit
> >> Address sizes: 46 bits physical, 48 bits virtual
> >> Byte Order: Little Endian
> >> CPU(s): 56
> >> On-line CPU(s) list: 0-55
> >> Vendor ID: GenuineIntel
> >> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> >> CPU family: 6
> >> Model: 79
> >> Thread(s) per core: 2
> >> Core(s) per socket: 14
> >> Socket(s): 2
> >> Stepping: 1
> >> CPU(s) scaling MHz: 98%
> >> CPU max MHz: 2600.0000
> >> CPU min MHz: 1200.0000
> >> BogoMIPS: 5188.26
> >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> >> mtrr pge mca cmov pat pse36 clflush dts acpi mmx
> >> fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> >> lm constant_tsc arch_perfmon pebs bts rep_good
> >> nopl xtopology nonstop_tsc cpuid aperfmperf pni
> >> pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
> >> sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> >> movbe popcnt tsc_deadline_timer aes xsave avx f16c
> >> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
> >> cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb stibp
> >> tpr_shadow flexpriority ept vpid ept_ad fsgsbase
> >> tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> >> rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc
> >> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat
> >> pln pts vnmi md_clear flush_l1d
> >> Virtualization features:
> >> Virtualization: VT-x
> >> Caches (sum of all):
> >> L1d: 896 KiB (28 instances)
> >> L1i: 896 KiB (28 instances)
> >> L2: 7 MiB (28 instances)
> >> L3: 70 MiB (2 instances)
> >> NUMA:
> >> NUMA node(s): 2
> >> NUMA node0 CPU(s): 0-13,28-41
> >> NUMA node1 CPU(s): 14-27,42-55
> >> Vulnerabilities:
> >> Gather data sampling: Not affected
> >> Indirect target selection: Not affected
> >> Itlb multihit: KVM: Mitigation: Split huge pages
> >> L1tf: Mitigation; PTE Inversion; VMX conditional
> >> cache flushes, SMT vulnerable
> >> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> >> Meltdown: Mitigation; PTI
> >> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
> >> Reg file data sampling: Not affected
> >> Retbleed: Not affected
> >> Spec rstack overflow: Not affected
> >> Spec store bypass: Mitigation; Speculative Store Bypass
> >> disabled via prctl
> >> Spectre v1: Mitigation; usercopy/swapgs barriers and
> >> __user pointer sanitization
> >> Spectre v2: Mitigation; Retpolines; IBPB conditional;
> >> IBRS_FW; STIBP conditional; RSB filling;
> >> PBRSB-eIBRS Not affected; BHI Not affected
> >> Srbds: Not affected
> >> Tsa: Not affected
> >> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> >> Vmscape: Mitigation; IBPB before exit to userspace
> >>
> > It would be nice to get the idle states here, ideally how the states' usage changed
> > from base to revert.
> > The mentioned thread did this and should show how it can be done, but a dump of
> > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> > before and after the workload is usually fine to work with:
> > https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$
>
> Bumping this as I discovered this issue on 6.12 stable branch also. The
> reapplication seems inevitable. I shall get back to you with these
> details also.
Yes, please, because I have another reason to restore the reverted commit.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-13 14:13 ` Rafael J. Wysocki
@ 2026-01-13 14:18 ` Rafael J. Wysocki
2026-01-14 4:28 ` Sergey Senozhatsky
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-13 14:18 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Harshvardhan Jha, Christian Loehle, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Rafael J. Wysocki,
Daniel Lezcano
On Tue, Jan 13, 2026 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jan 13, 2026 at 8:06 AM Harshvardhan Jha
> <harshvardhan.j.jha@oracle.com> wrote:
> >
> > Hi Christian,
> >
> > On 08/12/25 6:17 PM, Christian Loehle wrote:
> > > On 12/8/25 11:33, Harshvardhan Jha wrote:
> > >> Hi Doug,
> > >>
> > >> On 04/12/25 4:00 AM, Doug Smythies wrote:
> > >>> On 2025.12.03 08:45 Christian Loehle wrote:
> > >>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> > >>>>> Hi there,
> > >>>>>
> > >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
> > >>>>> observed that several regressions across different benchmarks is being
> > >>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
> > >>>>> automated bisect on both of them narrowed down the culprit commit to:
> > >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
> > >>>>> information" for 5.15
> > >>>>>
> > >>>>> Regressions on 5.15.196 include:
> > >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
> > >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
> > >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
> > >>>>> thread & 1M buffer size on OnPrem X6-2
> > >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
> > >>>>> 1 thread & 1M buffer size on OnPrem X6-2
> > >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
> > >>>>>
> > >>>>> The culprit commits' messages mention that these reverts were done due
> > >>>>> to performance regressions introduced in Intel Jasper Lake systems but
> > >>>>> this revert is causing issues in other systems unfortunately. I wanted
> > >>>>> to know the maintainers' opinion on how we should proceed in order to
> > >>>>> fix this. If we reapply it'll bring back the previous regressions on
> > >>>>> Jasper Lake systems and if we don't revert it then it's stuck with
> > >>>>> current regressions. If this problem has been reported before and a fix
> > >>>>> is in the works then please let me know I shall follow developments to
> > >>>>> that mail thread.
> > >>>> The discussion regarding this can be found here:
> > >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$
> > >>>> we explored an alternative to the full revert here:
> > >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$
> > >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
> > >>>> full revert you're seeing now.
> > >>>>
> > >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> > >>>> will highly depend on your platform, but also your workload, Jasper Lake
> > >>>> in particular seems to favor deep idle states even when they don't seem
> > >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> > >>>> we're kind of stuck with that.
> > >>>>
> > >>>> For teo we've discussed a tunable knob in the past, which comes naturally with
> > >>>> the logic, for menu there's nothing obvious that would be comparable.
> > >>>> But for teo such a knob didn't generate any further interest (so far).
> > >>>>
> > >>>> That's the status, unless I missed anything?
> > >>> By reading everything in the links Christian provided, you can see
> > >>> that we had difficulties repeating test results on other platforms.
> > >>>
> > >>> Of the tests listed herein, the only one that was easy to repeat on my
> > >>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
> > >>>
> > >>> Kernel 6.18 Reverted
> > >>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
> > >>> performance performance performance performance
> > >>> test what ave ave ave ave
> > >>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
> > >>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
> > >>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
> > >>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
> > >>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
> > >>>
> > >>> Where:
> > >>> T/C means: Threads / Copies
> > >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
> > >>> Data points are in Seconds.
> > >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> > >>> The reversion was manually applied to kernel 6.18-rc1 for that test.
> > >>> The reversion was included in kernel 6.18-rc3.
> > >>> Kernel 6.18-rc4 had another code change to menu.c
> > >>>
> > >>> In case the formatting gets messed up, the table is also attached.
> > >>>
> > >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> > >>> HWP: Enabled.
> > >> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> > >> after reapplying the revert on X6-2 systems.
> > >>
> > >> Architecture: x86_64
> > >> CPU op-mode(s): 32-bit, 64-bit
> > >> Address sizes: 46 bits physical, 48 bits virtual
> > >> Byte Order: Little Endian
> > >> CPU(s): 56
> > >> On-line CPU(s) list: 0-55
> > >> Vendor ID: GenuineIntel
> > >> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> > >> CPU family: 6
> > >> Model: 79
> > >> Thread(s) per core: 2
> > >> Core(s) per socket: 14
> > >> Socket(s): 2
> > >> Stepping: 1
> > >> CPU(s) scaling MHz: 98%
> > >> CPU max MHz: 2600.0000
> > >> CPU min MHz: 1200.0000
> > >> BogoMIPS: 5188.26
> > >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> > >> mtrr pge mca cmov pat pse36 clflush dts acpi mmx
> > >> fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> > >> lm constant_tsc arch_perfmon pebs bts rep_good
> > >> nopl xtopology nonstop_tsc cpuid aperfmperf pni
> > >> pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
> > >> sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> > >> movbe popcnt tsc_deadline_timer aes xsave avx f16c
> > >> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
> > >> cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb stibp
> > >> tpr_shadow flexpriority ept vpid ept_ad fsgsbase
> > >> tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> > >> rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc
> > >> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat
> > >> pln pts vnmi md_clear flush_l1d
> > >> Virtualization features:
> > >> Virtualization: VT-x
> > >> Caches (sum of all):
> > >> L1d: 896 KiB (28 instances)
> > >> L1i: 896 KiB (28 instances)
> > >> L2: 7 MiB (28 instances)
> > >> L3: 70 MiB (2 instances)
> > >> NUMA:
> > >> NUMA node(s): 2
> > >> NUMA node0 CPU(s): 0-13,28-41
> > >> NUMA node1 CPU(s): 14-27,42-55
> > >> Vulnerabilities:
> > >> Gather data sampling: Not affected
> > >> Indirect target selection: Not affected
> > >> Itlb multihit: KVM: Mitigation: Split huge pages
> > >> L1tf: Mitigation; PTE Inversion; VMX conditional
> > >> cache flushes, SMT vulnerable
> > >> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> > >> Meltdown: Mitigation; PTI
> > >> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
> > >> Reg file data sampling: Not affected
> > >> Retbleed: Not affected
> > >> Spec rstack overflow: Not affected
> > >> Spec store bypass: Mitigation; Speculative Store Bypass
> > >> disabled via prctl
> > >> Spectre v1: Mitigation; usercopy/swapgs barriers and
> > >> __user pointer sanitization
> > >> Spectre v2: Mitigation; Retpolines; IBPB conditional;
> > >> IBRS_FW; STIBP conditional; RSB filling;
> > >> PBRSB-eIBRS Not affected; BHI Not affected
> > >> Srbds: Not affected
> > >> Tsa: Not affected
> > >> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> > >> Vmscape: Mitigation; IBPB before exit to userspace
> > >>
> > > It would be nice to get the idle states here, ideally how the states' usage changed
> > > from base to revert.
> > > The mentioned thread did this and should show how it can be done, but a dump of
> > > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> > > before and after the workload is usually fine to work with:
> > > https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$
> >
> > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > reapplication seems inevitable. I shall get back to you with these
> > details also.
>
> Yes, please, because I have another reason to restore the reverted commit.
Sergey, did you see a performance regression from 85975daeaa4d
("cpuidle: menu: Avoid discarding useful information") on any
platforms other than the Jasper Lake it was reported for?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-13 14:18 ` Rafael J. Wysocki
@ 2026-01-14 4:28 ` Sergey Senozhatsky
2026-01-14 4:49 ` Sergey Senozhatsky
0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-01-14 4:28 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Sergey Senozhatsky, Harshvardhan Jha, Christian Loehle,
Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable,
Daniel Lezcano
Hi,
On (26/01/13 15:18), Rafael J. Wysocki wrote:
[..]
> > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > reapplication seems inevitable. I shall get back to you with these
> > > details also.
> >
> > Yes, please, because I have another reason to restore the reverted commit.
>
> Sergey, did you see a performance regression from 85975daeaa4d
> ("cpuidle: menu: Avoid discarding useful information") on any
> platforms other than the Jasper Lake it was reported for?
Let me try to dig it up. I think I saw regressions on a number of
devices:
---
cpu family : 6
model : 122
model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
---
cpu family : 6
model : 122
model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
---
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
---
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
---
cpu family : 6
model : 156
model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
I guess family 6/model 122 is not Jasper Lake?
I also saw some where the patch in question seemed to improve the
metrics, but regressions are more important, so the revert simply
put all of the boards back to the previous state.
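For reference (my addition, based on Intel's public CPUID model numbering, not stated in the thread): family 6 model 122 (0x7A) is Gemini Lake (Goldmont Plus), while family 6 model 156 (0x9C) is Jasper Lake (Tremont), so only the N4500/N6000 boards above are Jasper Lake proper. A trivial lookup:

```python
# Map (family, model) pairs to the Intel Atom codenames relevant here.
# Values follow Intel's public CPUID model numbering; extend as needed.
ATOM_CODENAMES = {
    (6, 122): "Gemini Lake (Goldmont Plus)",  # 0x7A: N5000, N4100, ...
    (6, 156): "Jasper Lake (Tremont)",        # 0x9C: N4500, N6000, ...
}

def codename(family, model):
    """Codename for a /proc/cpuinfo family/model pair, or 'unknown'."""
    return ATOM_CODENAMES.get((family, model), "unknown")
```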
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 4:28 ` Sergey Senozhatsky
@ 2026-01-14 4:49 ` Sergey Senozhatsky
2026-01-14 5:15 ` Tomasz Figa
2026-01-29 22:47 ` Doug Smythies
0 siblings, 2 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-01-14 4:49 UTC (permalink / raw)
To: Tomasz Figa
Cc: Rafael J. Wysocki, Harshvardhan Jha, Christian Loehle,
Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable,
Daniel Lezcano, Sergey Senozhatsky
Cc-ing Tomasz
On (26/01/14 13:28), Sergey Senozhatsky wrote:
> Hi,
>
> On (26/01/13 15:18), Rafael J. Wysocki wrote:
> [..]
> > > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > > reapplication seems inevitable. I shall get back to you with these
> > > > details also.
> > >
> > > Yes, please, because I have another reason to restore the reverted commit.
> >
> > Sergey, did you see a performance regression from 85975daeaa4d
> > ("cpuidle: menu: Avoid discarding useful information") on any
> > platforms other than the Jasper Lake it was reported for?
>
> Let me try to dig it up. I think I saw regressions on a number of
> devices:
>
> ---
> cpu family : 6
> model : 122
> model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
> ---
> cpu family : 6
> model : 122
> model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
> ---
> cpu family : 6
> model : 156
> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> ---
> cpu family : 6
> model : 156
> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> ---
> cpu family : 6
> model : 156
> model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
>
>
> I guess family 6/model 122 is not Jasper Lake?
>
> I also saw some where the patch in question seemed to improve the
> metrics, but regressions are more important, so the revert simply
> put all of the boards back to the previous state.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 4:49 ` Sergey Senozhatsky
@ 2026-01-14 5:15 ` Tomasz Figa
2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 22:47 ` Doug Smythies
1 sibling, 1 reply; 44+ messages in thread
From: Tomasz Figa @ 2026-01-14 5:15 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Rafael J. Wysocki, Harshvardhan Jha, Christian Loehle,
Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable,
Daniel Lezcano
Hi all,
On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> Cc-ing Tomasz
>
> On (26/01/14 13:28), Sergey Senozhatsky wrote:
> > Hi,
> >
> > On (26/01/13 15:18), Rafael J. Wysocki wrote:
> > [..]
> > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > > > reapplication seems inevitable. I shall get back to you with these
> > > > > details also.
> > > >
> > > > Yes, please, because I have another reason to restore the reverted commit.
Is the performance difference the reporter observed an actual
regression, or is it just a return to the level before the
optimization was merged into stable branches? If the latter, shouldn't
avoiding regressions be a priority over further optimizing for other
users?
If there is a really strong desire to reland this optimization, could
it at least be applied selectively to the CPUs that it's known to
help, or alternatively, made configurable?
Best,
Tomasz
> > >
> > > Sergey, did you see a performance regression from 85975daeaa4d
> > > ("cpuidle: menu: Avoid discarding useful information") on any
> > > platforms other than the Jasper Lake it was reported for?
> >
> > Let me try to dig it up. I think I saw regressions on a number of
> > devices:
> >
> > ---
> > cpu family : 6
> > model : 122
> > model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 122
> > model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 156
> > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 156
> > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 156
> > model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
> >
> >
> > I guess family 6/model 122 is not Jasper Lake?
> >
> > I also saw some where the patch in question seemed to improve the
> > metrics, but regressions are more important, so the revert simply
> > put all of the boards back to the previous state.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 5:15 ` Tomasz Figa
@ 2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 10:23 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-14 20:07 UTC (permalink / raw)
To: Tomasz Figa, Harshvardhan Jha
Cc: Sergey Senozhatsky, Christian Loehle, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano
On Wed, Jan 14, 2026 at 6:16 AM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi all,
>
> On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > Cc-ing Tomasz
> >
> > On (26/01/14 13:28), Sergey Senozhatsky wrote:
> > > Hi,
> > >
> > > On (26/01/13 15:18), Rafael J. Wysocki wrote:
> > > [..]
> > > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > > > > reapplication seems inevitable. I shall get back to you with these
> > > > > > details also.
> > > > >
> > > > > Yes, please, because I have another reason to restore the reverted commit.
>
> Is the performance difference the reporter observed an actual
> regression, or is it just a return to the level before the
> optimization was merged into stable branches?
Good question.
Harshvardhan, which one is the case?
> If the latter, shouldn't avoiding regressions be a priority over further optimizing for other
> users?
>
> If there is a really strong desire to reland this optimization, could
> it at least be applied selectively to the CPUs that it's known to
> help, or alternatively, made configurable?
That wouldn't be easy in practice, but I think that it may be
compensated by reducing the target residency values of the deepest
idle states on those systems.
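The effect Rafael describes could be approximated from userspace as an experiment, without touching driver tables: per-state target residencies are readable from sysfs, and a state can be taken out of the governor's consideration through its `disable` attribute. The `residency`, `latency`, and `disable` files are standard cpuidle sysfs attributes; the helpers below are an illustrative sketch, not from the thread.

```python
# Sketch: list cpuidle states with their target residencies, and
# optionally disable the deepest one on a CPU as an experiment.
# residency/latency/disable are standard cpuidle sysfs attributes;
# read_states()/pick_deepest() are illustrative helper names.
import glob
import os

def read_states(cpu_dir):
    """Return [(name, residency_us, latency_us), ...] for one CPU."""
    states = []
    for sdir in sorted(glob.glob(os.path.join(cpu_dir, "cpuidle/state[0-9]*"))):
        def read(attr):
            with open(os.path.join(sdir, attr)) as f:
                return f.read().strip()
        states.append((read("name"), int(read("residency")), int(read("latency"))))
    return states

def pick_deepest(states):
    """State tuple with the largest target residency (None if empty)."""
    return max(states, key=lambda s: s[1], default=None)

def disable_state(cpu_dir, index, disabled=True):
    """Write the standard per-state 'disable' attribute (needs root)."""
    with open(os.path.join(cpu_dir, f"cpuidle/state{index}/disable"), "w") as f:
        f.write("1" if disabled else "0")
```

Disabling the deepest state on the regressing machines and rerunning the benchmarks would show whether the revert's impact really comes from over-eager deep-idle selection.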
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-08 12:47 ` Christian Loehle
2026-01-13 7:06 ` Harshvardhan Jha
@ 2026-01-27 15:45 ` Harshvardhan Jha
2026-01-28 5:06 ` Doug Smythies
1 sibling, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-01-27 15:45 UTC (permalink / raw)
To: Christian Loehle, Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano'
[-- Attachment #1: Type: text/plain, Size: 10297 bytes --]
Hi Christian,
On 08/12/25 6:17 PM, Christian Loehle wrote:
> On 12/8/25 11:33, Harshvardhan Jha wrote:
>> Hi Doug,
>>
>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>> Hi there,
>>>>>
>>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>>>> observed that several regressions across different benchmarks is being
>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>> information" for 5.15
>>>>>
>>>>> Regressions on 5.15.196 include:
>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>
>>>>> The culprit commits' messages mention that these reverts were done due
>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>>> to know the maintainers' opinion on how we should proceed in order to
>>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>>> current regressions. If this problem has been reported before and a fix
>>>>> is in the works then please let me know I shall follow developments to
>>>>> that mail thread.
>>>> The discussion regarding this can be found here:
>>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$
>>>> we explored an alternative to the full revert here:
>>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$
>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>> full revert you're seeing now.
>>>>
>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>> in particular seems to favor deep idle states even when they don't seem
>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>> we're kind of stuck with that.
>>>>
>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>
>>>> That's the status, unless I missed anything?
>>> By reading everything in the links Christian provided, you can see
>>> that we had difficulties repeating test results on other platforms.
>>>
>>> Of the tests listed herein, the only one that was easy to repeat on my
>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>>
>>> Kernel 6.18 Reverted
>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>>> performance performance performance performance
>>> test what ave ave ave ave
>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>>
>>> Where:
>>> T/C means: Threads / Copies
>>> performance means: the intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
>>> Data points are in Seconds.
>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>> The reversion was included in kernel 6.18-rc3.
>>> Kernel 6.18-rc4 had another code change to menu.c
>>>
>>> In case the formatting gets messed up, the table is also attached.
>>>
>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>> HWP: Enabled.
>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>> after reapplying the revert on X6-2 systems.
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Address sizes: 46 bits physical, 48 bits virtual
>> Byte Order: Little Endian
>> CPU(s): 56
>> On-line CPU(s) list: 0-55
>> Vendor ID: GenuineIntel
>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> CPU family: 6
>> Model: 79
>> Thread(s) per core: 2
>> Core(s) per socket: 14
>> Socket(s): 2
>> Stepping: 1
>> CPU(s) scaling MHz: 98%
>> CPU max MHz: 2600.0000
>> CPU min MHz: 1200.0000
>> BogoMIPS: 5188.26
>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
>> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
>> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
>> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti
>> intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid
>> ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
>> cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc
>> cqm_mbm_total cqm_mbm_local dtherm arat pln pts vnmi md_clear flush_l1d
>> Virtualization features:
>> Virtualization: VT-x
>> Caches (sum of all):
>> L1d: 896 KiB (28 instances)
>> L1i: 896 KiB (28 instances)
>> L2: 7 MiB (28 instances)
>> L3: 70 MiB (2 instances)
>> NUMA:
>> NUMA node(s): 2
>> NUMA node0 CPU(s): 0-13,28-41
>> NUMA node1 CPU(s): 14-27,42-55
>> Vulnerabilities:
>> Gather data sampling: Not affected
>> Indirect target selection: Not affected
>> Itlb multihit: KVM: Mitigation: Split huge pages
>> L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
>> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
>> Meltdown: Mitigation; PTI
>> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
>> Reg file data sampling: Not affected
>> Retbleed: Not affected
>> Spec rstack overflow: Not affected
>> Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
>> Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
>> Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
>> Srbds: Not affected
>> Tsa: Not affected
>> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
>> Vmscape: Mitigation; IBPB before exit to userspace
>>
> It would be nice to get the idle states here, ideally how the states' usage changed
> from base to revert.
> The mentioned thread did this and should show how it can be done, but a dump of
> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> before and after the workload is usually fine to work with:
> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
Apologies for the late reply, I'm attaching a tar ball which has the cpu
states for the test suites before and after tests. The folders with the
name of the test contain two folders good-kernel and bad-kernel
containing two files having the before and after states. Please note
that different machines were used for different test suites due to
compatibility reasons. The jbb test was run using containers.
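For reference, dumps taken that way can be reduced to per-state usage deltas with a short script along these lines (a sketch only; it assumes each dump was produced with `grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/usage`, so every line looks like `.../cpuN/cpuidle/stateM/usage:COUNT`):

```python
import re
from collections import defaultdict

# One dump line per CPU/state, e.g.:
#   /sys/devices/system/cpu/cpu0/cpuidle/state2/usage:1234
LINE = re.compile(r"cpu(\d+)/cpuidle/state(\d+)/usage:(\d+)")

def per_state_usage(dump_text):
    """Sum the 'usage' counter over all CPUs, keyed by idle state index."""
    totals = defaultdict(int)
    for match in LINE.finditer(dump_text):
        _cpu, state, value = match.groups()
        totals[int(state)] += int(value)
    return dict(totals)

def usage_delta(before_text, after_text):
    """Per-state idle entries accumulated during the workload (after - before)."""
    before = per_state_usage(before_text)
    after = per_state_usage(after_text)
    return {state: after.get(state, 0) - before.get(state, 0) for state in after}
```

Running this over the good-kernel and bad-kernel pairs gives the per-state shift directly, without hand-extraction.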
Thanks & Regards,
Harshvardhan
[-- Attachment #2: cpuidle.tar.gz --]
[-- Type: application/x-gzip, Size: 36882 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-27 15:45 ` Harshvardhan Jha
@ 2026-01-28 5:06 ` Doug Smythies
2026-01-28 23:53 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-01-28 5:06 UTC (permalink / raw)
To: 'Harshvardhan Jha', 'Christian Loehle'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano',
Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 8076 bytes --]
On 2026.01.27 07:45 Harshvardhan Jha wrote:
>On 08/12/25 6:17 PM, Christian Loehle wrote:
>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>
>>>>>> While running performance benchmarks for the 5.15.196 LTS tag, it was
>>>>>> observed that several regressions across different benchmarks were
>>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>>> information" for 5.15
>>>>>>
>>>>>> Regressions on 5.15.196 include:
>>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>>
>>>>>> The culprit commits' messages mention that these reverts were done due
>>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>>> this revert is unfortunately causing issues on other systems. I wanted
>>>>>> to get the maintainers' opinion on how we should proceed in order to
>>>>>> fix this. If we reapply the commit, it will bring back the previous
>>>>>> regressions on Jasper Lake systems; if we keep the revert, we are stuck
>>>>>> with the current regressions. If this problem has been reported before
>>>>>> and a fix is in the works, please let me know and I shall follow that
>>>>>> mail thread.
>>>>> The discussion regarding this can be found here:
>>>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>>> we explored an alternative to the full revert here:
>>>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>>> full revert you're seeing now.
>>>>>
>>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>>> in particular seems to favor deep idle states even when they don't seem
>>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>>> we're kind of stuck with that.
>>>>>
>>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>>
>>>>> That's the status, unless I missed anything?
>>>> By reading everything in the links Christian provided, you can see
>>>> that we had difficulties repeating test results on other platforms.
>>>>
>>>> Of the tests listed herein, the only one that was easy to repeat on my
>>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>>>
>>>> Kernel 6.18 Reverted
>>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>>>> performance performance performance performance
>>>> test what ave ave ave ave
>>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>>>
>>>> Where:
>>>> T/C means: Threads / Copies
>>>> performance means: the intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
>>>> Data points are in Seconds.
>>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>>> The reversion was included in kernel 6.18-rc3.
>>>> Kernel 6.18-rc4 had another code change to menu.c
>>>>
>>>> In case the formatting gets messed up, the table is also attached.
>>>>
>>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>>> HWP: Enabled.
>>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>>> after reapplying the revert on X6-2 systems.
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 46 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 56
>>> On-line CPU(s) list: 0-55
>>> Vendor ID: GenuineIntel
>>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>> CPU family: 6
>>> Model: 79
>>> Thread(s) per core: 2
>>> Core(s) per socket: 14
>>> Socket(s): 2
>>> Stepping: 1
>>> CPU(s) scaling MHz: 98%
>>> CPU max MHz: 2600.0000
>>> CPU min MHz: 1200.0000
>>> BogoMIPS: 5188.26
... snip ...
>> It would be nice to get the idle states here, ideally how the states' usage changed
>> from base to revert.
>> The mentioned thread did this and should show how it can be done, but a dump of
>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>> before and after the workload is usually fine to work with:
>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> Apologies for the late reply, I'm attaching a tar ball which has the cpu
> states for the test suites before and after tests. The folders with the
> name of the test contain two folders good-kernel and bad-kernel
> containing two files having the before and after states. Please note
> that different machines were used for different test suites due to
> compatibility reasons. The jbb test was run using containers.
It is a considerable amount of work to manually extract and summarize the data.
I have only done it for the phoronix-sqlite data.
There seem to be 40 CPUs, 5 idle states, with idle state 3 disabled by default.
I remember seeing a Linux-pm email about why but couldn't find it just now.
Summary (also attached as a PNG file, in case the formatting gets messed up):
The total idle entries (usage) and time seem low to me, which is why the ???.
phoronix-sqlite
Good Kernel: Time between samples 4 seconds (estimated and ???)
Usage Above Below Above Below
state 0 220 0 218 0.00% 99.09%
state 1 70212 5213 34602 7.42% 49.28%
state 2 30273 5237 1806 17.30% 5.97%
state 3 0 0 0 0.00% 0.00%
state 4 11824 2120 0 17.93% 0.00%
total 112529 12570 36626 43.72% <<< Misses %
Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
Usage Above Below Above Below
state 0 262 0 260 0.00% 99.24%
state 1 62751 3985 35588 6.35% 56.71%
state 2 24941 7896 1433 31.66% 5.75%
state 3 0 0 0 0.00% 0.00%
state 4 24489 11543 0 47.14% 0.00%
total 112443 23424 37281 53.99% <<< Misses %
Observe 2X use of idle state 4 for the "Bad Kernel"
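(For anyone re-deriving the columns: the percentages are plain ratios of the sysfs counters, per state Above% = above/usage and Below% = below/usage, and Misses% = (total above + total below) / total usage. A small sketch of that arithmetic, with names taken from the table:)

```python
def state_percentages(usage, above, below):
    """Per-state Above% and Below%, as in the summary tables (0 when unused)."""
    if usage == 0:
        return 0.0, 0.0
    return 100.0 * above / usage, 100.0 * below / usage

def misses_percent(rows):
    """Total Misses% over (usage, above, below) rows for all idle states."""
    total_usage = sum(usage for usage, _, _ in rows)
    total_missed = sum(above + below for _, above, below in rows)
    return 100.0 * total_missed / total_usage
```

Fed the good-kernel numbers above, state 1 comes out at 7.42% / 49.28% and the total at 43.72%, matching the table.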
I have a template now and can summarize the other 40 CPU data
faster, but I would have to rework the template for the 56 CPU data.
Also, is the fio data a 64 CPU set with 4 idle states per CPU?
... Doug
[-- Attachment #2: sqlite-summary.png --]
[-- Type: image/png, Size: 37171 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-28 5:06 ` Doug Smythies
@ 2026-01-28 23:53 ` Doug Smythies
2026-01-29 22:27 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-01-28 23:53 UTC (permalink / raw)
To: 'Harshvardhan Jha', 'Christian Loehle'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano',
Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 6136 bytes --]
On 2026.01.27 21:07 Doug Smythies wrote:
> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
... snip ...
>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>> from base to revert.
>>> The mentioned thread did this and should show how it can be done, but a dump of
>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>> before and after the workload is usually fine to work with:
>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>> states for the test suites before and after tests. The folders with the
>> name of the test contain two folders good-kernel and bad-kernel
>> containing two files having the before and after states. Please note
>> that different machines were used for different test suites due to
>> compatibility reasons. The jbb test was run using containers.
Please provide the results of the test runs that were done for
the supplied before and after idle data.
In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
Is it a test I can run on my test computer?
> It is a considerable amount of work to manually extract and summarize the data.
> I have only done it for the phoronix-sqlite data.
I have done the rest now, see below.
I have also attached the results, in case the formatting gets screwed up.
> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
> I remember seeing a Linux-pm email about why but couldn't find it just now.
> Summary (also attached as a PNG file, in case the formatting gets messed up):
> The total idle entries (usage) and time seem low to me, which is why the ???.
>
> phoronix-sqlite
> Good Kernel: Time between samples 4 seconds (estimated and ???)
> Usage Above Below Above Below
> state 0 220 0 218 0.00% 99.09%
> state 1 70212 5213 34602 7.42% 49.28%
> state 2 30273 5237 1806 17.30% 5.97%
> state 3 0 0 0 0.00% 0.00%
> state 4 11824 2120 0 17.93% 0.00%
>
> total 112529 12570 36626 43.72% <<< Misses %
>
> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
> Usage Above Below Above Below
> state 0 262 0 260 0.00% 99.24%
> state 1 62751 3985 35588 6.35% 56.71%
> state 2 24941 7896 1433 31.66% 5.75%
> state 3 0 0 0 0.00% 0.00%
> state 4 24489 11543 0 47.14% 0.00%
>
> total 112443 23424 37281 53.99% <<< Misses %
>
> Observe 2X use of idle state 4 for the "Bad Kernel"
>
> I have a template now, and can summarize the other 40 CPU data
> faster, but I would have to rework the template for the 56 CPU data,
> and is it a 64 CPU data set at 4 idle states per CPU?
jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
POLL, C1, C1E, C3 (disabled), C6
Good Kernel: Time between samples > 2 hours (estimated)
Usage Above Below Above Below
state 0 297550 0 296084 0.00% 99.51%
state 1 8062854 341043 4962635 4.23% 61.55%
state 2 56708358 12688379 6252051 22.37% 11.02%
state 3 0 0 0 0.00% 0.00%
state 4 54624476 15868752 0 29.05% 0.00%
total 119693238 28898174 11510770 33.76% <<< Misses
Bad Kernel: Time between samples > 2 hours (estimated)
Usage Above Below Above Below
state 0 90715 0 75134 0.00% 82.82%
state 1 8878738 312970 6082180 3.52% 68.50%
state 2 12048728 2576251 603316 21.38% 5.01%
state 3 0 0 0 0.00% 0.00%
state 4 85999424 44723273 0 52.00% 0.00%
total 107017605 47612494 6760630 50.81% <<< Misses
As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
fio
Good Kernel: Time between samples ~ 1 minute (estimated)
Usage Above Below Above Below
state 0 3822 0 3818 0.00% 99.90%
state 1 148640 4406 68956 2.96% 46.39%
state 2 593455 45344 105675 7.64% 17.81%
state 3 3209648 807014 0 25.14% 0.00%
total 3955565 856764 178449 26.17% <<< Misses
Bad Kernel: Time between samples ~ 1 minute (estimated)
Usage Above Below Above Below
state 0 916 0 756 0.00% 82.53%
state 1 80230 2028 42791 2.53% 53.34%
state 2 59231 6888 6791 11.63% 11.47%
state 3 2455784 564797 0 23.00% 0.00%
total 2596161 573713 50338 24.04% <<< Misses
It is not clear why the number of idle entries differs so much
between the tests, but there is a bit of a different distribution
of the workload among the CPUs.
rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
POLL, C1, C1E, C3 (disabled), C6
rds-stress-test
Good Kernel: Time between samples ~70 Seconds (estimated)
Usage Above Below Above Below
state 0 1561 0 1435 0.00% 91.93%
state 1 13855 899 2410 6.49% 17.39%
state 2 467998 139254 23679 29.76% 5.06%
state 3 0 0 0 0.00% 0.00%
state 4 213132 107417 0 50.40% 0.00%
total 696546 247570 27524 39.49% <<< Misses
Bad Kernel: Time between samples ~ 70 Seconds (estimated)
Usage Above Below Above Below
state 0 231 0 231 0.00% 100.00%
state 1 5413 266 1186 4.91% 21.91%
state 2 54365 719 3789 1.32% 6.97%
state 3 0 0 0 0.00% 0.00%
state 4 267055 148327 0 55.54% 0.00%
total 327064 149312 5206 47.24% <<< Misses
Again, differing numbers of idle entries between tests.
This time the load distribution between CPUs is more
obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
In the "Good" case the work is distributed over more CPUs.
I assume, without proof, that the scheduler is deciding not to migrate
the next bit of work to another CPU in the one case versus the other.
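(The spread can be quantified from the same dumps; a sketch that reports what share of all idle entries the busiest few CPUs account for. The per-CPU totals in the test values are made up for illustration, not taken from the attached data:)

```python
def top_cpu_share(per_cpu_entries, n=3):
    """Fraction of all idle entries accounted for by the n most active CPUs.

    per_cpu_entries maps CPU number -> total idle entries (summed 'usage'
    counters across that CPU's idle states).
    """
    counts = sorted(per_cpu_entries.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0
```

A concentrated ("Bad"-style) run reports a value near 1.0 even for small n, while a spread-out ("Good"-style) run reports noticeably less.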
... Doug
[-- Attachment #2: sqlite-summary.png --]
[-- Type: image/png, Size: 37171 bytes --]
[-- Attachment #3: jbb-summary.png --]
[-- Type: image/png, Size: 38778 bytes --]
[-- Attachment #4: fio-summary.png --]
[-- Type: image/png, Size: 34474 bytes --]
[-- Attachment #5: rds-stress-summary.png --]
[-- Type: image/png, Size: 37087 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 20:07 ` Rafael J. Wysocki
@ 2026-01-29 10:23 ` Harshvardhan Jha
0 siblings, 0 replies; 44+ messages in thread
From: Harshvardhan Jha @ 2026-01-29 10:23 UTC (permalink / raw)
To: Rafael J. Wysocki, Tomasz Figa
Cc: Sergey Senozhatsky, Christian Loehle, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano
Hi Rafael,
On 15/01/26 1:37 AM, Rafael J. Wysocki wrote:
> On Wed, Jan 14, 2026 at 6:16 AM Tomasz Figa <tfiga@chromium.org> wrote:
>> Hi all,
>>
>> On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky
>> <senozhatsky@chromium.org> wrote:
>>> Cc-ing Tomasz
>>>
>>> On (26/01/14 13:28), Sergey Senozhatsky wrote:
>>>> Hi,
>>>>
>>>> On (26/01/13 15:18), Rafael J. Wysocki wrote:
>>>> [..]
>>>>>>> Bumping this as I discovered this issue on 6.12 stable branch also. The
>>>>>>> reapplication seems inevitable. I shall get back to you with these
>>>>>>> details also.
>>>>>> Yes, please, because I have another reason to restore the reverted commit.
>> Is the performance difference the reporter observed an actual
>> regression, or is it just a return to the level before the
>> optimization was merged into stable branches?
> Good question.
>
> Harshvardhan, which one is the case?
The commit initially brought a performance improvement, and the revert
then returned performance to the baseline.
Harshvardhan
>
>> If the latter, shouldn't avoiding regressions be a priority over further optimizing for other
>> users?
>>
>> If there is a really strong desire to reland this optimization, could
>> it at least be applied selectively to the CPUs that it's known to
>> help, or alternatively, made configurable?
> That wouldn't be easy in practice, but I think that it may be
> compensated by reducing the target residency values of the deepest
> idle states on those systems.
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-28 23:53 ` Doug Smythies
@ 2026-01-29 22:27 ` Doug Smythies
2026-01-30 19:28 ` Rafael J. Wysocki
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-01-29 22:27 UTC (permalink / raw)
To: 'Harshvardhan Jha', 'Christian Loehle'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano',
Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 6908 bytes --]
On 2026.01.28 15:53 Doug Smythies wrote:
> On 2026.01.27 21:07 Doug Smythies wrote:
>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> ... snip ...
>
>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>> from base to revert.
>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>> before and after the workload is usually fine to work with:
>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>
>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>> states for the test suites before and after tests. The folders with the
>>> name of the test contain two folders good-kernel and bad-kernel
>>> containing two files having the before and after states. Please note
>>> that different machines were used for different test suites due to
>>> compatibility reasons. The jbb test was run using containers.
>
> Please provide the results of the test runs that were done for
> the supplied before and after idle data.
> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
> Is it a test I can run on my test computer?
I see that I have fio installed on my test computer.
>> It is a considerable amount of work to manually extract and summarize the data.
>> I have only done it for the phoronix-sqlite data.
>
> I have done the rest now, see below.
> I have also attached the results, in case the formatting gets screwed up.
>
>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>
>> phoronix-sqlite
>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>> Usage Above Below Above Below
>> state 0 220 0 218 0.00% 99.09%
>> state 1 70212 5213 34602 7.42% 49.28%
>> state 2 30273 5237 1806 17.30% 5.97%
>> state 3 0 0 0 0.00% 0.00%
>> state 4 11824 2120 0 17.93% 0.00%
>>
>> total 112529 12570 36626 43.72% <<< Misses %
>>
>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>> Usage Above Below Above Below
>> state 0 262 0 260 0.00% 99.24%
>> state 1 62751 3985 35588 6.35% 56.71%
>> state 2 24941 7896 1433 31.66% 5.75%
>> state 3 0 0 0 0.00% 0.00%
>> state 4 24489 11543 0 47.14% 0.00%
>>
>> total 112443 23424 37281 53.99% <<< Misses %
>>
>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>
>> I have a template now, and can summarize the other 40 CPU data
>> faster, but I would have to rework the template for the 56 CPU data,
>> and is it a 64 CPU data set at 4 idle states per CPU?
>
> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
> POLL, C1, C1E, C3 (disabled), C6
>
> Good Kernel: Time between samples > 2 hours (estimated)
> Usage Above Below Above Below
> state 0 297550 0 296084 0.00% 99.51%
> state 1 8062854 341043 4962635 4.23% 61.55%
> state 2 56708358 12688379 6252051 22.37% 11.02%
> state 3 0 0 0 0.00% 0.00%
> state 4 54624476 15868752 0 29.05% 0.00%
>
> total 119693238 28898174 11510770 33.76% <<< Misses
>
> Bad Kernel: Time between samples > 2 hours (estimated)
> Usage Above Below Above Below
> state 0 90715 0 75134 0.00% 82.82%
> state 1 8878738 312970 6082180 3.52% 68.50%
> state 2 12048728 2576251 603316 21.38% 5.01%
> state 3 0 0 0 0.00% 0.00%
> state 4 85999424 44723273 0 52.00% 0.00%
>
> total 107017605 47612494 6760630 50.81% <<< Misses
>
> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>
> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>
> fio
> Good Kernel: Time between samples ~ 1 minute (estimated)
> Usage Above Below Above Below
> state 0 3822 0 3818 0.00% 99.90%
> state 1 148640 4406 68956 2.96% 46.39%
> state 2 593455 45344 105675 7.64% 17.81%
> state 3 3209648 807014 0 25.14% 0.00%
>
> total 3955565 856764 178449 26.17% <<< Misses
>
> Bad Kernel: Time between samples ~ 1 minute (estimated)
> Usage Above Below Above Below
> state 0 916 0 756 0.00% 82.53%
> state 1 80230 2028 42791 2.53% 53.34%
> state 2 59231 6888 6791 11.63% 11.47%
> state 3 2455784 564797 0 23.00% 0.00%
>
> total 2596161 573713 50338 24.04% <<< Misses
>
> It is not clear why the number of idle entries differs so much
> between the tests, but there is a bit of a different distribution
> of the workload among the CPUs.
>
> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
> POLL, C1, C1E, C3 (disabled), C6
>
> rds-stress-test
> Good Kernel: Time between samples ~70 Seconds (estimated)
> Usage Above Below Above Below
> state 0 1561 0 1435 0.00% 91.93%
> state 1 13855 899 2410 6.49% 17.39%
> state 2 467998 139254 23679 29.76% 5.06%
> state 3 0 0 0 0.00% 0.00%
> state 4 213132 107417 0 50.40% 0.00%
>
> total 696546 247570 27524 39.49% <<< Misses
>
> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
> Usage Above Below Above Below
> state 0 231 0 231 0.00% 100.00%
> state 1 5413 266 1186 4.91% 21.91%
> state 2 54365 719 3789 1.32% 6.97%
> state 3 0 0 0 0.00% 0.00%
> state 4 267055 148327 0 55.54% 0.00%
>
> total 327064 149312 5206 47.24% <<< Misses
>
> Again, differing numbers of idle entries between tests.
> This time the load distribution between CPUs is more
> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
> In the "Good" case the work is distributed over more CPUs.
> I assume, without proof, that the scheduler is deciding not to migrate
> the next bit of work to another CPU in the one case versus the other.
The above is incorrect. The CPUs involved between the "Good"
and "Bad" tests are very similar, mainly 2 CPUs with a little of
a 3rd and 4th. See the attached graph for more detail / clarity.
All of the tests show higher usage of shallower idle states with
the "Good" versus the "Bad", which was the expectation of the
original patch, as has been mentioned a few times in the emails.
My input is to revert the reversion.
... Doug
[-- Attachment #2: usage-v-cpu-v-state.png --]
[-- Type: image/png, Size: 104684 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 4:49 ` Sergey Senozhatsky
2026-01-14 5:15 ` Tomasz Figa
@ 2026-01-29 22:47 ` Doug Smythies
1 sibling, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-01-29 22:47 UTC (permalink / raw)
To: 'Sergey Senozhatsky', 'Tomasz Figa'
Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha',
'Christian Loehle', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano', Doug Smythies
On 2026.01.13 20:50 Sergey Senozhatsky wrote:
> Cc-ing Tomasz
>
> On (26/01/14 13:28), Sergey Senozhatsky wrote:
>> Hi,
>>
>> On (26/01/13 15:18), Rafael J. Wysocki wrote:
> [..]
>>>>> Bumping this as I discovered this issue on 6.12 stable branch also. The
>>>>> reapplication seems inevitable. I shall get back to you with these
>>>>> details also.
>>>>
>>>> Yes, please, because I have another reason to restore the reverted commit.
>>>
>>> Sergey, did you see a performance regression from 85975daeaa4d
>>> ("cpuidle: menu: Avoid discarding useful information") on any
>>> platforms other than the Jasper Lake it was reported for?
>>
>> Let me try to dig it up. I think I saw regressions on a number of
>> devices:
>>
>> ---
>> cpu family : 6
>> model : 122
>> model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 122
>> model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 156
>> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 156
>> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 156
>> model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
>>
Those are all 6 watt TDP processors, the same as the earlier emails.
We know from the turbostat data that 6 watts is exceeded in the
test case and that it is likely that power limiting was involved.
I don't think we ever got to the bottom of it for your case.
I still would like to see test results where there is no chance
of throttling being involved via reducing the maximum
CPU frequency.
>>
>> I guess family 6/model 122 is not Jasper Lake?
>>
>> I also saw some where the patch in question seemed to improve the
>> metrics, but regressions are more important, so the revert simply
>> put all of the boards back to the previous state.
In all of our testing we saw some minor regressions in addition to
the improvements. Overall, the patch set was deemed an improvement.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-29 22:27 ` Doug Smythies
@ 2026-01-30 19:28 ` Rafael J. Wysocki
2026-02-01 19:20 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-30 19:28 UTC (permalink / raw)
To: Doug Smythies, Christian Loehle
Cc: Harshvardhan Jha, Sasha Levin, Greg Kroah-Hartman, linux-pm,
stable, Rafael J. Wysocki, Daniel Lezcano
On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2026.01.28 15:53 Doug Smythies wrote:
> > On 2026.01.27 21:07 Doug Smythies wrote:
> >> On 2026.01.27 07:45 Harshvardhan Jha wrote:
> >>> On 08/12/25 6:17 PM, Christian Loehle wrote:
> >>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
> >>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
> >>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
> >>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> > ... snip ...
> >
> >>>> It would be nice to get the idle states here, ideally how the states' usage changed
> >>>> from base to revert.
> >>>> The mentioned thread did this and should show how it can be done, but a dump of
> >>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> >>>> before and after the workload is usually fine to work with:
> >>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> >
> >>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
> >>> states for the test suites before and after tests. The folders with the
> >>> name of the test contain two folders good-kernel and bad-kernel
> >>> containing two files having the before and after states. Please note
> >>> that different machines were used for different test suites due to
> >>> compatibility reasons. The jbb test was run using containers.
> >
> > Please provide the results of the test runs that were done for
> > the supplied before and after idle data.
> > In particular, what is the "fio" test and its results? Its idle data is not very revealing.
> > Is it a test I can run on my test computer?
>
> I see that I have fio installed on my test computer.
>
> >> It is a considerable amount of work to manually extract and summarize the data.
> >> I have only done it for the phoronix-sqlite data.
> >
> > I have done the rest now, see below.
> > I have also attached the results, in case the formatting gets screwed up.
> >
> >> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
> >> I remember seeing a Linux-pm email about why but couldn't find it just now.
> >> Summary (also attached as a PNG file, in case the formatting gets messed up):
> >> The total idle entries (usage) and time seem low to me, which is why the ???.
> >>
> >> phoronix-sqlite
> >> Good Kernel: Time between samples 4 seconds (estimated and ???)
> >> Usage Above Below Above Below
> >> state 0 220 0 218 0.00% 99.09%
> >> state 1 70212 5213 34602 7.42% 49.28%
> >> state 2 30273 5237 1806 17.30% 5.97%
> >> state 3 0 0 0 0.00% 0.00%
> >> state 4 11824 2120 0 17.93% 0.00%
> >>
> >> total 112529 12570 36626 43.72% <<< Misses %
> >>
> >> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
> >> Usage Above Below Above Below
> >> state 0 262 0 260 0.00% 99.24%
> >> state 1 62751 3985 35588 6.35% 56.71%
> >> state 2 24941 7896 1433 31.66% 5.75%
> >> state 3 0 0 0 0.00% 0.00%
> >> state 4 24489 11543 0 47.14% 0.00%
> >>
> >> total 112443 23424 37281 53.99% <<< Misses %
> >>
> >> Observe 2X use of idle state 4 for the "Bad Kernel"
> >>
> >> I have a template now, and can summarize the other 40 CPU data
> >> faster, but I would have to rework the template for the 56 CPU data,
> >> and is it a 64 CPU data set at 4 idle states per CPU?
> >
> > jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
> > POLL, C1, C1E, C3 (disabled), C6
> >
> > Good Kernel: Time between samples > 2 hours (estimated)
> > Usage Above Below Above Below
> > state 0 297550 0 296084 0.00% 99.51%
> > state 1 8062854 341043 4962635 4.23% 61.55%
> > state 2 56708358 12688379 6252051 22.37% 11.02%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 54624476 15868752 0 29.05% 0.00%
> >
> > total 119693238 28898174 11510770 33.76% <<< Misses
> >
> > Bad Kernel: Time between samples > 2 hours (estimated)
> > Usage Above Below Above Below
> > state 0 90715 0 75134 0.00% 82.82%
> > state 1 8878738 312970 6082180 3.52% 68.50%
> > state 2 12048728 2576251 603316 21.38% 5.01%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 85999424 44723273 0 52.00% 0.00%
> >
> > total 107017605 47612494 6760630 50.81% <<< Misses
> >
> > As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
> >
> > fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
> >
> > fio
> > Good Kernel: Time between samples ~ 1 minute (estimated)
> > Usage Above Below Above Below
> > state 0 3822 0 3818 0.00% 99.90%
> > state 1 148640 4406 68956 2.96% 46.39%
> > state 2 593455 45344 105675 7.64% 17.81%
> > state 3 3209648 807014 0 25.14% 0.00%
> >
> > total 3955565 856764 178449 26.17% <<< Misses
> >
> > Bad Kernel: Time between samples ~ 1 minute (estimated)
> > Usage Above Below Above Below
> > state 0 916 0 756 0.00% 82.53%
> > state 1 80230 2028 42791 2.53% 53.34%
> > state 2 59231 6888 6791 11.63% 11.47%
> > state 3 2455784 564797 0 23.00% 0.00%
> >
> > total 2596161 573713 50338 24.04% <<< Misses
> >
> > It is not clear why the number of idle entries differs so much
> > between the tests, but there is a bit of a different distribution
> > of the workload among the CPUs.
> >
> > rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
> > POLL, C1, C1E, C3 (disabled), C6
> >
> > rds-stress-test
> > Good Kernel: Time between samples ~70 Seconds (estimated)
> > Usage Above Below Above Below
> > state 0 1561 0 1435 0.00% 91.93%
> > state 1 13855 899 2410 6.49% 17.39%
> > state 2 467998 139254 23679 29.76% 5.06%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 213132 107417 0 50.40% 0.00%
> >
> > total 696546 247570 27524 39.49% <<< Misses
> >
> > Bad Kernel: Time between samples ~ 70 Seconds (estimated)
> > Usage Above Below Above Below
> > state 0 231 0 231 0.00% 100.00%
> > state 1 5413 266 1186 4.91% 21.91%
> > state 2 54365 719 3789 1.32% 6.97%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 267055 148327 0 55.54% 0.00%
> >
> > total 327064 149312 5206 47.24% <<< Misses
> >
> > Again, differing numbers of idle entries between tests.
> > This time the load distribution between CPUs is more
> > obvious. In the "Bad" case most work is done on 2 or 3 CPU's.
> > In the "Good" case the work is distributed over more CPUs.
> > I assume, without proof, that the scheduler is deciding not to migrate
> > the next bit of work to another CPU in one case versus the other.
>
> The above is incorrect. The CPUs involved between the "Good"
> and "Bad" tests are very similar, mainly 2 CPUs with a little of
> a 3rd and 4th. See the attached graph for more detail / clarity.
>
> All of the tests show higher usage of shallower idle states with
> the "Good" verses the "Bad", which was the expectation of the
> original patch, as has been mentioned a few times in the emails.
>
> My input is to revert the reversion.
OK, noted, thanks!
Christian, what do you think?
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-30 19:28 ` Rafael J. Wysocki
@ 2026-02-01 19:20 ` Christian Loehle
2026-02-02 17:31 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-01 19:20 UTC (permalink / raw)
To: Rafael J. Wysocki, Doug Smythies
Cc: Harshvardhan Jha, Sasha Levin, Greg Kroah-Hartman, linux-pm,
stable, Daniel Lezcano, Sergey Senozhatsky
[-- Attachment #1: Type: text/plain, Size: 10013 bytes --]
On 1/30/26 19:28, Rafael J. Wysocki wrote:
> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>
>> On 2026.01.28 15:53 Doug Smythies wrote:
>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>> ... snip ...
>>>
>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>> from base to revert.
>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>> before and after the workload is usually fine to work with:
>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>
>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>> states for the test suites before and after tests. The folders with the
>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>> containing two files having the before and after states. Please note
>>>>> that different machines were used for different test suites due to
>>>>> compatibility reasons. The jbb test was run using containers.
>>>
>>> Please provide the results of the test runs that were done for
>>> the supplied before and after idle data.
>> In particular, what is the "fio" test and its results? Its idle data is not very revealing.
>>> Is it a test I can run on my test computer?
>>
>> I see that I have fio installed on my test computer.
>>
>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>> I have only done it for the phoronix-sqlite data.
>>>
>>> I have done the rest now, see below.
>>> I have also attached the results, in case the formatting gets screwed up.
>>>
>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>
>>>> phoronix-sqlite
>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>> Usage Above Below Above Below
>>>> state 0 220 0 218 0.00% 99.09%
>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>
>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>
>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>> Usage Above Below Above Below
>>>> state 0 262 0 260 0.00% 99.24%
>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>
>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>
>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>
>>>> I have a template now, and can summarize the other 40 CPU data
>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>
>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
>>> POLL, C1, C1E, C3 (disabled), C6
>>>
>>> Good Kernel: Time between samples > 2 hours (estimated)
>>> Usage Above Below Above Below
>>> state 0 297550 0 296084 0.00% 99.51%
>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>
>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>
>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>> Usage Above Below Above Below
>>> state 0 90715 0 75134 0.00% 82.82%
>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>
>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>
>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>
>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>
>>> fio
>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>> Usage Above Below Above Below
>>> state 0 3822 0 3818 0.00% 99.90%
>>> state 1 148640 4406 68956 2.96% 46.39%
>>> state 2 593455 45344 105675 7.64% 17.81%
>>> state 3 3209648 807014 0 25.14% 0.00%
>>>
>>> total 3955565 856764 178449 26.17% <<< Misses
>>>
>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>> Usage Above Below Above Below
>>> state 0 916 0 756 0.00% 82.53%
>>> state 1 80230 2028 42791 2.53% 53.34%
>>> state 2 59231 6888 6791 11.63% 11.47%
>>> state 3 2455784 564797 0 23.00% 0.00%
>>>
>>> total 2596161 573713 50338 24.04% <<< Misses
>>>
>>> It is not clear why the number of idle entries differs so much
>>> between the tests, but there is a bit of a different distribution
>>> of the workload among the CPUs.
>>>
>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>> POLL, C1, C1E, C3 (disabled), C6
>>>
>>> rds-stress-test
>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>> Usage Above Below Above Below
>>> state 0 1561 0 1435 0.00% 91.93%
>>> state 1 13855 899 2410 6.49% 17.39%
>>> state 2 467998 139254 23679 29.76% 5.06%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 213132 107417 0 50.40% 0.00%
>>>
>>> total 696546 247570 27524 39.49% <<< Misses
>>>
>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>> Usage Above Below Above Below
>>> state 0 231 0 231 0.00% 100.00%
>>> state 1 5413 266 1186 4.91% 21.91%
>>> state 2 54365 719 3789 1.32% 6.97%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 267055 148327 0 55.54% 0.00%
>>>
>>> total 327064 149312 5206 47.24% <<< Misses
>>>
>>> Again, differing numbers of idle entries between tests.
>>> This time the load distribution between CPUs is more
>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's.
>>> In the "Good" case the work is distributed over more CPUs.
>>> I assume, without proof, that the scheduler is deciding not to migrate
>>> the next bit of work to another CPU in one case versus the other.
>>
>> The above is incorrect. The CPUs involved between the "Good"
>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>
>> All of the tests show higher usage of shallower idle states with
>> the "Good" verses the "Bad", which was the expectation of the
>> original patch, as has been mentioned a few times in the emails.
>>
>> My input is to revert the reversion.
>
> OK, noted, thanks!
>
> Christian, what do you think?
I've attached readable diffs of the values provided; the tl;dr is:
+--------------------+-----------+-----------+
| Workload | Δ above % | Δ below % |
+--------------------+-----------+-----------+
| fio                | -0.44     | +2.57     |
| rds-stress-test    | -10.11    | +2.36     |
| jbb | -20.35 | +3.30 |
| phoronix-sqlite | -9.66 | -0.61 |
+--------------------+-----------+-----------+
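The Δ values are just (good kernel minus bad kernel) of the OVERALL
above%/below% numbers from the attached diffs. A quick illustrative
sketch (helper names are mine, not from any attachment), using the jbb
totals quoted earlier in the thread:

```python
# Recompute a tl;dr row from the OVERALL counters in the attached diffs.
def miss_pct(above, below, usage):
    """above%/below% of all idle entries, as reported in the diff files."""
    return 100.0 * above / usage, 100.0 * below / usage

def delta(good, bad):
    """(good - bad) change in above%/below%, rounded to two decimals."""
    good_above, good_below = miss_pct(*good)
    bad_above, bad_below = miss_pct(*bad)
    return round(good_above - bad_above, 2), round(good_below - bad_below, 2)

good = (28898174, 11510770, 119693238)  # jbb good kernel: above, below, usage
bad = (47612494, 6760630, 107017605)    # jbb bad kernel
print(delta(good, bad))  # (-20.35, 3.3), i.e. the jbb row
```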
I think the overall trend, however, is clear: the commit
85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
improved menu on many systems and workloads, I'd dare to say most.
Even for the regression it reportedly introduced, the cpuidle governor
performed better on paper; system metrics regressed because other
CPUs' P-states weren't available due to them being in shallower idle
states.
https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
(+CC Sergey)
It could be argued that this is a limitation of a per-CPU cpuidle
governor and that a more holistic approach would be needed for such a
platform (i.e. when CPUs share a power/thermal budget and want to use
higher P-states, skew towards deeper cpuidle states).
I also think that the change made sense: for small residency values
with a bit of random noise mixed in, performing the same statistical
test doesn't seem sensible, since the short intervals will look
noisier.
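To illustrate that point with made-up numbers: the same absolute timing
jitter is a far larger relative error on short idle intervals than on
long ones, so a single fixed statistical test cannot treat both alike.

```python
import statistics

# Made-up idle intervals (us) with identical absolute jitter around
# their means; only the interval length differs between the two sets.
short_intervals = [5, 15, 10, 8, 12]             # mean 10 us
long_intervals = [1005, 1015, 1010, 1008, 1012]  # mean 1010 us

def relative_spread(samples):
    """Coefficient of variation: stdev relative to the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Identical noise, but ~38% relative spread on the short intervals vs
# ~0.4% on the long ones.
print(relative_spread(short_intervals), relative_spread(long_intervals))
```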
So the options are:
1. Revert the revert on mainline+stable.
2. Revert the revert on mainline only.
3. Keep the revert and miss out on the improvement for many.
4. Revert the revert only once we have a good solution for platforms
like Sergey's.
I'd lean towards 2, because 4 won't be easy, unless of course a minor
hack like playing with the deep idle states' residency values would
be enough to mitigate the regression.
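For anyone who wants to reproduce the before/after dumps behind these
diffs, a rough sketch (it assumes the standard cpuidle sysfs layout
mentioned earlier in the thread; the helper names are mine):

```python
import glob
import os

def snapshot_cpuidle(root="/sys/devices/system/cpu"):
    """Snapshot usage/above/below/time for every CPU idle state."""
    snap = {}
    for d in glob.glob(os.path.join(root, "cpu[0-9]*/cpuidle/state[0-9]*")):
        counters = {}
        for name in ("usage", "above", "below", "time"):
            path = os.path.join(d, name)
            if os.path.exists(path):
                with open(path) as f:
                    counters[name] = int(f.read())
        snap[d] = counters
    return snap

def diff_totals(before, after):
    """Per-counter delta over the workload, summed across all states."""
    totals = {}
    for state, counters in after.items():
        for name, value in counters.items():
            totals[name] = totals.get(name, 0) + value - before[state][name]
    return totals

# Usage: snapshot before and after the workload, then diff:
#   before = snapshot_cpuidle(); <run workload>; after = snapshot_cpuidle()
#   print(diff_totals(before, after))
```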
[-- Attachment #2: diff_rds_stress-test.txt --]
[-- Type: text/plain, Size: 3573 bytes --]
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 231
usage : 231
time : 4420
avg_time_us_per_usage : 19.134199134199132
above_pct_over_usage : 0.0
below_pct_over_usage : 100.0
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 266
below : 1186
usage : 5413
time : 264365
avg_time_us_per_usage : 48.838906336597084
above_pct_over_usage : 4.914095695547755
below_pct_over_usage : 21.91021614631443
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 719
below : 3789
usage : 54365
time : 11979213
avg_time_us_per_usage : 220.34788926699164
above_pct_over_usage : 1.3225420767037617
below_pct_over_usage : 6.969557619792146
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 148327
below : 0
usage : 267055
time : 3728423124
avg_time_us_per_usage : 13961.255636479376
above_pct_over_usage : 55.54174233772069
below_pct_over_usage : 0.0
OVERALL
above : 149312
below : 5206
usage : 327064
time : 3740671122
avg_time_us_per_usage : 11437.122771078444
above_pct_over_usage : 45.6522270870533
below_pct_over_usage : 1.5917373969620625
good-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 1435
usage : 1561
time : 23997
avg_time_us_per_usage : 15.372837924407431
above_pct_over_usage : 0.0
below_pct_over_usage : 91.92825112107623
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 899
below : 2410
usage : 13855
time : 580895
avg_time_us_per_usage : 41.9267412486467
above_pct_over_usage : 6.4886322627210395
below_pct_over_usage : 17.39444243955251
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 139254
below : 23679
usage : 467998
time : 71044825
avg_time_us_per_usage : 151.80583036679644
above_pct_over_usage : 29.755255364339163
below_pct_over_usage : 5.059637006995756
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 107417
below : 0
usage : 213132
time : 3670745602
avg_time_us_per_usage : 17222.87409680386
above_pct_over_usage : 50.399283073400525
below_pct_over_usage : 0.0
OVERALL
above : 247570
below : 27524
usage : 696546
time : 3742395319
avg_time_us_per_usage : 5372.789907629934
above_pct_over_usage : 35.54251980486573
below_pct_over_usage : 3.9514978192395045
[-- Attachment #3: diff_fio.txt --]
[-- Type: text/plain, Size: 3087 bytes --]
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 756
usage : 916
time : 13304
avg_time_us_per_usage : 14.524017467248909
above_pct_over_usage : 0.0
below_pct_over_usage : 82.53275109170306
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 2028
below : 42791
usage : 80230
time : 14121043
avg_time_us_per_usage : 176.00701732519008
above_pct_over_usage : 2.527732768291163
below_pct_over_usage : 53.33541069425402
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 6888
below : 6791
usage : 59231
time : 9518489
avg_time_us_per_usage : 160.70113622933937
above_pct_over_usage : 11.629045601121035
below_pct_over_usage : 11.465280005402576
index3 C6 latency=92 residency=276 desc='MWAIT 0x20'
above : 564797
below : 0
usage : 2455784
time : 3656232645
avg_time_us_per_usage : 1488.8250127047004
above_pct_over_usage : 22.99864320314816
below_pct_over_usage : 0.0
OVERALL
above : 573713
below : 50338
usage : 2596161
time : 3679885481
avg_time_us_per_usage : 1417.4334646426012
above_pct_over_usage : 22.09851392113201
below_pct_over_usage : 1.9389398423287307
good-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 3818
usage : 3822
time : 84618
avg_time_us_per_usage : 22.139717425431712
above_pct_over_usage : 0.0
below_pct_over_usage : 99.8953427524856
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 4406
below : 68956
usage : 148640
time : 22527422
avg_time_us_per_usage : 151.55692949407967
above_pct_over_usage : 2.9642088266953714
below_pct_over_usage : 46.39128094725511
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 45344
below : 105675
usage : 593455
time : 121006522
avg_time_us_per_usage : 203.9017650874961
above_pct_over_usage : 7.640680422272961
below_pct_over_usage : 17.806741875963638
index3 C6 latency=92 residency=276 desc='MWAIT 0x20'
above : 807014
below : 0
usage : 3209648
time : 3510698554
avg_time_us_per_usage : 1093.7955046783945
above_pct_over_usage : 25.14338020867086
below_pct_over_usage : 0.0
OVERALL
above : 856764
below : 178449
usage : 3955565
time : 3654317116
avg_time_us_per_usage : 923.8420089165518
above_pct_over_usage : 21.65971232933854
below_pct_over_usage : 4.511340352136799
[-- Attachment #4: diff_jbb.txt --]
[-- Type: text/plain, Size: 3681 bytes --]
jbb:
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 75134
usage : 90715
time : 1563844
avg_time_us_per_usage : 17.239089455988534
above_pct_over_usage : 0.0
below_pct_over_usage : 82.82422973047456
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 312970
below : 6082180
usage : 8878738
time : 1015443901
avg_time_us_per_usage : 114.3680443099008
above_pct_over_usage : 3.5249378909480154
below_pct_over_usage : 68.50275343185034
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 2576251
below : 603316
usage : 12048728
time : 1478419300
avg_time_us_per_usage : 122.70335092633844
above_pct_over_usage : 21.38193342898935
below_pct_over_usage : 5.007300355688999
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 44723273
below : 0
usage : 85999424
time : 300384703093
avg_time_us_per_usage : 3492.868778900194
above_pct_over_usage : 52.0041541208462
below_pct_over_usage : 0.0
OVERALL
above : 47612494
below : 6760630
usage : 107017605
time : 302880130138
avg_time_us_per_usage : 2830.1897630581434
above_pct_over_usage : 44.490337828061094
below_pct_over_usage : 6.317306390850365
good-kernel
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 296084
usage : 297550
time : 5996079
avg_time_us_per_usage : 20.151500588136447
above_pct_over_usage : 0.0
below_pct_over_usage : 99.50730969584944
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 341043
below : 4962635
usage : 8062854
time : 994358306
avg_time_us_per_usage : 123.32584789455446
above_pct_over_usage : 4.229804979725541
below_pct_over_usage : 61.549359569204654
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 12688379
below : 6252051
usage : 56708358
time : 9902266327
avg_time_us_per_usage : 174.61740519801333
above_pct_over_usage : 22.37479526386569
below_pct_over_usage : 11.024919818697624
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 15868752
below : 0
usage : 54624476
time : 276236627330
avg_time_us_per_usage : 5057.011939666021
above_pct_over_usage : 29.050625584033064
below_pct_over_usage : 0.0
OVERALL
above : 28898174
below : 11510770
usage : 119693238
time : 287139248042
avg_time_us_per_usage : 2398.9596475115827
above_pct_over_usage : 24.143530982092738
below_pct_over_usage : 9.616892476415417
[-- Attachment #5: diff_phoronix-sqlite.txt --]
[-- Type: text/plain, Size: 3571 bytes --]
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 260
usage : 262
time : 6634
avg_time_us_per_usage : 25.3206106870229
above_pct_over_usage : 0.0
below_pct_over_usage : 99.23664122137404
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 3985
below : 35588
usage : 62751
time : 2972131
avg_time_us_per_usage : 47.36388264728849
above_pct_over_usage : 6.35049640643177
below_pct_over_usage : 56.71304042963459
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 7896
below : 1433
usage : 24941
time : 2185241
avg_time_us_per_usage : 87.61641473878353
above_pct_over_usage : 31.65871456637665
below_pct_over_usage : 5.745559520468305
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 11543
below : 0
usage : 24489
time : 123920172
avg_time_us_per_usage : 5060.238147739801
above_pct_over_usage : 47.135448568745154
below_pct_over_usage : 0.0
OVERALL
above : 23424
below : 37281
usage : 112443
time : 129084178
avg_time_us_per_usage : 1147.996567149578
above_pct_over_usage : 20.831888156666043
below_pct_over_usage : 33.155465435820815
good-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 218
usage : 220
time : 5595
avg_time_us_per_usage : 25.431818181818183
above_pct_over_usage : 0.0
below_pct_over_usage : 99.0909090909091
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 5213
below : 34602
usage : 70212
time : 3288043
avg_time_us_per_usage : 46.83021420839742
above_pct_over_usage : 7.424656753831254
below_pct_over_usage : 49.28217398735259
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 5237
below : 1806
usage : 30273
time : 4825950
avg_time_us_per_usage : 159.41432960063423
above_pct_over_usage : 17.299243550358405
below_pct_over_usage : 5.965712020612427
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 2120
below : 0
usage : 11824
time : 124762813
avg_time_us_per_usage : 10551.658744925575
above_pct_over_usage : 17.929634641407308
below_pct_over_usage : 0.0
OVERALL
above : 12570
below : 36626
usage : 112529
time : 132882401
avg_time_us_per_usage : 1180.8724950901546
above_pct_over_usage : 11.170453838566058
below_pct_over_usage : 32.54805427934132
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-01 19:20 ` Christian Loehle
@ 2026-02-02 17:31 ` Harshvardhan Jha
2026-02-03 9:07 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-02 17:31 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 02/02/26 12:50 AM, Christian Loehle wrote:
> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>> ... snip ...
>>>>
>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>> from base to revert.
>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>> before and after the workload is usually fine to work with:
>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>> states for the test suites before and after tests. The folders with the
>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>> containing two files having the before and after states. Please note
>>>>>> that different machines were used for different test suites due to
>>>>>> compatibility reasons. The jbb test was run using containers.
>>>> Please provide the results of the test runs that were done for
>>>> the supplied before and after idle data.
>>>> In particular, what is the "fio" test and its results? Its idle data is not very revealing.
>>>> Is it a test I can run on my test computer?
>>> I see that I have fio installed on my test computer.
>>>
>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>> I have only done it for the phoronix-sqlite data.
>>>> I have done the rest now, see below.
>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>
>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>
>>>>> phoronix-sqlite
>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>> Usage Above Below Above Below
>>>>> state 0 220 0 218 0.00% 99.09%
>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>
>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>
>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>> Usage Above Below Above Below
>>>>> state 0 262 0 260 0.00% 99.24%
>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>
>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>
>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>
>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>
>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 297550 0 296084 0.00% 99.51%
>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>
>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>
>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 90715 0 75134 0.00% 82.82%
>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>
>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>
>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>
>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>
>>>> fio
>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 3822 0 3818 0.00% 99.90%
>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>
>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>
>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 916 0 756 0.00% 82.53%
>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>
>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>
>>>> It is not clear why the number of idle entries differs so much
>>>> between the tests, but there is a bit of a different distribution
>>>> of the workload among the CPUs.
>>>>
>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>
>>>> rds-stress-test
>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 1561 0 1435 0.00% 91.93%
>>>> state 1 13855 899 2410 6.49% 17.39%
>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>
>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>
>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 231 0 231 0.00% 100.00%
>>>> state 1 5413 266 1186 4.91% 21.91%
>>>> state 2 54365 719 3789 1.32% 6.97%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>
>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>
>>>> Again, differing numbers of idle entries between tests.
>>>> This time the load distribution between CPUs is more
>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>> In the "Good" case the work is distributed over more CPUs.
>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>> the next bit of work to another CPU in the one case versus the other.
>>> The above is incorrect. The CPUs involved between the "Good"
>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>
>>> All of the tests show higher usage of shallower idle states with
>>> the "Good" verses the "Bad", which was the expectation of the
>>> original patch, as has been mentioned a few times in the emails.
>>>
>>> My input is to revert the reversion.
>> OK, noted, thanks!
>>
>> Christian, what do you think?
> I've attached readable diffs of the values provided; the tl;dr is:
>
> +--------------------+-----------+-----------+
> | Workload | Δ above % | Δ below % |
> +--------------------+-----------+-----------+
> | fio | -10.11 | +2.36 |
> | rds-stress-test | -0.44 | +2.57 |
> | jbb | -20.35 | +3.30 |
> | phoronix-sqlite | -9.66 | -0.61 |
> +--------------------+-----------+-----------+
>
> I think the overall trend, however, is clear: commit
> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
> improved menu on many systems and workloads, I'd dare say most.
>
> Even on the reported regression introduced by it, the cpuidle governor
> performed better on paper; system metrics regressed because higher
> P-states weren't available while other CPUs sat in shallower idle states.
> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> (+CC Sergey)
> It could be argued that this is a limitation of a per-CPU cpuidle
> governor and a more holistic approach would be needed for that platform
> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
> skew towards deeper cpuidle states).
>
> I also think that the change made sense: for small residency values
> with a bit of random noise mixed in, performing the same statistical
> test doesn't seem sensible, as the short intervals will look noisier.
>
> So options are:
> 1. Revert revert on mainline+stable
> 2. Revert revert on mainline only
> 3. Keep revert, miss out on the improvement for many.
> 4. Revert only when we have a good solution for the platforms like
> Sergey's.
>
> I'd lean towards 2 because 4 won't be easy, unless of course a minor
> hack like playing with the deep idle state residency values would
> be enough to mitigate.
Wouldn't it be better to choose option 1, as reverting the revert shows
even more pronounced improvements on older kernels? I've tested this on
6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
Since the revert only benefits Jasper Lake systems, which are relatively
new, isn't reverting the revert even more relevant on stable kernels?
It's more likely that older hardware runs older kernels than newer
hardware, although that's not always the case imo.
Harshvardhan
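As an aside, for anyone wanting to reproduce the above/below "Misses %" summaries Doug posted from before/after dumps of /sys/devices/system/cpu/cpu*/cpuidle/state*/, a short script along these lines works. The grep-style dump format and the helper names here are assumptions, a sketch rather than the exact tooling used in this thread:

```python
import re
from collections import defaultdict

# Parses output in the style of:
#   grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/{usage,above,below}
# where each line looks like:
#   /sys/devices/system/cpu/cpu0/cpuidle/state1/usage:70212
LINE = re.compile(r"cpu(\d+)/cpuidle/state(\d+)/(usage|above|below):(\d+)")

def parse(dump_text):
    """Sum each (state, field) counter across all CPUs."""
    totals = defaultdict(int)
    for m in LINE.finditer(dump_text):
        _cpu, state, field, value = m.groups()
        totals[(int(state), field)] += int(value)
    return totals

def miss_pct(before_text, after_text):
    """Per-state (usage, above, below, miss%) plus overall miss%,
    computed on the counter deltas accumulated during the workload."""
    before, after = parse(before_text), parse(after_text)
    delta = {k: v - before.get(k, 0) for k, v in after.items()}
    rows, tot_u, tot_ab = {}, 0, 0
    for s in sorted({s for s, _ in delta}):
        u = delta.get((s, "usage"), 0)
        a = delta.get((s, "above"), 0)
        b = delta.get((s, "below"), 0)
        rows[s] = (u, a, b, 100.0 * (a + b) / u if u else 0.0)
        tot_u += u
        tot_ab += a + b
    return rows, (100.0 * tot_ab / tot_u if tot_u else 0.0)

# Tiny synthetic example (one CPU, one state):
before = """
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:100
/sys/devices/system/cpu/cpu0/cpuidle/state1/above:10
/sys/devices/system/cpu/cpu0/cpuidle/state1/below:20
"""
after = """
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:300
/sys/devices/system/cpu/cpu0/cpuidle/state1/above:30
/sys/devices/system/cpu/cpu0/cpuidle/state1/below:40
"""
rows, total = miss_pct(before, after)
print(rows[1], total)  # (200, 20, 20, 20.0) 20.0
```

Here a state's miss percentage is (above + below) / usage on the deltas, which is what the "<<< Misses %" totals in the tables express.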
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-02 17:31 ` Harshvardhan Jha
@ 2026-02-03 9:07 ` Christian Loehle
2026-02-03 9:16 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-03 9:07 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 2/2/26 17:31, Harshvardhan Jha wrote:
>
> On 02/02/26 12:50 AM, Christian Loehle wrote:
>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>> ... snip ...
>>>>>
>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>> from base to revert.
>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>> containing two files having the before and after states. Please note
>>>>>>> that different machines were used for different test suites due to
>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>> Please provide the results of the test runs that were done for
>>>>> the supplied before and after idle data.
>>>>> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
>>>>> Is it a test I can run on my test computer?
>>>> I see that I have fio installed on my test computer.
>>>>
>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>> I have only done it for the phoronix-sqlite data.
>>>>> I have done the rest now, see below.
>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>
>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>
>>>>>> phoronix-sqlite
>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>
>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>
>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 262 0 260 0.00% 99.24%
>>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>>
>>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>>
>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>>
>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>
>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 297550 0 296084 0.00% 99.51%
>>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>>
>>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>>
>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 90715 0 75134 0.00% 82.82%
>>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>>
>>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>>
>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>
>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>
>>>>> fio
>>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 3822 0 3818 0.00% 99.90%
>>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>>
>>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>>
>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 916 0 756 0.00% 82.53%
>>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>>
>>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>>
>>>>> It is not clear why the number of idle entries differs so much
>>>>> between the tests, but there is a bit of a different distribution
>>>>> of the workload among the CPUs.
>>>>>
>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>
>>>>> rds-stress-test
>>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 1561 0 1435 0.00% 91.93%
>>>>> state 1 13855 899 2410 6.49% 17.39%
>>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>>
>>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>>
>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 231 0 231 0.00% 100.00%
>>>>> state 1 5413 266 1186 4.91% 21.91%
>>>>> state 2 54365 719 3789 1.32% 6.97%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>>
>>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>>
>>>>> Again, differing numbers of idle entries between tests.
>>>>> This time the load distribution between CPUs is more
>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>> the next bit of work to another CPU in the one case versus the other.
>>>> The above is incorrect. The CPUs involved between the "Good"
>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>
>>>> All of the tests show higher usage of shallower idle states with
>>>> the "Good" verses the "Bad", which was the expectation of the
>>>> original patch, as has been mentioned a few times in the emails.
>>>>
>>>> My input is to revert the reversion.
>>> OK, noted, thanks!
>>>
>>> Christian, what do you think?
>> I've attached readable diffs of the values provided; the tl;dr is:
>>
>> +--------------------+-----------+-----------+
>> | Workload | Δ above % | Δ below % |
>> +--------------------+-----------+-----------+
>> | fio | -10.11 | +2.36 |
>> | rds-stress-test | -0.44 | +2.57 |
>> | jbb | -20.35 | +3.30 |
>> | phoronix-sqlite | -9.66 | -0.61 |
>> +--------------------+-----------+-----------+
>>
>> I think the overall trend, however, is clear: commit
>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>> improved menu on many systems and workloads, I'd dare say most.
>>
>> Even on the reported regression introduced by it, the cpuidle governor
>> performed better on paper; system metrics regressed because higher
>> P-states weren't available while other CPUs sat in shallower idle states.
>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>> (+CC Sergey)
>> It could be argued that this is a limitation of a per-CPU cpuidle
>> governor and a more holistic approach would be needed for that platform
>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
>> skew towards deeper cpuidle states).
>>
>> I also think that the change made sense: for small residency values
>> with a bit of random noise mixed in, performing the same statistical
>> test doesn't seem sensible, as the short intervals will look noisier.
>>
>> So options are:
>> 1. Revert revert on mainline+stable
>> 2. Revert revert on mainline only
>> 3. Keep revert, miss out on the improvement for many.
>> 4. Revert only when we have a good solution for the platforms like
>> Sergey's.
>>
>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>> hack like playing with the deep idle state residency values would
>> be enough to mitigate.
>
> Wouldn't it be better to choose option 1, as reverting the revert shows
> even more pronounced improvements on older kernels? I've tested this on
> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
> Since the revert only benefits Jasper Lake systems, which are relatively
> new, isn't reverting the revert even more relevant on stable kernels?
> It's more likely that older hardware runs older kernels than newer
> hardware, although that's not always the case imo.
>
FWIW Jasper Lake seems to be supported from 5.6 on, see
b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:07 ` Christian Loehle
@ 2026-02-03 9:16 ` Harshvardhan Jha
2026-02-03 9:31 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-03 9:16 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 03/02/26 2:37 PM, Christian Loehle wrote:
> On 2/2/26 17:31, Harshvardhan Jha wrote:
>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>> ... snip ...
>>>>>>
>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>>> from base to revert.
>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>>> containing two files having the before and after states. Please note
>>>>>>>> that different machines were used for different test suites due to
>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>> Please provide the results of the test runs that were done for
>>>>>> the supplied before and after idle data.
>>>>>> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
>>>>>> Is it a test I can run on my test computer?
>>>>> I see that I have fio installed on my test computer.
>>>>>
>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>> I have done the rest now, see below.
>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>
>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>
>>>>>>> phoronix-sqlite
>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>>
>>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>>
>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 262 0 260 0.00% 99.24%
>>>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>>>
>>>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>>>
>>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>>>
>>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>
>>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 297550 0 296084 0.00% 99.51%
>>>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>>>
>>>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>>>
>>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 90715 0 75134 0.00% 82.82%
>>>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>>>
>>>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>>>
>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>>
>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>>
>>>>>> fio
>>>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 3822 0 3818 0.00% 99.90%
>>>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>>>
>>>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>>>
>>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 916 0 756 0.00% 82.53%
>>>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>>>
>>>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>>>
>>>>>> It is not clear why the number of idle entries differs so much
>>>>>> between the tests, but there is a bit of a different distribution
>>>>>> of the workload among the CPUs.
>>>>>>
>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>
>>>>>> rds-stress-test
>>>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 1561 0 1435 0.00% 91.93%
>>>>>> state 1 13855 899 2410 6.49% 17.39%
>>>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>>>
>>>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>>>
>>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 231 0 231 0.00% 100.00%
>>>>>> state 1 5413 266 1186 4.91% 21.91%
>>>>>> state 2 54365 719 3789 1.32% 6.97%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>>>
>>>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>>>
>>>>>> Again, differing numbers of idle entries between tests.
>>>>>> This time the load distribution between CPUs is more
>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>>> the next bit of work to another CPU in the one case versus the other.
>>>>> The above is incorrect. The CPUs involved between the "Good"
>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>>
>>>>> All of the tests show higher usage of shallower idle states with
>>>>> the "Good" verses the "Bad", which was the expectation of the
>>>>> original patch, as has been mentioned a few times in the emails.
>>>>>
>>>>> My input is to revert the reversion.
>>>> OK, noted, thanks!
>>>>
>>>> Christian, what do you think?
>>> I've attached readable diffs of the values provided; the tl;dr is:
>>>
>>> +--------------------+-----------+-----------+
>>> | Workload | Δ above % | Δ below % |
>>> +--------------------+-----------+-----------+
>>> | fio | -10.11 | +2.36 |
>>> | rds-stress-test | -0.44 | +2.57 |
>>> | jbb | -20.35 | +3.30 |
>>> | phoronix-sqlite | -9.66 | -0.61 |
>>> +--------------------+-----------+-----------+
>>>
>>> I think the overall trend, however, is clear: commit
>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>>> improved menu on many systems and workloads, I'd dare say most.
>>>
>>> Even on the reported regression introduced by it, the cpuidle governor
>>> performed better on paper; system metrics regressed because higher
>>> P-states weren't available while other CPUs sat in shallower idle states.
>>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>> (+CC Sergey)
>>> It could be argued that this is a limitation of a per-CPU cpuidle
>>> governor and a more holistic approach would be needed for that platform
>>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
>>> skew towards deeper cpuidle states).
>>>
>>> I also think that the change made sense: for small residency values
>>> with a bit of random noise mixed in, performing the same statistical
>>> test doesn't seem sensible, as the short intervals will look noisier.
>>>
>>> So options are:
>>> 1. Revert revert on mainline+stable
>>> 2. Revert revert on mainline only
>>> 3. Keep revert, miss out on the improvement for many.
>>> 4. Revert only when we have a good solution for the platforms like
>>> Sergey's.
>>>
>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>> hack like playing with the deep idle state residency values would
>>> be enough to mitigate.
>> Wouldn't it be better to choose option 1, as reverting the revert shows
>> even more pronounced improvements on older kernels? I've tested this on
>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>> Since the revert only benefits Jasper Lake systems, which are relatively
>> new, isn't reverting the revert even more relevant on stable kernels?
>> It's more likely that older hardware runs older kernels than newer
>> hardware, although that's not always the case imo.
>>
> FWIW Jasper Lake seems to be supported from 5.6 on, see
> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
Oh I see, but shouldn't avoiding regressions on established platforms be
a priority over further optimizing for specific newer platforms like
Jasper Lake?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:16 ` Harshvardhan Jha
@ 2026-02-03 9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 16:45 ` Rafael J. Wysocki
0 siblings, 2 replies; 44+ messages in thread
From: Christian Loehle @ 2026-02-03 9:31 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 2/3/26 09:16, Harshvardhan Jha wrote:
>
> On 03/02/26 2:37 PM, Christian Loehle wrote:
>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>> ... snip ...
>>>>>>>
>>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>>>> from base to revert.
>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>>>> containing two files having the before and after states. Please note
>>>>>>>>> that different machines were used for different test suites due to
>>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>>> Please provide the results of the test runs that were done for
>>>>>>> the supplied before and after idle data.
>>>>>>> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
>>>>>>> Is it a test I can run on my test computer?
>>>>>> I see that I have fio installed on my test computer.
>>>>>>
>>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>>> I have done the rest now, see below.
>>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>>
>>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>>
>>>>>>>> phoronix-sqlite
>>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>>> Usage Above Below Above Below
>>>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>>>
>>>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>>>
>>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>>>> Usage Above Below Above Below
>>>>>>>> state 0 262 0 260 0.00% 99.24%
>>>>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>>>>
>>>>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>>>>
>>>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>>>>
>>>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>
>>>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 297550 0 296084 0.00% 99.51%
>>>>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>>>>
>>>>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>>>>
>>>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 90715 0 75134 0.00% 82.82%
>>>>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>>>>
>>>>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>>>>
>>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>>>
>>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>>>
>>>>>>> fio
>>>>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 3822 0 3818 0.00% 99.90%
>>>>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>>>>
>>>>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>>>>
>>>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 916 0 756 0.00% 82.53%
>>>>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>>>>
>>>>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>>>>
>>>>>>> It is not clear why the number of idle entries differs so much
>>>>>>> between the tests, but there is a bit of a different distribution
>>>>>>> of the workload among the CPUs.
>>>>>>>
>>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>
>>>>>>> rds-stress-test
>>>>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 1561 0 1435 0.00% 91.93%
>>>>>>> state 1 13855 899 2410 6.49% 17.39%
>>>>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>>>>
>>>>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>>>>
>>>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 231 0 231 0.00% 100.00%
>>>>>>> state 1 5413 266 1186 4.91% 21.91%
>>>>>>> state 2 54365 719 3789 1.32% 6.97%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>>>>
>>>>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>>>>
>>>>>>> Again, differing numbers of idle entries between tests.
>>>>>>> This time the load distribution between CPUs is more
>>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's.
>>>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>>>> the next bit of work to another CPU in one case versus the other.
>>>>>> The above is incorrect. The CPUs involved between the "Good"
>>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>>>
>>>>>> All of the tests show higher usage of shallower idle states with
>>>>>> the "Good" versus the "Bad" kernel, which was the expectation of the
>>>>>> original patch, as has been mentioned a few times in the emails.
>>>>>>
>>>>>> My input is to revert the reversion.
>>>>> OK, noted, thanks!
>>>>>
>>>>> Christian, what do you think?
>>>> I've attached readable diffs of the values provided; the tl;dr is:
>>>>
>>>> +--------------------+-----------+-----------+
>>>> | Workload | Δ above % | Δ below % |
>>>> +--------------------+-----------+-----------+
>>>> | fio | -10.11 | +2.36 |
>>>> | rds-stress-test | -0.44 | +2.57 |
>>>> | jbb | -20.35 | +3.30 |
>>>> | phoronix-sqlite | -9.66 | -0.61 |
>>>> +--------------------+-----------+-----------+
>>>>
>>>> I think the overall trend, however, is clear: the commit
>>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>>>> improved menu on many systems and workloads, I'd dare to say most.
>>>>
>>>> Even on the reported regression introduced by it, the cpuidle governor
>>>> performed better on paper; system metrics regressed because higher
>>>> P-states weren't available while other CPUs were in shallower idle states.
>>>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>> (+CC Sergey)
>>>> It could be argued that this is a limitation of a per-CPU cpuidle
>>>> governor and a more holistic approach would be needed for that platform
>>>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
>>>> skew towards deeper cpuidle states).
>>>>
>>>> I also think that the change made sense: for small residency values
>>>> with a bit of random noise mixed in, performing the same statistical
>>>> test doesn't seem sensible, as the short intervals will look noisier.
>>>>
>>>> So options are:
>>>> 1. Revert revert on mainline+stable
>>>> 2. Revert revert on mainline only
>>>> 3. Keep revert, miss out on the improvement for many.
>>>> 4. Revert only when we have a good solution for the platforms like
>>>> Sergey's.
>>>>
>>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>>> hack like playing with the deep idle state residency values would
>>>> be enough to mitigate.
>>> Wouldn't it be better to choose option 1, as reverting the revert has
>>> even more pronounced improvements on older kernels? I've tested this on
>>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>>> Since the revert only benefits specific newer systems like Jasper Lake,
>>> isn't reverting the revert more relevant on stable kernels? Older
>>> hardware is more likely to run older kernels than newer hardware is,
>>> although that's not always the case, IMO.
>>>
>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>
> Oh I see, but shouldn't avoiding regressions on established platforms be
> a priority over further optimizing for specific newer platforms like
> Jasper Lake?
>
Well, avoiding regressions on established platforms is what led to
10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
being applied and backported.
The expectation for stable is that we avoid regressions and potentially
miss out on improvements. If you want the latest greatest performance you
should probably run a latest greatest kernel.
The original
85975daeaa4d cpuidle: menu: Avoid discarding useful information
was seen as a fix and overall improvement, that's why it was backported,
but Sergey's regression report contradicted that.
What is "established" and "newer" for a stable kernel is quite handwavy
IMO, but even here Sergey's regression report is a clear data point...
Your report is only about restoring 5.15 (and the others) to 5.15
upstream-ish performance levels, which is within the expectations of
running a stable kernel. No doubt it's frustrating either way!
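The "Above"/"Below" columns and total "Misses %" in the tables quoted above are straight arithmetic on deltas of the cpuidle sysfs counters: per state, above% = above/usage and below% = below/usage, and the total misses% is (above + below)/usage summed over all states. A minimal sketch of that arithmetic, assuming the two snapshots have already been parsed into nested dicts keyed by CPU, state, and field (the dict layout here is illustrative, not something from the thread):

```python
def state_delta(before, after, state, field):
    """Sum the per-CPU delta of one cpuidle counter for one idle state."""
    return sum(after[cpu][state][field] - before[cpu][state][field]
               for cpu in after)

def summarize(before, after, n_states):
    """Return per-state rows (state, usage, above, below, above%, below%)
    and the total miss percentage.

    'above' counts exits where the chosen state was too deep for the
    actual idle period, 'below' where it was too shallow.
    """
    rows = []
    tot = {"usage": 0, "above": 0, "below": 0}
    for s in range(n_states):
        d = {f: state_delta(before, after, s, f) for f in tot}
        rows.append((s, d["usage"], d["above"], d["below"],
                     100.0 * d["above"] / d["usage"] if d["usage"] else 0.0,
                     100.0 * d["below"] / d["usage"] if d["usage"] else 0.0))
        for f in tot:
            tot[f] += d[f]
    miss_pct = 100.0 * (tot["above"] + tot["below"]) / tot["usage"]
    return rows, miss_pct
```

Checking against the phoronix-sqlite "Good Kernel" totals above: (12570 + 36626) / 112529 is about 43.72%, matching the quoted Misses %.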
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:31 ` Christian Loehle
@ 2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 10:30 ` Christian Loehle
2026-02-03 16:45 ` Rafael J. Wysocki
1 sibling, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-03 10:22 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 03/02/26 3:01 PM, Christian Loehle wrote:
> On 2/3/26 09:16, Harshvardhan Jha wrote:
>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>>> ... snip ...
>>>>>>>>
>>>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>>>>> from base to revert.
>>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>>>>> containing two files having the before and after states. Please note
>>>>>>>>>> that different machines were used for different test suites due to
>>>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>>>> Please provide the results of the test runs that were done for
>>>>>>>> the supplied before and after idle data.
>>>>>>>> In particular, what is the "fio" test and it results. Its idle data is not very revealing.
>>>>>>>> Is it a test I can run on my test computer?
>>>>>>> I see that I have fio installed on my test computer.
>>>>>>>
>>>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>>>> I have done the rest now, see below.
>>>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>>>
>>>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>>>
>>>>>>>>> phoronix-sqlite
>>>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>>>> Usage Above Below Above Below
>>>>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>>>>
>>>>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>>>>
>>>>>>>>> ... snip ...
>> Oh I see, but shouldn't avoiding regressions on established platforms be
>> a priority over further optimizing for specific newer platforms like
>> Jasper Lake?
>>
> Well, avoiding regressions on established platforms is what led to
> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
> being applied and backported.
> The expectation for stable is that we avoid regressions and potentially
> miss out on improvements. If you want the latest greatest performance you
> should probably run a latest greatest kernel.
> The original
> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
> was seen as a fix and overall improvement, that's why it was backported,
> but Sergey's regression report contradicted that.
> What is "established" and "newer" for a stable kernel is quite handwavy
> IMO, but even here Sergey's regression report is a clear data point...
> Your report is only about restoring 5.15 (and the others) to 5.15
> upstream-ish performance levels, which is within the expectations of
> running a stable kernel. No doubt it's frustrating either way!
Ah, but we see a regression on 5.15, 5.4 and 6.12 stable-based kernels
with the revert, and reapplying the original change recovers performance
across many benchmarks. Hence the previous suggestion.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 10:22 ` Harshvardhan Jha
@ 2026-02-03 10:30 ` Christian Loehle
0 siblings, 0 replies; 44+ messages in thread
From: Christian Loehle @ 2026-02-03 10:30 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 2/3/26 10:22, Harshvardhan Jha wrote:
>
> On 03/02/26 3:01 PM, Christian Loehle wrote:
>> On 2/3/26 09:16, Harshvardhan Jha wrote:
>>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>>>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>>>> ... snip ...
> Ah, but we see a regression on 5.15, 5.4 and 6.12 stable-based kernels
> with the revert, and reapplying the original change recovers performance
> across many benchmarks. Hence the previous suggestion.
Right, but in the case of 5.15 (and similarly for the others) we have:
v5.15.196 5666bcc3c00f Revert "cpuidle: menu: Avoid discarding useful information"
v5.15.185 14790abc8779 cpuidle: menu: Avoid discarding useful information
So the regression you see on v5.15.196 only takes performance back to
pre-v5.15.185 levels; you now miss out on an improvement that wasn't
originally part of v5.15.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
@ 2026-02-03 16:45 ` Rafael J. Wysocki
2026-02-05 0:45 ` Doug Smythies
1 sibling, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-03 16:45 UTC (permalink / raw)
To: Christian Loehle
Cc: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 2/3/26 09:16, Harshvardhan Jha wrote:
> >
> > On 03/02/26 2:37 PM, Christian Loehle wrote:
> >> On 2/2/26 17:31, Harshvardhan Jha wrote:
[cut]
> >> FWIW Jasper Lake seems to be supported from 5.6 on, see
> >> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
> >
> > Oh I see, but shouldn't avoiding regressions on established platforms be
> > a priority over further optimizing for specific newer platforms like
> > Jasper Lake?
> >
>
> Well avoiding regressions on established platforms is what led to
> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
> being applied and backported.
> The expectation for stable is that we avoid regressions and potentially
> miss out on improvements. If you want the latest greatest performance you
> should probably run a latest greatest kernel.
> The original
> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
> was seen as a fix and overall improvement,
Note, however, that commit 85975daeaa4d carries no Fixes: tag and no
Cc: stable. It was picked up into stable kernels for another reason.
> that's why it was backported, but Sergey's regression report contradicted that.
Exactly.
> What is "established" and "newer" for a stable kernel is quite handwavy
> IMO but even here Sergey's regression report is a clear data point...
Which wasn't known at the time commit 85975daeaa4d went in.
> Your report is only restoring 5.15 (and others) performance to 5.15
> upstream-ish levels which is within the expectations of running a stable
> kernel. No doubt it's frustrating either way!
That is a consequence of the time it takes for mainline changes to
propagate to distributions (Chrome OS in this particular case) at
which point they get tested on a wider range of systems. Until that
happens, it is not really guaranteed that the given change will stay
in.
In this particular case, restoring commit 85975daeaa4d would cause the
same problems on the systems adversely affected by it to become
visible again and I don't think it would be fair to say "Too bad" to
the users of those systems. IMV, it cannot be restored without a way
to at least limit the adverse effect on performance.
I have an idea to test, but getting something workable out of it may
be a challenge, even if it turns out to be a good one.
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 16:45 ` Rafael J. Wysocki
@ 2026-02-05 0:45 ` Doug Smythies
2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:02 ` Doug Smythies
0 siblings, 2 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-05 0:45 UTC (permalink / raw)
To: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sergey Senozhatsky'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano', Doug Smythies
On 2026.02.03 08:46 Rafael J. Wysocki wrote:
> On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle wrote:
>> On 2/3/26 09:16, Harshvardhan Jha wrote:
>>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>
> [cut]
>
>>>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>>>
>>> Oh I see, but shouldn't avoiding regressions on established platforms be
>>> a priority over further optimizing for specific newer platforms like
>>> Jasper Lake?
>>>
>> Well avoiding regressions on established platforms is what led to
>> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
>> being applied and backported.
>> The expectation for stable is that we avoid regressions and potentially
>> miss out on improvements. If you want the latest greatest performance you
>> should probably run a latest greatest kernel.
>> The original
>> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
>> was seen as a fix and overall improvement,
>
> Note, however, that commit 85975daeaa4d carries no Fixes: tag and no
> Cc: stable. It was picked up into stable kernels for another reason.
>
>> that's why it was backported, but Sergey's regression report contradicted that.
>
> Exactly.
>
>> What is "established" and "newer" for a stable kernel is quite handwavy
>> IMO but even here Sergey's regression report is a clear data point...
>
> Which wasn't known at the time commit 85975daeaa4d went in.
>
>> Your report is only restoring 5.15 (and others) performance to 5.15
>> upstream-ish levels which is within the expectations of running a stable
>> kernel. No doubt it's frustrating either way!
>
> That is a consequence of the time it takes for mainline changes to
> propagate to distributions (Chrome OS in this particular case) at
> which point they get tested on a wider range of systems. Until that
> happens, it is not really guaranteed that the given change will stay
> in.
>
> In this particular case, restoring commit 85975daeaa4d would cause the
> same problems on the systems adversely affected by it to become
> visible again and I don't think it would be fair to say "Too bad" to
> the users of those systems. IMV, it cannot be restored without a way
> to at least limit the adverse effect on performance.
I have been going over the old emails and the turbostat data again and again
and again.
I still do not understand how to break down Sergey's results into their
component contributions. I am certain there is power limit throttling
during the test, but have no idea how much or how little it contributes to the
differing results.
I think more work is needed to fully understand Sergey's test results from October.
I struggle with the dramatic test results difference of base=84.5 and revert=59.5
as being due to only the idle code changes.
That is why I keep asking for a test to be done with the CPU clock frequency limited
such that power limit throttling cannot occur. I don't know what limit to use, but suggest
2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
@Sergey: are you willing to do the test?
>
> I have an idea to test, but getting something workable out of it may
> be a challenge, even if it turns out to be a good one.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 0:45 ` Doug Smythies
@ 2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:18 ` Doug Smythies
2026-02-05 7:15 ` Christian Loehle
2026-02-05 5:02 ` Doug Smythies
1 sibling, 2 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-05 2:37 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sergey Senozhatsky',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano'
On (26/02/04 16:45), Doug Smythies wrote:
> >> What is "established" and "newer" for a stable kernel is quite handwavy
> >> IMO but even here Sergey's regression report is a clear data point...
> >
> > Which wasn't known at the time commit 85975daeaa4d went in.
> >
> >> Your report is only restoring 5.15 (and others) performance to 5.15
> >> upstream-ish levels which is within the expectations of running a stable
> >> kernel. No doubt it's frustrating either way!
> >
> > That is a consequence of the time it takes for mainline changes to
> > propagate to distributions (Chrome OS in this particular case) at
> > which point they get tested on a wider range of systems. Until that
> > happens, it is not really guaranteed that the given change will stay
> > in.
> >
> > In this particular case, restoring commit 85975daeaa4d would cause the
> > same problems on the systems adversely affected by it to become
> > visible again and I don't think it would be fair to say "Too bad" to
> > the users of those systems. IMV, it cannot be restored without a way
> > to at least limit the adverse effect on performance.
>
> I have been going over the old emails and the turbostat data again and again
> and again.
>
> I still do not understand how to break down Sergey's results into their
> component contributions. I am certain there is power limit throttling
> during the test, but have no idea how much or how little it contributes to the
> differing results.
>
> I think more work is needed to fully understand Sergey's test results from October.
> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
> as being due to only the idle code changes.
>
> That is why I keep asking for a test to be done with the CPU clock frequency limited
> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
> @Sergey: are you willing to do the test?
I can run tests, not immediately, though, but within some reasonable
time frame.
(I'll need some help with instructions/etc.)
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 0:45 ` Doug Smythies
2026-02-05 2:37 ` Sergey Senozhatsky
@ 2026-02-05 5:02 ` Doug Smythies
2026-02-10 9:33 ` Xueqin Luo
1 sibling, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-02-05 5:02 UTC (permalink / raw)
To: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sergey Senozhatsky'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano', Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 3851 bytes --]
On 2026.02.04 16:45 Doug Smythies wrote:
> On 2026.02.03 08:46 Rafael J. Wysocki wrote:
>> On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle wrote:
>>> On 2/3/26 09:16, Harshvardhan Jha wrote:
>>>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>
>> [cut]
>>
>>>>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>>>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>>>>
>>>> Oh I see, but shouldn't avoiding regressions on established platforms be
>>>> a priority over further optimizing for specific newer platforms like
>>>> Jasper Lake?
>>>>
>>> Well avoiding regressions on established platforms is what led to
>>> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
>>> being applied and backported.
>>> The expectation for stable is that we avoid regressions and potentially
>>> miss out on improvements. If you want the latest greatest performance you
>>> should probably run a latest greatest kernel.
>>> The original
>>> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
>>> was seen as a fix and overall improvement,
>>
>> Note, however, that commit 85975daeaa4d carries no Fixes: tag and no
>> Cc: stable. It was picked up into stable kernels for another reason.
>>
>>> that's why it was backported, but Sergey's regression report contradicted that.
>>
>> Exactly.
>>
>>> What is "established" and "newer" for a stable kernel is quite handwavy
>>> IMO but even here Sergey's regression report is a clear data point...
>>
>> Which wasn't known at the time commit 85975daeaa4d went in.
>>
>>> Your report is only restoring 5.15 (and others) performance to 5.15
>>> upstream-ish levels which is within the expectations of running a stable
>>> kernel. No doubt it's frustrating either way!
>>
>> That is a consequence of the time it takes for mainline changes to
>> propagate to distributions (Chrome OS in this particular case) at
>> which point they get tested on a wider range of systems. Until that
>> happens, it is not really guaranteed that the given change will stay
>> in.
>>
>> In this particular case, restoring commit 85975daeaa4d would cause the
>> same problems on the systems adversely affected by it to become
>> visible again and I don't think it would be fair to say "Too bad" to
>> the users of those systems. IMV, it cannot be restored without a way
>> to at least limit the adverse effect on performance.
>
> I have been going over the old emails and the turbostat data again and again
> and again.
>
> I still do not understand how to break down Sergey's results into their
> component contributions. I am certain there is power limit throttling
> during the test, but have no idea how much or how little it contributes to the
> differing results.
>
> I think more work is needed to fully understand Sergey's test results from October.
> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
> as being due to only the idle code changes.
>
> That is why I keep asking for a test to be done with the CPU clock frequency limited
> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
> @Sergey: are you willing to do the test?
Further to my earlier email, there is an interesting area in Sergey's turbostat data from
October. It is not near any throttling threshold, yet the CPU frequencies are considerably
higher for the "revert" case versus "base", with everything seemingly similar. I do not
understand this area. An annotated graph is attached.
>> I have an idea to test, but getting something workable out of it may
>> be a challenge, even if it turns out to be a good one.
[-- Attachment #2: interesting-area.png --]
[-- Type: image/png, Size: 237347 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 2:37 ` Sergey Senozhatsky
@ 2026-02-05 5:18 ` Doug Smythies
2026-02-10 9:17 ` Sergey Senozhatsky
2026-02-05 7:15 ` Christian Loehle
1 sibling, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-02-05 5:18 UTC (permalink / raw)
To: 'Sergey Senozhatsky'
Cc: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano', Doug Smythies
On 2026.02.04 18:37 Sergey Senozhatsky wrote:
> On (26/02/04 16:45), Doug Smythies wrote:
>>>> What is "established" and "newer" for a stable kernel is quite handwavy
>>>> IMO but even here Sergey's regression report is a clear data point...
>>>
>>> Which wasn't known at the time commit 85975daeaa4d went in.
>>>
>>>> Your report is only restoring 5.15 (and others) performance to 5.15
>>>> upstream-ish levels which is within the expectations of running a stable
>>>> kernel. No doubt it's frustrating either way!
>>>
>>> That is a consequence of the time it takes for mainline changes to
>>> propagate to distributions (Chrome OS in this particular case) at
>>> which point they get tested on a wider range of systems. Until that
>>> happens, it is not really guaranteed that the given change will stay
>>> in.
>>>
>>> In this particular case, restoring commit 85975daeaa4d would cause the
>>> same problems on the systems adversely affected by it to become
>>> visible again and I don't think it would be fair to say "Too bad" to
>>> the users of those systems. IMV, it cannot be restored without a way
>>> to at least limit the adverse effect on performance.
>>
>> I have been going over the old emails and the turbostat data again and again
>> and again.
>>
>> I still do not understand how to break down Sergey's results into their
>> component contributions. I am certain there is power limit throttling
>> during the test, but have no idea how much or how little it contributes to the
>> differing results.
>>
>> I think more work is needed to fully understand Sergey's test results from October.
>> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
>> as being due to only the idle code changes.
>>
>> That is why I keep asking for a test to be done with the CPU clock frequency limited
>> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
>> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
>
>> @Sergey: are you willing to do the test?
>
> I can run tests, not immediately, though, but within some reasonable
> time frame.
Thanks.
> (I'll need some help with instructions/etc.)
From your turbostat data from October, you are using the intel_pstate
CPU frequency scaling driver and the powersave CPU frequency scaling governor,
with HWP enabled. Also, your maximum CPU frequency is 3,300 MHz.
To limit the maximum CPU frequency to around 2,200 MHz do:
echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
Then run the tests acquiring turbostat logs the same way you did in October.
To restore the maximum CPU frequency afterwards do:
echo 100 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
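For reference, the value 67 follows from the ratio of the desired cap to the package's maximum frequency; a minimal sketch of that arithmetic (the sysfs read is shown as a comment and assumes the standard cpufreq layout):

```shell
# Derive the max_perf_pct value for a desired frequency cap.
# On the target system, max_khz would come from sysfs, e.g.:
#   max_khz=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
max_khz=3300000      # 3,300 MHz, as reported in the October turbostat data
target_khz=2200000   # desired ~2,200 MHz cap
# Integer percentage, rounded up so the cap is not undershot.
pct=$(( (target_khz * 100 + max_khz - 1) / max_khz ))
echo "$pct"          # 67
```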
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:18 ` Doug Smythies
@ 2026-02-05 7:15 ` Christian Loehle
2026-02-10 8:02 ` Sergey Senozhatsky
1 sibling, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-05 7:15 UTC (permalink / raw)
To: Sergey Senozhatsky, Doug Smythies
Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano'
On 2/5/26 02:37, Sergey Senozhatsky wrote:
> On (26/02/04 16:45), Doug Smythies wrote:
>>>> What is "established" and "newer" for a stable kernel is quite handwavy
>>>> IMO but even here Sergey's regression report is a clear data point...
>>>
>>> Which wasn't known at the time commit 85975daeaa4d went in.
>>>
>>>> Your report is only restoring 5.15 (and others) performance to 5.15
>>>> upstream-ish levels which is within the expectations of running a stable
>>>> kernel. No doubt it's frustrating either way!
>>>
>>> That is a consequence of the time it takes for mainline changes to
>>> propagate to distributions (Chrome OS in this particular case) at
>>> which point they get tested on a wider range of systems. Until that
>>> happens, it is not really guaranteed that the given change will stay
>>> in.
>>>
>>> In this particular case, restoring commit 85975daeaa4d would cause the
>>> same problems on the systems adversely affected by it to become
>>> visible again and I don't think it would be fair to say "Too bad" to
>>> the users of those systems. IMV, it cannot be restored without a way
>>> to at least limit the adverse effect on performance.
>>
>> I have been going over the old emails and the turbostat data again and again
>> and again.
>>
>> I still do not understand how to break down Sergey's results into their
>> component contributions. I am certain there is power limit throttling
>> during the test, but have no idea how much or how little it contributes to the
>> differing results.
>>
>> I think more work is needed to fully understand Sergey's test results from October.
>> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
>> as being due to only the idle code changes.
>>
>> That is why I keep asking for a test to be done with the CPU clock frequency limited
>> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
>> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
>
>
>> @Sergey: are you willing to do the test?
>
> I can run tests, not immediately, though, but within some reasonable
> time frame.
>
> (I'll need some help with instructions/etc.)
>
@Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
29.6% decrease in system performance in a traditional throughput sense.
The "benchmark" might be measuring dropped frames, user input latency or what have you.
Nonetheless @Sergey do feel free to expand.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 7:15 ` Christian Loehle
@ 2026-02-10 8:02 ` Sergey Senozhatsky
2026-02-10 8:57 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 8:02 UTC (permalink / raw)
To: Christian Loehle
Cc: Sergey Senozhatsky, Doug Smythies, 'Rafael J. Wysocki',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano'
On (26/02/05 07:15), Christian Loehle wrote:
[..]
> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
> 29.6% decrease in system performance in a traditional throughput sense.
> The "benchmark" might be measuring dropped frames, user input latency or what have you.
> Nonetheless @Sergey do feel free to expand.
I'm not on the performance team and I don't define those metrics, so
I can't really comment. But frame drops during Google Docs scrolling,
for instance, or typing is a user visible regression, that people tend
to notice.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 8:02 ` Sergey Senozhatsky
@ 2026-02-10 8:57 ` Christian Loehle
2026-02-11 4:27 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-10 8:57 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Doug Smythies, 'Rafael J. Wysocki',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano'
On 2/10/26 08:02, Sergey Senozhatsky wrote:
> On (26/02/05 07:15), Christian Loehle wrote:
> [..]
>> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
>> 29.6% decrease in system performance in a traditional throughput sense.
>> The "benchmark" might be measuring dropped frames, user input latency or what have you.
>> Nonetheless @Sergey do feel free to expand.
>
> I'm not on the performance team and I don't define those metrics, so
> I can't really comment. But frame drops during Google Docs scrolling,
> for instance, or typing is a user visible regression, that people tend
> to notice.
Yeah I guess that was my point already, i.e. it isn't implausible that
e.g. a frequency reduction from 2.2GHz to 2.0GHz (-10%) might result in
double the number of dropped frames (= score reduction of 50%).
Everything here is just an example, but don't be thrown off by the 29.6% reduction in
score and go looking for a matching -29.6% in CPU frequency (as you would expect
for many purely CPU-bound benchmarks).
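A toy calculation with made-up frame times (purely hypothetical, not from Sergey's data) shows how a ~10% slowdown can multiply dropped frames when render times cluster just under the 60 Hz deadline:

```shell
# Hypothetical numbers only: per-frame render times (ms) at the higher
# clock, clustered just under the ~16.67 ms (60 Hz) deadline.
frames="15.0 15.5 16.0 16.5 17.0 14.0 16.2 16.6 15.8 16.4"
count_dropped() {  # $1 = slowdown factor applied to each frame time
    echo "$frames" | awk -v s="$1" '
        { for (i = 1; i <= NF; i++) if ($i * s > 1000/60) n++ }
        END { print n + 0 }'
}
base=$(count_dropped 1.0)      # at 2.2 GHz
slowed=$(count_dropped 1.1)    # at 2.0 GHz (times scale by 2.2/2.0)
echo "base=$base slowed=$slowed"   # base=1 slowed=8
```

With these numbers, a 10% slowdown turns 1 dropped frame into 8, so a frame-drop-based score can fall far more steeply than the frequency did.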
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 5:18 ` Doug Smythies
@ 2026-02-10 9:17 ` Sergey Senozhatsky
2026-02-11 4:27 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 9:17 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Sergey Senozhatsky', 'Rafael J. Wysocki',
'Christian Loehle', 'Harshvardhan Jha',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano'
[-- Attachment #1: Type: text/plain, Size: 774 bytes --]
On (26/02/04 21:18), Doug Smythies wrote:
>
> > (I'll need some help with instructions/etc.)
>
> From your turbostat data from October you are using the intel_pstate
> CPU frequency scaling driver and the powersave CPU frequency scaling governor
> and HWP enabled. Also your maximum CPU frequency is 3,300 MHz.
>
> To limit the maximum CPU frequency to around 2,200 MHz do:
>
> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
So I only set max_perf_pct
echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
and re-ran the test:
revert: 72.5
base: 82.5
// as before "revert" means revert of "cpuidle: menu: Avoid discarding useful
// information", while base means that the patch is applied.
Please find turbostat logs attached.
[-- Attachment #2: turbostat-base.gz --]
[-- Type: application/gzip, Size: 40773 bytes --]
[-- Attachment #3: turbostat-revert.gz --]
[-- Type: application/gzip, Size: 39194 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 5:02 ` Doug Smythies
@ 2026-02-10 9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:20 ` Rafael J. Wysocki
0 siblings, 2 replies; 44+ messages in thread
From: Xueqin Luo @ 2026-02-10 9:33 UTC (permalink / raw)
To: dsmythies
Cc: christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha,
linux-pm, rafael, sashal, senozhatsky, stable, Xueqin Luo
Hi Doug, Rafael, and all,
I would like to share an additional data point from a different
platform that also shows a power regression associated with commit
85975daeaa4d ("cpuidle: menu: Avoid discarding useful information").
The test platform is a ZHAOXIN KaiXian KX-7000 processor. The test
scenario is system idle power measurement.
Below are the cpuidle statistics for CPU1. Other CPU cores show the
same trend.
With commit 85975daeaa4d applied (base):
State Ratio Avg(ms) Count Over-pred Under-pred
-----------------------------------------------------------------
POLL 0.0% 0.10 68 0.0% (0) 100.0% (68)
C1 0.05% 0.82 187 0.0% (0) 61.5% (115)
C2 0.0% 0.50 23 30.4% (7) 69.6% (16)
C3 0.01% 0.59 35 37.1% (13) 62.9% (22)
C4 0.24% 0.65 1092 46.9% (512) 50.0% (546)
C5 81.88% 1.45 169745 52.7% (89450) 0.0% (0)
After reverting the commit (revert):
State Ratio Avg(ms) Count Over-pred Under-pred
-----------------------------------------------------------------
POLL 0.0% 0.09 24 0.0% (0) 100.0% (24)
C1 0.03% 0.44 222 0.0% (0) 57.2% (127)
C2 0.01% 0.44 50 20.0% (10) 74.0% (37)
C3 0.01% 0.49 43 25.6% (11) 60.5% (26)
C4 0.15% 0.52 875 45.1% (395) 41.4% (362)
C5 97.1% 5.30 55099 13.9% (7645) 0.0% (0)
With commit 85975daeaa4d present:
* C5 entry count is very high
* C5 average residency is short (~1.45 ms)
* Over-prediction ratio for C5 is around 50%
After reverting the commit:
* C5 residency ratio exceeds 90%
* C5 average residency increases to ~5 ms
* Entry count drops significantly
* Over-prediction ratio is greatly reduced
This indicates that the commit makes idle residency more fragmented,
leading to more frequent C-state transitions.
In addition to the cpuidle statistics, measured system idle power is
about 2W higher when this commit is applied.
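For reference, ratios like the Over-pred/Under-pred columns above can be derived from the kernel's standard per-state counters (`usage`, `above`, `below` under each cpuidle state's sysfs directory on reasonably recent kernels); whether the table was generated exactly this way is an assumption. A minimal sketch using the C5 "base" row as literal input:

```shell
# Sketch: over-/under-prediction ratios for one cpuidle state, computed
# from the kernel's per-state counters ('above': the state chosen was too
# deep, i.e. the idle duration was over-predicted; 'below': too shallow).
# On a live system the counters come from sysfs, e.g.:
#   cd /sys/devices/system/cpu/cpu1/cpuidle/state5
#   usage=$(cat usage) above=$(cat above) below=$(cat below)
# Here, the C5 "base" row from the table above is used as literal input.
usage=169745 above=89450 below=0
over=$(awk -v a="$above" -v u="$usage" 'BEGIN { printf "%.1f", 100 * a / u }')
under=$(awk -v b="$below" -v u="$usage" 'BEGIN { printf "%.1f", 100 * b / u }')
echo "over=${over}% under=${under}%"   # over=52.7% under=0.0%
```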
Thanks,
Xueqin Luo
--
2.43.0
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 9:33 ` Xueqin Luo
@ 2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-10 14:20 ` Rafael J. Wysocki
1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 10:04 UTC (permalink / raw)
To: Xueqin Luo
Cc: dsmythies, christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable
On (26/02/10 17:33), Xueqin Luo wrote:
>
> In addition to the cpuidle statistics, measured system idle power is
> about 2W higher when this commit is applied.
>
We also noticed shorter battery life on some of the affected laptops.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
@ 2026-02-10 14:20 ` Rafael J. Wysocki
1 sibling, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-10 14:20 UTC (permalink / raw)
To: Xueqin Luo
Cc: dsmythies, christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable
On Tue, Feb 10, 2026 at 10:33 AM Xueqin Luo <luoxueqin@kylinos.cn> wrote:
>
> Hi Doug, Rafael, and all,
>
> I would like to share an additional data point from a different
> platform that also shows a power regression associated with commit
> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information").
>
> The test platform is a ZHAOXIN KaiXian KX-7000 processor. The test
> scenario is system idle power measurement.
>
> Below are the cpuidle statistics for CPU1. Other CPU cores show the
> same trend.
>
> With commit 85975daeaa4d applied (base):
>
> State Ratio Avg(ms) Count Over-pred Under-pred
> -----------------------------------------------------------------
> POLL 0.0% 0.10 68 0.0% (0) 100.0% (68)
> C1 0.05% 0.82 187 0.0% (0) 61.5% (115)
> C2 0.0% 0.50 23 30.4% (7) 69.6% (16)
> C3 0.01% 0.59 35 37.1% (13) 62.9% (22)
> C4 0.24% 0.65 1092 46.9% (512) 50.0% (546)
> C5 81.88% 1.45 169745 52.7% (89450) 0.0% (0)
>
> After reverting the commit (revert):
>
> State Ratio Avg(ms) Count Over-pred Under-pred
> -----------------------------------------------------------------
> POLL 0.0% 0.09 24 0.0% (0) 100.0% (24)
> C1 0.03% 0.44 222 0.0% (0) 57.2% (127)
> C2 0.01% 0.44 50 20.0% (10) 74.0% (37)
> C3 0.01% 0.49 43 25.6% (11) 60.5% (26)
> C4 0.15% 0.52 875 45.1% (395) 41.4% (362)
> C5 97.1% 5.30 55099 13.9% (7645) 0.0% (0)
>
> With commit 85975daeaa4d present:
>
> * C5 entry count is very high
> * C5 average residency is short (~1.45 ms)
> * Over-prediction ratio for C5 is around 50%
>
> After reverting the commit:
>
> * C5 residency ratio exceeds 90%
> * C5 average residency increases to ~5 ms
> * Entry count drops significantly
> * Over-prediction ratio is greatly reduced
>
> This indicates that the commit makes idle residency more fragmented,
> leading to more frequent C-state transitions.
Thanks for the data!
> In addition to the cpuidle statistics, measured system idle power is
> about 2W higher when this commit is applied.
Well, 2W of a difference is not good. I'm wondering what the idle
power with the commit in question reverted is.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 10:04 ` Sergey Senozhatsky
@ 2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11 1:34 ` Sergey Senozhatsky
2026-02-11 8:58 ` Xueqin Luo
0 siblings, 2 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-10 14:24 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Xueqin Luo, dsmythies, christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, rafael, sashal, stable
On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (26/02/10 17:33), Xueqin Luo wrote:
> >
> > In addition to the cpuidle statistics, measured system idle power is
> > about 2W higher when this commit is applied.
> >
>
> We also noticed shorter battery life on some of the affected laptops.
Was the difference significant?
Overall, this clearly is a "help some - hurt some" situation and I am
not at all convinced that restoring the commit in question is a good
idea (with all due respect to everyone who thinks otherwise or got
better results when it was there).
Honestly, I'd rather stop tweaking the menu governor at this point and
kindly ask people who want to sacrifice some energy for more
performance to try the teo governor instead.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 14:24 ` Rafael J. Wysocki
@ 2026-02-11 1:34 ` Sergey Senozhatsky
2026-02-11 4:17 ` Doug Smythies
2026-02-11 8:58 ` Xueqin Luo
1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-11 1:34 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Sergey Senozhatsky, Xueqin Luo, dsmythies, christian.loehle,
daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, sashal,
stable
On (26/02/10 15:24), Rafael J. Wysocki wrote:
> On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > On (26/02/10 17:33), Xueqin Luo wrote:
> > >
> > > In addition to the cpuidle statistics, measured system idle power is
> > > about 2W higher when this commit is applied.
> > >
> >
> > We also noticed shorter battery life on some of the affected laptops.
>
> Was the difference significant?
I think I saw up to "5.16% regression in perf.minutes_battery_life"
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-11 1:34 ` Sergey Senozhatsky
@ 2026-02-11 4:17 ` Doug Smythies
0 siblings, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-11 4:17 UTC (permalink / raw)
To: 'Sergey Senozhatsky', 'Rafael J. Wysocki'
Cc: 'Xueqin Luo', christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, sashal, stable, Doug Smythies
On 2026.02.10 17:34 Sergey Senozhatsky wrote:
> On (26/02/10 15:24), Rafael J. Wysocki wrote:
>> On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky wrote:
>>> On (26/02/10 17:33), Xueqin Luo wrote:
>>>>
>>>> In addition to the cpuidle statistics, measured system idle power is
>>>> about 2W higher when this commit is applied.
>>>>
>>>
>>> We also noticed shorter battery life on some of the affected laptops.
>>
>> Was the difference significant?
>
> I think I saw up to "5.16% regression in perf.minutes_battery_life"
Note: I have been getting a fair bit of noise on my idle tests recently.
For what it's worth: On my test computer I got:
kernel 6.19-rc8, with and without the reapply.
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
CPU frequency scaling driver: intel_pstate.
CPU frequency scaling governor: powersave.
HWP: enabled.
idle governor menu.
Git Branch "doug"
3856f38e5bb9 (HEAD -> doug) Reapply "cpuidle: menu: Avoid discarding useful information"
18f7fcd5e69a (tag: v6.19-rc8) Linux 6.19-rc8
Processor Package Power:
average, reapply (140 minutes): 1.783028571 watts
average, rc8 (512 minutes): 1.656212891 watts
7.65% more power
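The percentage above follows directly from the two averages (assuming they are package power in watts; the ratio itself is unit-independent):

```python
# Relative package-power increase from the two averages above.
reapply_w = 1.783028571   # average with the reapply, over 140 minutes
rc8_w = 1.656212891       # average on plain 6.19-rc8, over 512 minutes
increase_pct = (reapply_w - rc8_w) / rc8_w * 100
print(f"{increase_pct:.2f}% more power")  # ~7.66%, 7.65% as quoted after truncation
```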
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 9:17 ` Sergey Senozhatsky
@ 2026-02-11 4:27 ` Doug Smythies
0 siblings, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-11 4:27 UTC (permalink / raw)
To: 'Sergey Senozhatsky'
Cc: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano', Doug Smythies
On 2026.02.10 01:17 Sergey Senozhatsky wrote:
> On (26/02/04 21:18), Doug Smythies wrote:
>>
>>> (I'll need some help with instructions/etc.)
>>
>> From your turbostat data from October you are using the intel_pstate
>> CPU frequency scaling driver and the powersave CPU frequency scaling governor
>> and HWP enabled. Also your maximum CPU frequency is 3,300 MHz.
>>
>> To limit the maximum CPU frequency to around 2,200 MHz do:
>>
>> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
>
> So I only set max_perf_pct
>
> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
>
> and re-ran the test:
>
> revert: 72.5
> base: 82.5
>
> // as before "revert" means revert of "cpuidle: menu: Avoid discarding useful
> // information", while base means that the patch is applied.
>
> Please find turbostat logs attached.
Thank you for doing the test.
So now the turbostat data shows that the processor never gets anywhere
near any throttling threshold, eliminating throttling from consideration.
... Doug
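The 67% figure used in this test can be sanity-checked against the maximum frequency mentioned earlier in the thread: intel_pstate's max_perf_pct caps performance as a percentage of the maximum frequency, so 67% of 3,300 MHz lands at roughly the requested 2,200 MHz. A sketch of the arithmetic (values from the thread):

```python
# max_perf_pct is a percentage of the maximum CPU frequency, so the
# value to write for a desired cap is target/max, rounded to a percent.
max_mhz = 3300      # maximum CPU frequency reported in the thread
target_mhz = 2200   # desired frequency cap
pct = round(100 * target_mhz / max_mhz)
capped_mhz = max_mhz * pct / 100
print(pct, capped_mhz)  # -> 67 2211.0
```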
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 8:57 ` Christian Loehle
@ 2026-02-11 4:27 ` Doug Smythies
0 siblings, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-11 4:27 UTC (permalink / raw)
To: 'Christian Loehle', 'Sergey Senozhatsky'
Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano', Doug Smythies
On 2026.02.10 00:57 Christian Loehle wrote:
> On 2/10/26 08:02, Sergey Senozhatsky wrote:
>> On (26/02/05 07:15), Christian Loehle wrote:
>> [..]
>>> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
>>> 29.6% decrease in system performance in a traditional throughput sense.
>>> The "benchmark" might me measuring dropped frames, user input latency or what have you.
>>> Nonetheless @Sergey do feel free to expand.
>>
>> I'm not on the performance team and I don't define those metrics, so
>> I can't really comment. But frame drops during Google Docs scrolling,
>> for instance, or typing is a user visible regression, that people tend
>> to notice.
>
> Yeah I guess that was my point already, i.e. it isn't implausible that
> e.g. a frequency reduction from 2.2GHz to 2.0GHz (-10%) might result in
> double the number of dropped frames (= score reduction of 50%).
> Everything just an example but don't be thrown off by the 29.6% reduction in
> score and expect to go looking for -29.6% cpu frequency (like you would expect
> for many purely cpubound benchmarks).
Thanks for the inputs. Agreed.
In my defense, I was just attempting to extract whatever I could from
the very limited data we had.
... Doug
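Christian's point above, that a modest frequency reduction can produce an outsized score change when the metric counts missed deadlines, can be illustrated with a toy model (all numbers hypothetical):

```python
# Toy model (hypothetical numbers): a frame is dropped when its render
# time exceeds a 60 Hz budget. Render time scales inversely with CPU
# frequency, so if times cluster just under the budget, a small
# slowdown multiplies the number of dropped frames.
budget_ms = 16.7
# Per-frame render times at the 2.2 GHz baseline, clustered near budget.
times = [15.0, 15.5, 16.0, 16.3, 16.5, 16.6, 14.0, 13.5, 16.4, 15.8]

def dropped(freq_ghz, base_ghz=2.2):
    scale = base_ghz / freq_ghz  # slower clock -> longer render times
    return sum(1 for t in times if t * scale > budget_ms)

print(dropped(2.2), dropped(2.0))  # -> 0 7
```

A ~10% frequency drop takes this model from zero dropped frames to seven, which is the kind of non-linearity that makes a latency-sensitive score move much more than the underlying frequency change.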
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11 1:34 ` Sergey Senozhatsky
@ 2026-02-11 8:58 ` Xueqin Luo
1 sibling, 0 replies; 44+ messages in thread
From: Xueqin Luo @ 2026-02-11 8:58 UTC (permalink / raw)
To: rafael
Cc: christian.loehle, daniel.lezcano, dsmythies, gregkh,
harshvardhan.j.jha, linux-pm, luoxueqin, sashal, senozhatsky,
stable
On this platform (ZHAOXIN KaiXian KX-7000), we evaluated the impact
of commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
under a screen-on idle scenario. During testing, the cpufreq driver
was acpi-cpufreq and the scaling governor was set to ondemand.
With this commit applied, measured system idle power increases by
approximately 2W compared to the revert case. In addition, battery life
testing on the same system shows a reduction of roughly 80 minutes when
this commit is present.
These results were consistently reproduced across multiple test runs
under identical conditions.
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2026-02-11 8:59 UTC | newest]
Thread overview: 44+ messages
2025-12-03 16:18 Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS Harshvardhan Jha
2025-12-03 16:44 ` Christian Loehle
2025-12-03 22:30 ` Doug Smythies
2025-12-08 11:33 ` Harshvardhan Jha
2025-12-08 12:47 ` Christian Loehle
2026-01-13 7:06 ` Harshvardhan Jha
2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-13 14:18 ` Rafael J. Wysocki
2026-01-14 4:28 ` Sergey Senozhatsky
2026-01-14 4:49 ` Sergey Senozhatsky
2026-01-14 5:15 ` Tomasz Figa
2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 10:23 ` Harshvardhan Jha
2026-01-29 22:47 ` Doug Smythies
2026-01-27 15:45 ` Harshvardhan Jha
2026-01-28 5:06 ` Doug Smythies
2026-01-28 23:53 ` Doug Smythies
2026-01-29 22:27 ` Doug Smythies
2026-01-30 19:28 ` Rafael J. Wysocki
2026-02-01 19:20 ` Christian Loehle
2026-02-02 17:31 ` Harshvardhan Jha
2026-02-03 9:07 ` Christian Loehle
2026-02-03 9:16 ` Harshvardhan Jha
2026-02-03 9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 10:30 ` Christian Loehle
2026-02-03 16:45 ` Rafael J. Wysocki
2026-02-05 0:45 ` Doug Smythies
2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:18 ` Doug Smythies
2026-02-10 9:17 ` Sergey Senozhatsky
2026-02-11 4:27 ` Doug Smythies
2026-02-05 7:15 ` Christian Loehle
2026-02-10 8:02 ` Sergey Senozhatsky
2026-02-10 8:57 ` Christian Loehle
2026-02-11 4:27 ` Doug Smythies
2026-02-05 5:02 ` Doug Smythies
2026-02-10 9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11 1:34 ` Sergey Senozhatsky
2026-02-11 4:17 ` Doug Smythies
2026-02-11 8:58 ` Xueqin Luo
2026-02-10 14:20 ` Rafael J. Wysocki