* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
@ 2025-12-03 16:18 Harshvardhan Jha
2025-12-03 16:44 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2025-12-03 16:18 UTC (permalink / raw)
To: Rafael J. Wysocki, Daniel Lezcano
Cc: Sasha Levin, Christian Loehle, Greg Kroah-Hartman,
linux-pm@vger.kernel.org, stable@vger.kernel.org
Hi there,
While running performance benchmarks for the 5.15.196 LTS tag, we observed
several regressions across different benchmarks when compared to the
previous 5.15.193 kernel tag. An automated bisect between the two tags
narrowed the culprit commit down to:
- 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
information" for 5.15
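The automated-bisect workflow amounts to marking the two tags and letting git
drive a pass/fail script. A self-contained toy illustration follows (the real
run used the kernel tree and a benchmark harness; the throwaway repo and
check.sh below are stand-ins):

```shell
# Toy demo of `git bisect run`: a 5-commit repo where the "regression"
# lands in commit 4, and check.sh plays the role of the benchmark script.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5; do
    echo "$i" > f
    git add f
    git commit -qm "commit $i"
done
# The check succeeds (exit 0 = good) only before the regression landed.
printf '#!/bin/sh\ntest "$(cat f)" -lt 4\n' > check.sh
chmod +x check.sh
git bisect start HEAD HEAD~4 >/dev/null 2>&1   # bad = HEAD, good = first commit
result=$(git bisect run ./check.sh 2>/dev/null)
echo "$result" | grep "is the first bad commit"
```

With the real tree the start command would be `git bisect start v5.15.196
v5.15.193` and the run script would exit non-zero whenever the benchmark shows
the regression.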
Regressions on 5.15.196 include:
-9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
-6.3% : Phoronix system/sqlite on OnPrem X6-2
-18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
thread & 1M buffer size on OnPrem X6-2
-4% to -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
1 thread & 1M buffer size on OnPrem X6-2
Up to -30% : Some Netpipe metrics on OnPrem X5-2
The culprit commits' messages mention that these reverts were done due to
performance regressions on Intel Jasper Lake systems, but unfortunately
the revert is causing issues on other systems. I would like the
maintainers' opinion on how we should proceed to fix this: reapplying the
patch will bring back the previous regressions on Jasper Lake systems,
while keeping the revert leaves us stuck with the current regressions. If
this problem has been reported before and a fix is in the works, please
let me know and I shall follow developments on that mail thread.
Thanks & Regards,
Harshvardhan
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-03 16:18 Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS Harshvardhan Jha
@ 2025-12-03 16:44 ` Christian Loehle
2025-12-03 22:30 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2025-12-03 16:44 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Daniel Lezcano
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm@vger.kernel.org,
stable@vger.kernel.org
On 12/3/25 16:18, Harshvardhan Jha wrote:
> Hi there,
>
> While running performance benchmarks for the 5.15.196 LTS tags , it was
> observed that several regressions across different benchmarks is being
> introduced when compared to the previous 5.15.193 kernel tag. Running an
> automated bisect on both of them narrowed down the culprit commit to:
> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
> information" for 5.15
>
> Regressions on 5.15.196 include:
> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
> -6.3% : Phoronix system/sqlite on OnPrem X6-2
> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
> thread & 1M buffer size on OnPrem X6-2
> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
> 1 thread & 1M buffer size on OnPrem X6-2
> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>
> The culprit commits' messages mention that these reverts were done due
> to performance regressions introduced in Intel Jasper Lake systems but
> this revert is causing issues in other systems unfortunately. I wanted
> to know the maintainers' opinion on how we should proceed in order to
> fix this. If we reapply it'll bring back the previous regressions on
> Jasper Lake systems and if we don't revert it then it's stuck with
> current regressions. If this problem has been reported before and a fix
> is in the works then please let me know I shall follow developments to
> that mail thread.
The discussion regarding this can be found here:
https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
we explored an alternative to the full revert here:
https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
unfortunately that didn't lead anywhere useful, so Rafael went with the
full revert you're seeing now.
Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
will highly depend on your platform, but also your workload. Jasper Lake
in particular seems to favor deep idle states even when they don't seem
to be a 'good' choice from a purely cpuidle (governor) perspective, so
we're kind of stuck with that.
For teo we've discussed a tunable knob in the past, which comes naturally
with the logic; for menu there's nothing obvious that would be comparable.
But for teo such a knob didn't generate any further interest (so far).
That's the status, unless I missed anything?
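For anyone reproducing on their own platform, which governor (menu or teo) is
active can be checked via the standard cpuidle sysfs interface; a small sketch
(the fallback string is illustrative, for machines where cpuidle sysfs is
absent, e.g. some VMs and containers):

```shell
# Report the active cpuidle governor; fall back gracefully when the
# cpuidle sysfs directory does not exist on this system.
base=/sys/devices/system/cpu/cpuidle
if [ -r "$base/current_governor" ]; then
    gov=$(cat "$base/current_governor")        # e.g. "menu" or "teo"
elif [ -r "$base/current_governor_ro" ]; then
    gov=$(cat "$base/current_governor_ro")     # read-only variant on older kernels
else
    gov="cpuidle sysfs not available"
fi
echo "$gov"
# Switching at runtime (root, both governors built in) would be:
#   echo teo > /sys/devices/system/cpu/cpuidle/current_governor
```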
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-03 16:44 ` Christian Loehle
@ 2025-12-03 22:30 ` Doug Smythies
2025-12-08 11:33 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2025-12-03 22:30 UTC (permalink / raw)
To: 'Harshvardhan Jha'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, Doug Smythies, 'Christian Loehle',
'Rafael J. Wysocki', 'Daniel Lezcano'
[-- Attachment #1: Type: text/plain, Size: 3917 bytes --]
On 2025.12.03 08:45 Christian Loehle wrote:
> On 12/3/25 16:18, Harshvardhan Jha wrote:
>> Hi there,
>>
>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>> observed that several regressions across different benchmarks is being
>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>> automated bisect on both of them narrowed down the culprit commit to:
>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>> information" for 5.15
>>
>> Regressions on 5.15.196 include:
>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>> thread & 1M buffer size on OnPrem X6-2
>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>> 1 thread & 1M buffer size on OnPrem X6-2
>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>
>> The culprit commits' messages mention that these reverts were done due
>> to performance regressions introduced in Intel Jasper Lake systems but
>> this revert is causing issues in other systems unfortunately. I wanted
>> to know the maintainers' opinion on how we should proceed in order to
>> fix this. If we reapply it'll bring back the previous regressions on
>> Jasper Lake systems and if we don't revert it then it's stuck with
>> current regressions. If this problem has been reported before and a fix
>> is in the works then please let me know I shall follow developments to
>> that mail thread.
>
> The discussion regarding this can be found here:
> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> we explored an alternative to the full revert here:
> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
> unfortunately that didn't lead anywhere useful, so Rafael went with the
> full revert you're seeing now.
>
> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> will highly depend on your platform, but also your workload, Jasper Lake
> in particular seems to favor deep idle states even when they don't seem
> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> we're kind of stuck with that.
>
> For teo we've discussed a tunable knob in the past, which comes naturally with
> the logic, for menu there's nothing obvious that would be comparable.
> But for teo such a knob didn't generate any further interest (so far).
>
> That's the status, unless I missed anything?
By reading everything in the links Christian provided, you can see
that we had difficulty repeating test results on other platforms.
Of the tests listed herein, the only one that was easy to repeat on my
test server was the "Phoronix pts/sqlite" one. I got (summary: no difference):
                     Kernel 6.18                    Reverted
pts/sqlite-2.3.0     menu rc4       menu rc1        menu rc1       menu rc3
                     performance    performance     performance    performance
test  what           ave            ave             ave            ave
1     T/C  1         2.147 -0.2%    2.143  0.0%     2.16  -0.8%    2.156 -0.6%
2     T/C  2         3.468  0.1%    3.473  0.0%     3.486 -0.4%    3.478 -0.1%
3     T/C  4         4.336  0.3%    4.35   0.0%     4.355 -0.1%    4.354 -0.1%
4     T/C  8         5.438 -0.1%    5.434  0.0%     5.456 -0.4%    5.45  -0.3%
5     T/C 12         6.314 -0.2%    6.299  0.0%     6.307 -0.1%    6.29   0.1%
Where:
T/C means: Threads / Copies
performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
Data points are in Seconds.
Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
The reversion was manually applied to kernel 6.18-rc1 for that test.
The reversion was included in kernel 6.18-rc3.
Kernel 6.18-rc4 had another code change to menu.c
In case the formatting gets messed up, the table is also attached.
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
HWP: Enabled.
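The percentage columns appear to be relative to the "menu rc1" averages (the
0.0% column); the arithmetic can be checked with a couple of lines, using the
first row of the table:

```python
# Percent difference of each run vs. the menu rc1 baseline, matching the
# table's sign convention: negative means slower (longer) than baseline.
def pct_vs_baseline(baseline, value):
    return round((baseline - value) / baseline * 100, 1)

baseline = 2.143                            # menu rc1, T/C 1
print(pct_vs_baseline(baseline, 2.147))     # menu rc4
print(pct_vs_baseline(baseline, 2.160))     # reverted rc1
print(pct_vs_baseline(baseline, 2.156))     # reverted rc3
```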
... Doug
[-- Attachment #2: sqlite-test-table.png --]
[-- Type: image/png, Size: 21984 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-03 22:30 ` Doug Smythies
@ 2025-12-08 11:33 ` Harshvardhan Jha
2025-12-08 12:47 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2025-12-08 11:33 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Christian Loehle', 'Rafael J. Wysocki',
'Daniel Lezcano'
Hi Doug,
On 04/12/25 4:00 AM, Doug Smythies wrote:
> On 2025.12.03 08:45 Christian Loehle wrote:
>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>> Hi there,
>>>
>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>> observed that several regressions across different benchmarks is being
>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>> automated bisect on both of them narrowed down the culprit commit to:
>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>> information" for 5.15
>>>
>>> Regressions on 5.15.196 include:
>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>> thread & 1M buffer size on OnPrem X6-2
>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>> 1 thread & 1M buffer size on OnPrem X6-2
>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>
>>> The culprit commits' messages mention that these reverts were done due
>>> to performance regressions introduced in Intel Jasper Lake systems but
>>> this revert is causing issues in other systems unfortunately. I wanted
>>> to know the maintainers' opinion on how we should proceed in order to
>>> fix this. If we reapply it'll bring back the previous regressions on
>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>> current regressions. If this problem has been reported before and a fix
>>> is in the works then please let me know I shall follow developments to
>>> that mail thread.
>> The discussion regarding this can be found here:
>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>> we explored an alternative to the full revert here:
>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>> full revert you're seeing now.
>>
>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>> will highly depend on your platform, but also your workload, Jasper Lake
>> in particular seems to favor deep idle states even when they don't seem
>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>> we're kind of stuck with that.
>>
>> For teo we've discussed a tunable knob in the past, which comes naturally with
>> the logic, for menu there's nothing obvious that would be comparable.
>> But for teo such a knob didn't generate any further interest (so far).
>>
>> That's the status, unless I missed anything?
> By reading everything in the links Chrsitian provided, you can see
> that we had difficulties repeating test results on other platforms.
>
> Of the tests listed herein, the only one that was easy to repeat on my
> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>
> Kernel 6.18 Reverted
> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
> performance performance performance performance
> test what ave ave ave ave
> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>
> Where:
> T/C means: Threads / Copies
> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor.
> Data points are in Seconds.
> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> The reversion was manually applied to kernel 6.18-rc1 for that test.
> The reversion was included in kernel 6.18-rc3.
> Kernel 6.18-rc4 had another code change to menu.c
>
> In case the formatting gets messed up, the table is also attached.
>
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> HWP: Enabled.
I was able to recover performance on 5.15 and 5.4 LTS based kernels
after reapplying the reverted patch on X6-2 systems.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
Stepping: 1
CPU(s) scaling MHz: 98%
CPU max MHz: 2600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5188.26
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb
stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle
avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt
xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln
pts vnmi md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 896 KiB (28 instances)
L1i: 896 KiB (28 instances)
L2: 7 MiB (28 instances)
L3: 70 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Vulnerabilities:
Gather data sampling: Not affected
Indirect target selection: Not affected
Itlb multihit: KVM: Mitigation: Split huge pages
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes,
SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer
sanitization
Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW;
STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsa: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Vmscape: Mitigation; IBPB before exit to userspace
Thanks & Regards,
Harshvardhan
>
> ... Doug
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-08 11:33 ` Harshvardhan Jha
@ 2025-12-08 12:47 ` Christian Loehle
2026-01-13 7:06 ` Harshvardhan Jha
2026-01-27 15:45 ` Harshvardhan Jha
0 siblings, 2 replies; 44+ messages in thread
From: Christian Loehle @ 2025-12-08 12:47 UTC (permalink / raw)
To: Harshvardhan Jha, Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano'
On 12/8/25 11:33, Harshvardhan Jha wrote:
> Hi Doug,
>
> On 04/12/25 4:00 AM, Doug Smythies wrote:
>> On 2025.12.03 08:45 Christian Loehle wrote:
>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>> Hi there,
>>>>
>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>>> observed that several regressions across different benchmarks is being
>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>> information" for 5.15
>>>>
>>>> Regressions on 5.15.196 include:
>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>> thread & 1M buffer size on OnPrem X6-2
>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>
>>>> The culprit commits' messages mention that these reverts were done due
>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>> to know the maintainers' opinion on how we should proceed in order to
>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>> current regressions. If this problem has been reported before and a fix
>>>> is in the works then please let me know I shall follow developments to
>>>> that mail thread.
>>> The discussion regarding this can be found here:
>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>> we explored an alternative to the full revert here:
>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>> full revert you're seeing now.
>>>
>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>> will highly depend on your platform, but also your workload, Jasper Lake
>>> in particular seems to favor deep idle states even when they don't seem
>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>> we're kind of stuck with that.
>>>
>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>> the logic, for menu there's nothing obvious that would be comparable.
>>> But for teo such a knob didn't generate any further interest (so far).
>>>
>>> That's the status, unless I missed anything?
>> By reading everything in the links Chrsitian provided, you can see
>> that we had difficulties repeating test results on other platforms.
>>
>> Of the tests listed herein, the only one that was easy to repeat on my
>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>
>> Kernel 6.18 Reverted
>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>> performance performance performance performance
>> test what ave ave ave ave
>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>
>> Where:
>> T/C means: Threads / Copies
>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor.
>> Data points are in Seconds.
>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>> The reversion was included in kernel 6.18-rc3.
>> Kernel 6.18-rc4 had another code change to menu.c
>>
>> In case the formatting gets messed up, the table is also attached.
>>
>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>> HWP: Enabled.
>
> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> after reapplying the revert on X6-2 systems.
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 56
> On-line CPU(s) list: 0-55
> Vendor ID: GenuineIntel
> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> CPU family: 6
> Model: 79
> Thread(s) per core: 2
> Core(s) per socket: 14
> Socket(s): 2
> Stepping: 1
> CPU(s) scaling MHz: 98%
> CPU max MHz: 2600.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 5188.26
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr pg
> e mca cmov pat pse36 clflush dts acpi mmx
> fxsr sse
> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> lm cons
> tant_tsc arch_perfmon pebs bts rep_good
> nopl xtopol
> ogy nonstop_tsc cpuid aperfmperf pni
> pclmulqdq dtes
> 64 monitor ds_cpl vmx smx est tm2 ssse3
> sdbg fma cx
> 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> movbe po
> pcnt tsc_deadline_timer aes xsave avx f16c
> rdrand l
> ahf_lm abm 3dnowprefetch cpuid_fault epb
> cat_l3 cdp
> _l3 pti intel_ppin ssbd ibrs ibpb stibp
> tpr_shadow
> flexpriority ept vpid ept_ad fsgsbase
> tsc_adjust bm
> i1 hle avx2 smep bmi2 erms invpcid rtm cqm
> rdt_a rd
> seed adx smap intel_pt xsaveopt cqm_llc
> cqm_occup_l
> lc cqm_mbm_total cqm_mbm_local dtherm arat
> pln pts
> vnmi md_clear flush_l1d
> Virtualization features:
> Virtualization: VT-x
> Caches (sum of all):
> L1d: 896 KiB (28 instances)
> L1i: 896 KiB (28 instances)
> L2: 7 MiB (28 instances)
> L3: 70 MiB (2 instances)
> NUMA:
> NUMA node(s): 2
> NUMA node0 CPU(s): 0-13,28-41
> NUMA node1 CPU(s): 14-27,42-55
> Vulnerabilities:
> Gather data sampling: Not affected
> Indirect target selection: Not affected
> Itlb multihit: KVM: Mitigation: Split huge pages
> L1tf: Mitigation; PTE Inversion; VMX conditional
> cache fl
> ushes, SMT vulnerable
> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> Meltdown: Mitigation; PTI
> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
> Reg file data sampling: Not affected
> Retbleed: Not affected
> Spec rstack overflow: Not affected
> Spec store bypass: Mitigation; Speculative Store Bypass
> disabled via p
> rctl
> Spectre v1: Mitigation; usercopy/swapgs barriers and
> __user poi
> nter sanitization
> Spectre v2: Mitigation; Retpolines; IBPB conditional;
> IBRS_FW;
> STIBP conditional; RSB filling; PBRSB-eIBRS
> Not aff
> ected; BHI Not affected
> Srbds: Not affected
> Tsa: Not affected
> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> Vmscape: Mitigation; IBPB before exit to userspace
>
It would be nice to get the idle states here, ideally how the states' usage changed
from base to revert.
The mentioned thread did this and should show how it can be done, but a dump of
cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
before and after the workload is usually fine to work with:
https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
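A before/after pair of such dumps can be reduced to per-state deltas with a
short script; a sketch follows, assuming the standard `path:value` output of
grepping the cpuidle sysfs files (the sample counter values below are made up
for illustration):

```python
# Reduce two dumps of cpuidle counters to per-state deltas. The dump
# format assumed is "path:value" lines, as produced by e.g.:
#   grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/{usage,time}
from collections import defaultdict

def parse_dump(text):
    """Sum 'usage' and 'time' counters per state across all CPUs."""
    totals = defaultdict(lambda: {"usage": 0, "time": 0})
    for line in text.strip().splitlines():
        path, _, value = line.partition(":")
        parts = path.split("/")
        state, field = parts[-2], parts[-1]   # e.g. "state2", "usage"
        if field in ("usage", "time"):
            totals[state][field] += int(value)
    return totals

def deltas(before, after):
    """Per-state increase in entries (usage) and residency (time, usec)."""
    return {s: {f: after[s][f] - before[s][f] for f in ("usage", "time")}
            for s in after}

before = parse_dump("""
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:100
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:5000
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:40
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:90000
""")
after = parse_dump("""
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:180
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:8000
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:45
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:95000
""")
for state, d in sorted(deltas(before, after).items()):
    print(state, d["usage"], d["time"])
```

Comparing these deltas between the base and reverted kernels shows which
states gained or lost residency under the workload.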
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-08 12:47 ` Christian Loehle
@ 2026-01-13 7:06 ` Harshvardhan Jha
2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-27 15:45 ` Harshvardhan Jha
1 sibling, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-01-13 7:06 UTC (permalink / raw)
To: Christian Loehle, Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano'
Hi Christian,
On 08/12/25 6:17 PM, Christian Loehle wrote:
> On 12/8/25 11:33, Harshvardhan Jha wrote:
>> Hi Doug,
>>
>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>> Hi there,
>>>>>
>>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>>>> observed that several regressions across different benchmarks is being
>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>> information" for 5.15
>>>>>
>>>>> Regressions on 5.15.196 include:
>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>
>>>>> The culprit commits' messages mention that these reverts were done due
>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>>> to know the maintainers' opinion on how we should proceed in order to
>>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>>> current regressions. If this problem has been reported before and a fix
>>>>> is in the works then please let me know I shall follow developments to
>>>>> that mail thread.
>>>> The discussion regarding this can be found here:
>>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>> we explored an alternative to the full revert here:
>>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>> full revert you're seeing now.
>>>>
>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>> in particular seems to favor deep idle states even when they don't seem
>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>> we're kind of stuck with that.
>>>>
>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>
>>>> That's the status, unless I missed anything?
>>> By reading everything in the links Chrsitian provided, you can see
>>> that we had difficulties repeating test results on other platforms.
>>>
>>> Of the tests listed herein, the only one that was easy to repeat on my
>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>>
>>> Kernel 6.18 Reverted
>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>>> performance performance performance performance
>>> test what ave ave ave ave
>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>>
>>> Where:
>>> T/C means: Threads / Copies
>>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor.
>>> Data points are in Seconds.
>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>> The reversion was included in kernel 6.18-rc3.
>>> Kernel 6.18-rc4 had another code change to menu.c
>>>
>>> In case the formatting gets messed up, the table is also attached.
>>>
>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>> HWP: Enabled.
>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>> after reapplying the revert on X6-2 systems.
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Address sizes: 46 bits physical, 48 bits virtual
>> Byte Order: Little Endian
>> CPU(s): 56
>> On-line CPU(s) list: 0-55
>> Vendor ID: GenuineIntel
>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> CPU family: 6
>> Model: 79
>> Thread(s) per core: 2
>> Core(s) per socket: 14
>> Socket(s): 2
>> Stepping: 1
>> CPU(s) scaling MHz: 98%
>> CPU max MHz: 2600.0000
>> CPU min MHz: 1200.0000
>> BogoMIPS: 5188.26
>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
>> mtrr pg
>> e mca cmov pat pse36 clflush dts acpi mmx
>> fxsr sse
>> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
>> lm cons
>> tant_tsc arch_perfmon pebs bts rep_good
>> nopl xtopol
>> ogy nonstop_tsc cpuid aperfmperf pni
>> pclmulqdq dtes
>> 64 monitor ds_cpl vmx smx est tm2 ssse3
>> sdbg fma cx
>> 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
>> movbe po
>> pcnt tsc_deadline_timer aes xsave avx f16c
>> rdrand l
>> ahf_lm abm 3dnowprefetch cpuid_fault epb
>> cat_l3 cdp
>> _l3 pti intel_ppin ssbd ibrs ibpb stibp
>> tpr_shadow
>> flexpriority ept vpid ept_ad fsgsbase
>> tsc_adjust bm
>> i1 hle avx2 smep bmi2 erms invpcid rtm cqm
>> rdt_a rd
>> seed adx smap intel_pt xsaveopt cqm_llc
>> cqm_occup_l
>> lc cqm_mbm_total cqm_mbm_local dtherm arat
>> pln pts
>> vnmi md_clear flush_l1d
>> Virtualization features:
>> Virtualization: VT-x
>> Caches (sum of all):
>> L1d: 896 KiB (28 instances)
>> L1i: 896 KiB (28 instances)
>> L2: 7 MiB (28 instances)
>> L3: 70 MiB (2 instances)
>> NUMA:
>> NUMA node(s): 2
>> NUMA node0 CPU(s): 0-13,28-41
>> NUMA node1 CPU(s): 14-27,42-55
>> Vulnerabilities:
>> Gather data sampling: Not affected
>> Indirect target selection: Not affected
>> Itlb multihit: KVM: Mitigation: Split huge pages
>> L1tf: Mitigation; PTE Inversion; VMX conditional
>> cache fl
>> ushes, SMT vulnerable
>> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
>> Meltdown: Mitigation; PTI
>> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
>> Reg file data sampling: Not affected
>> Retbleed: Not affected
>> Spec rstack overflow: Not affected
>> Spec store bypass: Mitigation; Speculative Store Bypass
>> disabled via prctl
>> Spectre v1: Mitigation; usercopy/swapgs barriers and
>> __user pointer sanitization
>> Spectre v2: Mitigation; Retpolines; IBPB conditional;
>> IBRS_FW; STIBP conditional; RSB filling;
>> PBRSB-eIBRS Not affected; BHI Not affected
>> Srbds: Not affected
>> Tsa: Not affected
>> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
>> Vmscape: Mitigation; IBPB before exit to userspace
>>
> It would be nice to get the idle states here, ideally how the states' usage changed
> from base to revert.
> The mentioned thread did this and should show how it can be done, but a dump of
> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> before and after the workload is usually fine to work with:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$
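The dump Christian asks for above can be scripted. The sketch below is illustrative (my addition, not from the thread): it reads the per-state `usage` and `time` counters from the standard cpuidle sysfs layout and reports how much each state grew across a workload run; `snapshot()` and `delta()` are made-up helper names, the sysfs paths are the stock kernel ones.

```python
# Sketch: diff cpuidle state usage across a workload run.
# Sysfs layout is the standard /sys/devices/system/cpu/cpu*/cpuidle/state*/
# tree mentioned above; snapshot()/delta() are illustrative helpers.
import glob
import os

def snapshot(root="/sys/devices/system/cpu"):
    """Sum per-state entry counts and residency time (us) over all CPUs."""
    stats = {}
    for usage_f in glob.glob(os.path.join(root, "cpu[0-9]*/cpuidle/state[0-9]*/usage")):
        sdir = os.path.dirname(usage_f)
        with open(os.path.join(sdir, "name")) as f:
            name = f.read().strip()
        with open(usage_f) as f:
            usage = int(f.read())
        with open(os.path.join(sdir, "time")) as f:
            time_us = int(f.read())
        u, t = stats.get(name, (0, 0))
        stats[name] = (u + usage, t + time_us)
    return stats

def delta(before, after):
    """Per-state (entries, time_us) growth between two snapshots."""
    return {name: (after[name][0] - before.get(name, (0, 0))[0],
                   after[name][1] - before.get(name, (0, 0))[1])
            for name in after}
```

Usage would be: take `snapshot()` before the benchmark, again after it, and print `delta()` of the two, once on the base kernel and once with the revert reapplied.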
Bumping this, as I have discovered this issue on the 6.12 stable branch as well. Reapplying the commit seems inevitable. I shall get back to you with those details too.
Thanks & Regards,
Harshvardhan
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-13 7:06 ` Harshvardhan Jha
@ 2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-13 14:18 ` Rafael J. Wysocki
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-13 14:13 UTC (permalink / raw)
To: Harshvardhan Jha
Cc: Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman,
linux-pm, stable, Rafael J. Wysocki, Daniel Lezcano
On Tue, Jan 13, 2026 at 8:06 AM Harshvardhan Jha
<harshvardhan.j.jha@oracle.com> wrote:
>
> Hi Christian,
>
> On 08/12/25 6:17 PM, Christian Loehle wrote:
> > On 12/8/25 11:33, Harshvardhan Jha wrote:
> >> Hi Doug,
> >>
> >> On 04/12/25 4:00 AM, Doug Smythies wrote:
> >>> On 2025.12.03 08:45 Christian Loehle wrote:
> >>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
> >>>>> observed that several regressions across different benchmarks is being
> >>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
> >>>>> automated bisect on both of them narrowed down the culprit commit to:
> >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
> >>>>> information" for 5.15
> >>>>>
> >>>>> Regressions on 5.15.196 include:
> >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
> >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
> >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
> >>>>> thread & 1M buffer size on OnPrem X6-2
> >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
> >>>>> 1 thread & 1M buffer size on OnPrem X6-2
> >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
> >>>>>
> >>>>> The culprit commits' messages mention that these reverts were done due
> >>>>> to performance regressions introduced in Intel Jasper Lake systems but
> >>>>> this revert is causing issues in other systems unfortunately. I wanted
> >>>>> to know the maintainers' opinion on how we should proceed in order to
> >>>>> fix this. If we reapply it'll bring back the previous regressions on
> >>>>> Jasper Lake systems and if we don't revert it then it's stuck with
> >>>>> current regressions. If this problem has been reported before and a fix
> >>>>> is in the works then please let me know I shall follow developments to
> >>>>> that mail thread.
> >>>> The discussion regarding this can be found here:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$
> >>>> we explored an alternative to the full revert here:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$
> >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
> >>>> full revert you're seeing now.
> >>>>
> >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> >>>> will highly depend on your platform, but also your workload, Jasper Lake
> >>>> in particular seems to favor deep idle states even when they don't seem
> >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> >>>> we're kind of stuck with that.
> >>>>
> >>>> For teo we've discussed a tunable knob in the past, which comes naturally with
> >>>> the logic, for menu there's nothing obvious that would be comparable.
> >>>> But for teo such a knob didn't generate any further interest (so far).
> >>>>
> >>>> That's the status, unless I missed anything?
> >>> By reading everything in the links Christian provided, you can see
> >>> that we had difficulties repeating test results on other platforms.
> >>>
> >>> Of the tests listed herein, the only one that was easy to repeat on my
> >>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
> >>>
> >>> Kernel 6.18 Reverted
> >>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
> >>> performance performance performance performance
> >>> test what ave ave ave ave
> >>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
> >>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
> >>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
> >>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
> >>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
> >>>
> >>> Where:
> >>> T/C means: Threads / Copies
> >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
> >>> Data points are in Seconds.
> >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> >>> The reversion was manually applied to kernel 6.18-rc1 for that test.
> >>> The reversion was included in kernel 6.18-rc3.
> >>> Kernel 6.18-rc4 had another code change to menu.c
> >>>
> >>> In case the formatting gets messed up, the table is also attached.
> >>>
> >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> >>> HWP: Enabled.
> >> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> >> after reapplying the revert on X6-2 systems.
> >>
> >> Architecture: x86_64
> >> CPU op-mode(s): 32-bit, 64-bit
> >> Address sizes: 46 bits physical, 48 bits virtual
> >> Byte Order: Little Endian
> >> CPU(s): 56
> >> On-line CPU(s) list: 0-55
> >> Vendor ID: GenuineIntel
> >> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> >> CPU family: 6
> >> Model: 79
> >> Thread(s) per core: 2
> >> Core(s) per socket: 14
> >> Socket(s): 2
> >> Stepping: 1
> >> CPU(s) scaling MHz: 98%
> >> CPU max MHz: 2600.0000
> >> CPU min MHz: 1200.0000
> >> BogoMIPS: 5188.26
> >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> >> mtrr pge mca cmov pat pse36 clflush dts acpi mmx
> >> fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> >> lm constant_tsc arch_perfmon pebs bts rep_good
> >> nopl xtopology nonstop_tsc cpuid aperfmperf pni
> >> pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
> >> sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> >> movbe popcnt tsc_deadline_timer aes xsave avx f16c
> >> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
> >> cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb stibp
> >> tpr_shadow flexpriority ept vpid ept_ad fsgsbase
> >> tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> >> rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc
> >> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat
> >> pln pts vnmi md_clear flush_l1d
> >> Virtualization features:
> >> Virtualization: VT-x
> >> Caches (sum of all):
> >> L1d: 896 KiB (28 instances)
> >> L1i: 896 KiB (28 instances)
> >> L2: 7 MiB (28 instances)
> >> L3: 70 MiB (2 instances)
> >> NUMA:
> >> NUMA node(s): 2
> >> NUMA node0 CPU(s): 0-13,28-41
> >> NUMA node1 CPU(s): 14-27,42-55
> >> Vulnerabilities:
> >> Gather data sampling: Not affected
> >> Indirect target selection: Not affected
> >> Itlb multihit: KVM: Mitigation: Split huge pages
> >> L1tf: Mitigation; PTE Inversion; VMX conditional
> >> cache flushes, SMT vulnerable
> >> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> >> Meltdown: Mitigation; PTI
> >> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
> >> Reg file data sampling: Not affected
> >> Retbleed: Not affected
> >> Spec rstack overflow: Not affected
> >> Spec store bypass: Mitigation; Speculative Store Bypass
> >> disabled via prctl
> >> Spectre v1: Mitigation; usercopy/swapgs barriers and
> >> __user pointer sanitization
> >> Spectre v2: Mitigation; Retpolines; IBPB conditional;
> >> IBRS_FW; STIBP conditional; RSB filling;
> >> PBRSB-eIBRS Not affected; BHI Not affected
> >> Srbds: Not affected
> >> Tsa: Not affected
> >> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> >> Vmscape: Mitigation; IBPB before exit to userspace
> >>
> > It would be nice to get the idle states here, ideally how the states' usage changed
> > from base to revert.
> > The mentioned thread did this and should show how it can be done, but a dump of
> > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> > before and after the workload is usually fine to work with:
> > https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$
>
> Bumping this as I discovered this issue on 6.12 stable branch also. The
> reapplication seems inevitable. I shall get back to you with these
> details also.
Yes, please, because I have another reason to restore the reverted commit.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-13 14:13 ` Rafael J. Wysocki
@ 2026-01-13 14:18 ` Rafael J. Wysocki
2026-01-14 4:28 ` Sergey Senozhatsky
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-13 14:18 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Harshvardhan Jha, Christian Loehle, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Rafael J. Wysocki,
Daniel Lezcano
On Tue, Jan 13, 2026 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jan 13, 2026 at 8:06 AM Harshvardhan Jha
> <harshvardhan.j.jha@oracle.com> wrote:
> >
> > Hi Christian,
> >
> > On 08/12/25 6:17 PM, Christian Loehle wrote:
> > > On 12/8/25 11:33, Harshvardhan Jha wrote:
> > >> Hi Doug,
> > >>
> > >> On 04/12/25 4:00 AM, Doug Smythies wrote:
> > >>> On 2025.12.03 08:45 Christian Loehle wrote:
> > >>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> > >>>>> Hi there,
> > >>>>>
> > >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
> > >>>>> observed that several regressions across different benchmarks is being
> > >>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
> > >>>>> automated bisect on both of them narrowed down the culprit commit to:
> > >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
> > >>>>> information" for 5.15
> > >>>>>
> > >>>>> Regressions on 5.15.196 include:
> > >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
> > >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
> > >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
> > >>>>> thread & 1M buffer size on OnPrem X6-2
> > >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
> > >>>>> 1 thread & 1M buffer size on OnPrem X6-2
> > >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
> > >>>>>
> > >>>>> The culprit commits' messages mention that these reverts were done due
> > >>>>> to performance regressions introduced in Intel Jasper Lake systems but
> > >>>>> this revert is causing issues in other systems unfortunately. I wanted
> > >>>>> to know the maintainers' opinion on how we should proceed in order to
> > >>>>> fix this. If we reapply it'll bring back the previous regressions on
> > >>>>> Jasper Lake systems and if we don't revert it then it's stuck with
> > >>>>> current regressions. If this problem has been reported before and a fix
> > >>>>> is in the works then please let me know I shall follow developments to
> > >>>>> that mail thread.
> > >>>> The discussion regarding this can be found here:
> > >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$
> > >>>> we explored an alternative to the full revert here:
> > >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$
> > >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
> > >>>> full revert you're seeing now.
> > >>>>
> > >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> > >>>> will highly depend on your platform, but also your workload, Jasper Lake
> > >>>> in particular seems to favor deep idle states even when they don't seem
> > >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> > >>>> we're kind of stuck with that.
> > >>>>
> > >>>> For teo we've discussed a tunable knob in the past, which comes naturally with
> > >>>> the logic, for menu there's nothing obvious that would be comparable.
> > >>>> But for teo such a knob didn't generate any further interest (so far).
> > >>>>
> > >>>> That's the status, unless I missed anything?
> > >>> By reading everything in the links Christian provided, you can see
> > >>> that we had difficulties repeating test results on other platforms.
> > >>>
> > >>> Of the tests listed herein, the only one that was easy to repeat on my
> > >>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
> > >>>
> > >>> Kernel 6.18 Reverted
> > >>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
> > >>> performance performance performance performance
> > >>> test what ave ave ave ave
> > >>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
> > >>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
> > >>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
> > >>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
> > >>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
> > >>>
> > >>> Where:
> > >>> T/C means: Threads / Copies
> > >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
> > >>> Data points are in Seconds.
> > >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> > >>> The reversion was manually applied to kernel 6.18-rc1 for that test.
> > >>> The reversion was included in kernel 6.18-rc3.
> > >>> Kernel 6.18-rc4 had another code change to menu.c
> > >>>
> > >>> In case the formatting gets messed up, the table is also attached.
> > >>>
> > >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> > >>> HWP: Enabled.
> > >> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> > >> after reapplying the revert on X6-2 systems.
> > >>
> > >> Architecture: x86_64
> > >> CPU op-mode(s): 32-bit, 64-bit
> > >> Address sizes: 46 bits physical, 48 bits virtual
> > >> Byte Order: Little Endian
> > >> CPU(s): 56
> > >> On-line CPU(s) list: 0-55
> > >> Vendor ID: GenuineIntel
> > >> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> > >> CPU family: 6
> > >> Model: 79
> > >> Thread(s) per core: 2
> > >> Core(s) per socket: 14
> > >> Socket(s): 2
> > >> Stepping: 1
> > >> CPU(s) scaling MHz: 98%
> > >> CPU max MHz: 2600.0000
> > >> CPU min MHz: 1200.0000
> > >> BogoMIPS: 5188.26
> > >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep
> > >> mtrr pge mca cmov pat pse36 clflush dts acpi mmx
> > >> fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> > >> lm constant_tsc arch_perfmon pebs bts rep_good
> > >> nopl xtopology nonstop_tsc cpuid aperfmperf pni
> > >> pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
> > >> sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> > >> movbe popcnt tsc_deadline_timer aes xsave avx f16c
> > >> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
> > >> cat_l3 cdp_l3 pti intel_ppin ssbd ibrs ibpb stibp
> > >> tpr_shadow flexpriority ept vpid ept_ad fsgsbase
> > >> tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
> > >> rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc
> > >> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat
> > >> pln pts vnmi md_clear flush_l1d
> > >> Virtualization features:
> > >> Virtualization: VT-x
> > >> Caches (sum of all):
> > >> L1d: 896 KiB (28 instances)
> > >> L1i: 896 KiB (28 instances)
> > >> L2: 7 MiB (28 instances)
> > >> L3: 70 MiB (2 instances)
> > >> NUMA:
> > >> NUMA node(s): 2
> > >> NUMA node0 CPU(s): 0-13,28-41
> > >> NUMA node1 CPU(s): 14-27,42-55
> > >> Vulnerabilities:
> > >> Gather data sampling: Not affected
> > >> Indirect target selection: Not affected
> > >> Itlb multihit: KVM: Mitigation: Split huge pages
> > >> L1tf: Mitigation; PTE Inversion; VMX conditional
> > >> cache flushes, SMT vulnerable
> > >> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> > >> Meltdown: Mitigation; PTI
> > >> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
> > >> Reg file data sampling: Not affected
> > >> Retbleed: Not affected
> > >> Spec rstack overflow: Not affected
> > >> Spec store bypass: Mitigation; Speculative Store Bypass
> > >> disabled via prctl
> > >> Spectre v1: Mitigation; usercopy/swapgs barriers and
> > >> __user pointer sanitization
> > >> Spectre v2: Mitigation; Retpolines; IBPB conditional;
> > >> IBRS_FW; STIBP conditional; RSB filling;
> > >> PBRSB-eIBRS Not affected; BHI Not affected
> > >> Srbds: Not affected
> > >> Tsa: Not affected
> > >> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> > >> Vmscape: Mitigation; IBPB before exit to userspace
> > >>
> > > It would be nice to get the idle states here, ideally how the states' usage changed
> > > from base to revert.
> > > The mentioned thread did this and should show how it can be done, but a dump of
> > > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> > > before and after the workload is usually fine to work with:
> > > https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$
> >
> > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > reapplication seems inevitable. I shall get back to you with these
> > details also.
>
> Yes, please, because I have another reason to restore the reverted commit.
Sergey, did you see a performance regression from 85975daeaa4d
("cpuidle: menu: Avoid discarding useful information") on any
platforms other than the Jasper Lake it was reported for?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-13 14:18 ` Rafael J. Wysocki
@ 2026-01-14 4:28 ` Sergey Senozhatsky
2026-01-14 4:49 ` Sergey Senozhatsky
0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-01-14 4:28 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Sergey Senozhatsky, Harshvardhan Jha, Christian Loehle,
Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable,
Daniel Lezcano
Hi,
On (26/01/13 15:18), Rafael J. Wysocki wrote:
[..]
> > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > reapplication seems inevitable. I shall get back to you with these
> > > details also.
> >
> > Yes, please, because I have another reason to restore the reverted commit.
>
> Sergey, did you see a performance regression from 85975daeaa4d
> ("cpuidle: menu: Avoid discarding useful information") on any
> platforms other than the Jasper Lake it was reported for?
Let me try to dig it up. I think I saw regressions on a number of
devices:
---
cpu family : 6
model : 122
model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
---
cpu family : 6
model : 122
model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
---
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
---
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
---
cpu family : 6
model : 156
model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
I guess family 6/model 122 is not Jasper Lake?
I also saw some where the patch in question seemed to improve the
metrics, but regressions are more important, so the revert simply
put all of the boards back to the previous state.
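For reference (my addition, based on Intel's public CPUID model numbering, not stated in the thread): family 6 model 122 (0x7A) is Gemini Lake (Goldmont Plus), while family 6 model 156 (0x9C) is Jasper Lake (Tremont), so only the N4500/N6000 boards above are Jasper Lake proper. A trivial lookup:

```python
# Map (family, model) pairs to the Intel Atom codenames relevant here.
# Values follow Intel's public CPUID model numbering; extend as needed.
ATOM_CODENAMES = {
    (6, 122): "Gemini Lake (Goldmont Plus)",  # 0x7A: N5000, N4100, ...
    (6, 156): "Jasper Lake (Tremont)",        # 0x9C: N4500, N6000, ...
}

def codename(family, model):
    """Codename for a /proc/cpuinfo family/model pair, or 'unknown'."""
    return ATOM_CODENAMES.get((family, model), "unknown")
```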
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 4:28 ` Sergey Senozhatsky
@ 2026-01-14 4:49 ` Sergey Senozhatsky
2026-01-14 5:15 ` Tomasz Figa
2026-01-29 22:47 ` Doug Smythies
0 siblings, 2 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-01-14 4:49 UTC (permalink / raw)
To: Tomasz Figa
Cc: Rafael J. Wysocki, Harshvardhan Jha, Christian Loehle,
Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable,
Daniel Lezcano, Sergey Senozhatsky
Cc-ing Tomasz
On (26/01/14 13:28), Sergey Senozhatsky wrote:
> Hi,
>
> On (26/01/13 15:18), Rafael J. Wysocki wrote:
> [..]
> > > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > > reapplication seems inevitable. I shall get back to you with these
> > > > details also.
> > >
> > > Yes, please, because I have another reason to restore the reverted commit.
> >
> > Sergey, did you see a performance regression from 85975daeaa4d
> > ("cpuidle: menu: Avoid discarding useful information") on any
> > platforms other than the Jasper Lake it was reported for?
>
> Let me try to dig it up. I think I saw regressions on a number of
> devices:
>
> ---
> cpu family : 6
> model : 122
> model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
> ---
> cpu family : 6
> model : 122
> model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
> ---
> cpu family : 6
> model : 156
> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> ---
> cpu family : 6
> model : 156
> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> ---
> cpu family : 6
> model : 156
> model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
>
>
> I guess family 6/model 122 is not Jasper Lake?
>
> I also saw some where the patch in question seemed to improve the
> metrics, but regressions are more important, so the revert simply
> put all of the boards back to the previous state.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 4:49 ` Sergey Senozhatsky
@ 2026-01-14 5:15 ` Tomasz Figa
2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 22:47 ` Doug Smythies
1 sibling, 1 reply; 44+ messages in thread
From: Tomasz Figa @ 2026-01-14 5:15 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Rafael J. Wysocki, Harshvardhan Jha, Christian Loehle,
Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable,
Daniel Lezcano
Hi all,
On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> Cc-ing Tomasz
>
> On (26/01/14 13:28), Sergey Senozhatsky wrote:
> > Hi,
> >
> > On (26/01/13 15:18), Rafael J. Wysocki wrote:
> > [..]
> > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > > > reapplication seems inevitable. I shall get back to you with these
> > > > > details also.
> > > >
> > > > Yes, please, because I have another reason to restore the reverted commit.
Is the performance difference the reporter observed an actual
regression, or is it just a return to the level before the
optimization was merged into stable branches? If the latter, shouldn't
avoiding regressions be a priority over further optimizing for other
users?
If there is a really strong desire to reland this optimization, could
it at least be applied selectively to the CPUs that it's known to
help, or alternatively, made configurable?
Best,
Tomasz
> > >
> > > Sergey, did you see a performance regression from 85975daeaa4d
> > > ("cpuidle: menu: Avoid discarding useful information") on any
> > > platforms other than the Jasper Lake it was reported for?
> >
> > Let me try to dig it up. I think I saw regressions on a number of
> > devices:
> >
> > ---
> > cpu family : 6
> > model : 122
> > model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 122
> > model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 156
> > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 156
> > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
> > ---
> > cpu family : 6
> > model : 156
> > model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
> >
> >
> > I guess family 6/model 122 is not Jasper Lake?
> >
> > I also saw some where the patch in question seemed to improve the
> > metrics, but regressions are more important, so the revert simply
> > put all of the boards back to the previous state.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 5:15 ` Tomasz Figa
@ 2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 10:23 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-14 20:07 UTC (permalink / raw)
To: Tomasz Figa, Harshvardhan Jha
Cc: Sergey Senozhatsky, Christian Loehle, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano
On Wed, Jan 14, 2026 at 6:16 AM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi all,
>
> On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > Cc-ing Tomasz
> >
> > On (26/01/14 13:28), Sergey Senozhatsky wrote:
> > > Hi,
> > >
> > > On (26/01/13 15:18), Rafael J. Wysocki wrote:
> > > [..]
> > > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > > > > reapplication seems inevitable. I shall get back to you with these
> > > > > > details also.
> > > > >
> > > > > Yes, please, because I have another reason to restore the reverted commit.
>
> Is the performance difference the reporter observed an actual
> regression, or is it just a return to the level before the
> optimization was merged into stable branches?
Good question.
Harshvardhan, which one is the case?
> If the latter, shouldn't avoiding regressions be a priority over further optimizing for other
> users?
>
> If there is a really strong desire to reland this optimization, could
> it at least be applied selectively to the CPUs that it's known to
> help, or alternatively, made configurable?
That wouldn't be easy in practice, but I think that it may be
compensated by reducing the target residency values of the deepest
idle states on those systems.
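The effect Rafael describes could be approximated from userspace as an experiment, without touching driver tables: per-state target residencies are readable from sysfs, and a state can be taken out of the governor's consideration through its `disable` attribute. The `residency`, `latency`, and `disable` files are standard cpuidle sysfs attributes; the helpers below are an illustrative sketch, not from the thread.

```python
# Sketch: list cpuidle states with their target residencies, and
# optionally disable the deepest one on a CPU as an experiment.
# residency/latency/disable are standard cpuidle sysfs attributes;
# read_states()/pick_deepest() are illustrative helper names.
import glob
import os

def read_states(cpu_dir):
    """Return [(name, residency_us, latency_us), ...] for one CPU."""
    states = []
    for sdir in sorted(glob.glob(os.path.join(cpu_dir, "cpuidle/state[0-9]*"))):
        def read(attr):
            with open(os.path.join(sdir, attr)) as f:
                return f.read().strip()
        states.append((read("name"), int(read("residency")), int(read("latency"))))
    return states

def pick_deepest(states):
    """State tuple with the largest target residency (None if empty)."""
    return max(states, key=lambda s: s[1], default=None)

def disable_state(cpu_dir, index, disabled=True):
    """Write the standard per-state 'disable' attribute (needs root)."""
    with open(os.path.join(cpu_dir, f"cpuidle/state{index}/disable"), "w") as f:
        f.write("1" if disabled else "0")
```

Disabling the deepest state on the regressing machines and rerunning the benchmarks would show whether the revert's impact really comes from over-eager deep-idle selection.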
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2025-12-08 12:47 ` Christian Loehle
2026-01-13 7:06 ` Harshvardhan Jha
@ 2026-01-27 15:45 ` Harshvardhan Jha
2026-01-28 5:06 ` Doug Smythies
1 sibling, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-01-27 15:45 UTC (permalink / raw)
To: Christian Loehle, Doug Smythies
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano'
[-- Attachment #1: Type: text/plain, Size: 10297 bytes --]
Hi Christian,
On 08/12/25 6:17 PM, Christian Loehle wrote:
> On 12/8/25 11:33, Harshvardhan Jha wrote:
>> Hi Doug,
>>
>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>> Hi there,
>>>>>
>>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was
>>>>> observed that several regressions across different benchmarks is being
>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>> information" for 5.15
>>>>>
>>>>> Regressions on 5.15.196 include:
>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>
>>>>> The culprit commits' messages mention that these reverts were done due
>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>>> to know the maintainers' opinion on how we should proceed in order to
>>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>>> current regressions. If this problem has been reported before and a fix
>>>>> is in the works then please let me know I shall follow developments to
>>>>> that mail thread.
>>>> The discussion regarding this can be found here:
>>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$
>>>> we explored an alternative to the full revert here:
>>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$
>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>> full revert you're seeing now.
>>>>
>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>> in particular seems to favor deep idle states even when they don't seem
>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>> we're kind of stuck with that.
>>>>
>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>
>>>> That's the status, unless I missed anything?
>>> By reading everything in the links Christian provided, you can see
>>> that we had difficulties repeating test results on other platforms.
>>>
>>> Of the tests listed herein, the only one that was easy to repeat on my
>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>>
>>> Kernel 6.18 Reverted
>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>>> performance performance performance performance
>>> test what ave ave ave ave
>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>>
>>> Where:
>>> T/C means: Threads / Copies
>>> performance means: the intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
>>> Data points are in Seconds.
>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>> The reversion was included in kernel 6.18-rc3.
>>> Kernel 6.18-rc4 had another code change to menu.c
>>>
>>> In case the formatting gets messed up, the table is also attached.
>>>
>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>> HWP: Enabled.
>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>> after reapplying the revert on X6-2 systems.
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> Address sizes: 46 bits physical, 48 bits virtual
>> Byte Order: Little Endian
>> CPU(s): 56
>> On-line CPU(s) list: 0-55
>> Vendor ID: GenuineIntel
>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> CPU family: 6
>> Model: 79
>> Thread(s) per core: 2
>> Core(s) per socket: 14
>> Socket(s): 2
>> Stepping: 1
>> CPU(s) scaling MHz: 98%
>> CPU max MHz: 2600.0000
>> CPU min MHz: 1200.0000
>> BogoMIPS: 5188.26
>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
>> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
>> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
>> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 pti
>> intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid
>> ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
>> cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc
>> cqm_mbm_total cqm_mbm_local dtherm arat pln pts vnmi md_clear flush_l1d
>> Virtualization features:
>> Virtualization: VT-x
>> Caches (sum of all):
>> L1d: 896 KiB (28 instances)
>> L1i: 896 KiB (28 instances)
>> L2: 7 MiB (28 instances)
>> L3: 70 MiB (2 instances)
>> NUMA:
>> NUMA node(s): 2
>> NUMA node0 CPU(s): 0-13,28-41
>> NUMA node1 CPU(s): 14-27,42-55
>> Vulnerabilities:
>> Gather data sampling: Not affected
>> Indirect target selection: Not affected
>> Itlb multihit: KVM: Mitigation: Split huge pages
>> L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
>> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
>> Meltdown: Mitigation; PTI
>> Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
>> Reg file data sampling: Not affected
>> Retbleed: Not affected
>> Spec rstack overflow: Not affected
>> Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
>> Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
>> Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
>> Srbds: Not affected
>> Tsa: Not affected
>> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
>> Vmscape: Mitigation; IBPB before exit to userspace
>>
> It would be nice to get the idle states here, ideally how the states' usage changed
> from base to revert.
> The mentioned thread did this and should show how it can be done, but a dump of
> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> before and after the workload is usually fine to work with:
> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
Apologies for the late reply, I'm attaching a tar ball which has the cpu
states for the test suites before and after tests. The folders with the
name of the test contain two folders good-kernel and bad-kernel
containing two files having the before and after states. Please note
that different machines were used for different test suites due to
compatibility reasons. The jbb test was run using containers.
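For reference, dumps taken that way can be reduced to per-state usage deltas with a short script along these lines (a sketch only; it assumes each dump was produced with `grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/usage`, so every line looks like `.../cpuN/cpuidle/stateM/usage:COUNT`):

```python
import re
from collections import defaultdict

# One dump line per CPU/state, e.g.:
#   /sys/devices/system/cpu/cpu0/cpuidle/state2/usage:1234
LINE = re.compile(r"cpu(\d+)/cpuidle/state(\d+)/usage:(\d+)")

def per_state_usage(dump_text):
    """Sum the 'usage' counter over all CPUs, keyed by idle state index."""
    totals = defaultdict(int)
    for match in LINE.finditer(dump_text):
        _cpu, state, value = match.groups()
        totals[int(state)] += int(value)
    return dict(totals)

def usage_delta(before_text, after_text):
    """Per-state idle entries accumulated during the workload (after - before)."""
    before = per_state_usage(before_text)
    after = per_state_usage(after_text)
    return {state: after.get(state, 0) - before.get(state, 0) for state in after}
```

Running this over the good-kernel and bad-kernel pairs gives the per-state shift directly, without hand-extraction.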
Thanks & Regards,
Harshvardhan
[-- Attachment #2: cpuidle.tar.gz --]
[-- Type: application/x-gzip, Size: 36882 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-27 15:45 ` Harshvardhan Jha
@ 2026-01-28 5:06 ` Doug Smythies
2026-01-28 23:53 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-01-28 5:06 UTC (permalink / raw)
To: 'Harshvardhan Jha', 'Christian Loehle'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano',
Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 8076 bytes --]
On 2026.01.27 07:45 Harshvardhan Jha wrote:
>On 08/12/25 6:17 PM, Christian Loehle wrote:
>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>
>>>>>> While running performance benchmarks for the 5.15.196 LTS tag, it was
>>>>>> observed that several regressions across different benchmarks were
>>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>>> information" for 5.15
>>>>>>
>>>>>> Regressions on 5.15.196 include:
>>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>>
>>>>>> The culprit commits' messages mention that these reverts were done due
>>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>>> this revert is unfortunately causing issues on other systems. I wanted
>>>>>> to get the maintainers' opinion on how we should proceed in order to
>>>>>> fix this. If we reapply the commit, it will bring back the previous
>>>>>> regressions on Jasper Lake systems; if we keep the revert, we are stuck
>>>>>> with the current regressions. If this problem has been reported before
>>>>>> and a fix is in the works, please let me know and I shall follow that
>>>>>> mail thread.
>>>>> The discussion regarding this can be found here:
>>>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>>> we explored an alternative to the full revert here:
>>>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>>> full revert you're seeing now.
>>>>>
>>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>>> in particular seems to favor deep idle states even when they don't seem
>>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>>> we're kind of stuck with that.
>>>>>
>>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>>
>>>>> That's the status, unless I missed anything?
>>>> By reading everything in the links Christian provided, you can see
>>>> that we had difficulties repeating test results on other platforms.
>>>>
>>>> Of the tests listed herein, the only one that was easy to repeat on my
>>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference):
>>>>
>>>> Kernel 6.18 Reverted
>>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3
>>>> performance performance performance performance
>>>> test what ave ave ave ave
>>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6%
>>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1%
>>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1%
>>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3%
>>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1%
>>>>
>>>> Where:
>>>> T/C means: Threads / Copies
>>>> performance means: the intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
>>>> Data points are in Seconds.
>>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>>> The reversion was included in kernel 6.18-rc3.
>>>> Kernel 6.18-rc4 had another code change to menu.c
>>>>
>>>> In case the formatting gets messed up, the table is also attached.
>>>>
>>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>>> HWP: Enabled.
>>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>>> after reapplying the revert on X6-2 systems.
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 46 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 56
>>> On-line CPU(s) list: 0-55
>>> Vendor ID: GenuineIntel
>>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>> CPU family: 6
>>> Model: 79
>>> Thread(s) per core: 2
>>> Core(s) per socket: 14
>>> Socket(s): 2
>>> Stepping: 1
>>> CPU(s) scaling MHz: 98%
>>> CPU max MHz: 2600.0000
>>> CPU min MHz: 1200.0000
>>> BogoMIPS: 5188.26
... snip ...
>> It would be nice to get the idle states here, ideally how the states' usage changed
>> from base to revert.
>> The mentioned thread did this and should show how it can be done, but a dump of
>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>> before and after the workload is usually fine to work with:
>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> Apologies for the late reply, I'm attaching a tar ball which has the cpu
> states for the test suites before and after tests. The folders with the
> name of the test contain two folders good-kernel and bad-kernel
> containing two files having the before and after states. Please note
> that different machines were used for different test suites due to
> compatibility reasons. The jbb test was run using containers.
It is a considerable amount of work to manually extract and summarize the data.
I have only done it for the phoronix-sqlite data.
There seem to be 40 CPUs, 5 idle states, with idle state 3 disabled by default.
I remember seeing a Linux-pm email about why but couldn't find it just now.
Summary (also attached as a PNG file, in case the formatting gets messed up):
The total idle entries (usage) and time seem low to me, which is why the ???.
phoronix-sqlite
Good Kernel: Time between samples 4 seconds (estimated and ???)
Usage Above Below Above Below
state 0 220 0 218 0.00% 99.09%
state 1 70212 5213 34602 7.42% 49.28%
state 2 30273 5237 1806 17.30% 5.97%
state 3 0 0 0 0.00% 0.00%
state 4 11824 2120 0 17.93% 0.00%
total 112529 12570 36626 43.72% <<< Misses %
Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
Usage Above Below Above Below
state 0 262 0 260 0.00% 99.24%
state 1 62751 3985 35588 6.35% 56.71%
state 2 24941 7896 1433 31.66% 5.75%
state 3 0 0 0 0.00% 0.00%
state 4 24489 11543 0 47.14% 0.00%
total 112443 23424 37281 53.99% <<< Misses %
Observe 2X use of idle state 4 for the "Bad Kernel"
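(For anyone re-deriving the columns: the percentages are plain ratios of the sysfs counters, per state Above% = above/usage and Below% = below/usage, and Misses% = (total above + total below) / total usage. A small sketch of that arithmetic, with names taken from the table:)

```python
def state_percentages(usage, above, below):
    """Per-state Above% and Below%, as in the summary tables (0 when unused)."""
    if usage == 0:
        return 0.0, 0.0
    return 100.0 * above / usage, 100.0 * below / usage

def misses_percent(rows):
    """Total Misses% over (usage, above, below) rows for all idle states."""
    total_usage = sum(usage for usage, _, _ in rows)
    total_missed = sum(above + below for _, above, below in rows)
    return 100.0 * total_missed / total_usage
```

Fed the good-kernel numbers above, state 1 comes out at 7.42% / 49.28% and the total at 43.72%, matching the table.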
I have a template now and can summarize the other 40 CPU data
faster, but I would have to rework the template for the 56 CPU data.
Also, is the fio data a 64 CPU set with 4 idle states per CPU?
... Doug
[-- Attachment #2: sqlite-summary.png --]
[-- Type: image/png, Size: 37171 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-28 5:06 ` Doug Smythies
@ 2026-01-28 23:53 ` Doug Smythies
2026-01-29 22:27 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-01-28 23:53 UTC (permalink / raw)
To: 'Harshvardhan Jha', 'Christian Loehle'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano',
Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 6136 bytes --]
On 2026.01.27 21:07 Doug Smythies wrote:
> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
... snip ...
>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>> from base to revert.
>>> The mentioned thread did this and should show how it can be done, but a dump of
>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>> before and after the workload is usually fine to work with:
>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>> states for the test suites before and after tests. The folders with the
>> name of the test contain two folders good-kernel and bad-kernel
>> containing two files having the before and after states. Please note
>> that different machines were used for different test suites due to
>> compatibility reasons. The jbb test was run using containers.
Please provide the results of the test runs that were done for
the supplied before and after idle data.
In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
Is it a test I can run on my test computer?
> It is a considerable amount of work to manually extract and summarize the data.
> I have only done it for the phoronix-sqlite data.
I have done the rest now, see below.
I have also attached the results, in case the formatting gets screwed up.
> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
> I remember seeing a Linux-pm email about why but couldn't find it just now.
> Summary (also attached as a PNG file, in case the formatting gets messed up):
> The total idle entries (usage) and time seem low to me, which is why the ???.
>
> phoronix-sqlite
> Good Kernel: Time between samples 4 seconds (estimated and ???)
> Usage Above Below Above Below
> state 0 220 0 218 0.00% 99.09%
> state 1 70212 5213 34602 7.42% 49.28%
> state 2 30273 5237 1806 17.30% 5.97%
> state 3 0 0 0 0.00% 0.00%
> state 4 11824 2120 0 17.93% 0.00%
>
> total 112529 12570 36626 43.72% <<< Misses %
>
> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
> Usage Above Below Above Below
> state 0 262 0 260 0.00% 99.24%
> state 1 62751 3985 35588 6.35% 56.71%
> state 2 24941 7896 1433 31.66% 5.75%
> state 3 0 0 0 0.00% 0.00%
> state 4 24489 11543 0 47.14% 0.00%
>
> total 112443 23424 37281 53.99% <<< Misses %
>
> Observe 2X use of idle state 4 for the "Bad Kernel"
>
> I have a template now, and can summarize the other 40 CPU data
> faster, but I would have to rework the template for the 56 CPU data,
> and is it a 64 CPU data set at 4 idle states per CPU?
jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
POLL, C1, C1E, C3 (disabled), C6
Good Kernel: Time between samples > 2 hours (estimated)
Usage Above Below Above Below
state 0 297550 0 296084 0.00% 99.51%
state 1 8062854 341043 4962635 4.23% 61.55%
state 2 56708358 12688379 6252051 22.37% 11.02%
state 3 0 0 0 0.00% 0.00%
state 4 54624476 15868752 0 29.05% 0.00%
total 119693238 28898174 11510770 33.76% <<< Misses
Bad Kernel: Time between samples > 2 hours (estimated)
Usage Above Below Above Below
state 0 90715 0 75134 0.00% 82.82%
state 1 8878738 312970 6082180 3.52% 68.50%
state 2 12048728 2576251 603316 21.38% 5.01%
state 3 0 0 0 0.00% 0.00%
state 4 85999424 44723273 0 52.00% 0.00%
total 107017605 47612494 6760630 50.81% <<< Misses
As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
fio
Good Kernel: Time between samples ~ 1 minute (estimated)
Usage Above Below Above Below
state 0 3822 0 3818 0.00% 99.90%
state 1 148640 4406 68956 2.96% 46.39%
state 2 593455 45344 105675 7.64% 17.81%
state 3 3209648 807014 0 25.14% 0.00%
total 3955565 856764 178449 26.17% <<< Misses
Bad Kernel: Time between samples ~ 1 minute (estimated)
Usage Above Below Above Below
state 0 916 0 756 0.00% 82.53%
state 1 80230 2028 42791 2.53% 53.34%
state 2 59231 6888 6791 11.63% 11.47%
state 3 2455784 564797 0 23.00% 0.00%
total 2596161 573713 50338 24.04% <<< Misses
It is not clear why the number of idle entries differs so much
between the tests, but there is a bit of a different distribution
of the workload among the CPUs.
rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
POLL, C1, C1E, C3 (disabled), C6
rds-stress-test
Good Kernel: Time between samples ~70 Seconds (estimated)
Usage Above Below Above Below
state 0 1561 0 1435 0.00% 91.93%
state 1 13855 899 2410 6.49% 17.39%
state 2 467998 139254 23679 29.76% 5.06%
state 3 0 0 0 0.00% 0.00%
state 4 213132 107417 0 50.40% 0.00%
total 696546 247570 27524 39.49% <<< Misses
Bad Kernel: Time between samples ~ 70 Seconds (estimated)
Usage Above Below Above Below
state 0 231 0 231 0.00% 100.00%
state 1 5413 266 1186 4.91% 21.91%
state 2 54365 719 3789 1.32% 6.97%
state 3 0 0 0 0.00% 0.00%
state 4 267055 148327 0 55.54% 0.00%
total 327064 149312 5206 47.24% <<< Misses
Again, differing numbers of idle entries between tests.
This time the load distribution between CPUs is more
obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
In the "Good" case the work is distributed over more CPUs.
I assume, without proof, that the scheduler is deciding not to migrate
the next bit of work to another CPU in the one case versus the other.
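(The spread can be quantified from the same dumps; a sketch that reports what share of all idle entries the busiest few CPUs account for. The per-CPU totals in the test values are made up for illustration, not taken from the attached data:)

```python
def top_cpu_share(per_cpu_entries, n=3):
    """Fraction of all idle entries accounted for by the n most active CPUs.

    per_cpu_entries maps CPU number -> total idle entries (summed 'usage'
    counters across that CPU's idle states).
    """
    counts = sorted(per_cpu_entries.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0
```

A concentrated ("Bad"-style) run reports a value near 1.0 even for small n, while a spread-out ("Good"-style) run reports noticeably less.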
... Doug
[-- Attachment #2: sqlite-summary.png --]
[-- Type: image/png, Size: 37171 bytes --]
[-- Attachment #3: jbb-summary.png --]
[-- Type: image/png, Size: 38778 bytes --]
[-- Attachment #4: fio-summary.png --]
[-- Type: image/png, Size: 34474 bytes --]
[-- Attachment #5: rds-stress-summary.png --]
[-- Type: image/png, Size: 37087 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 20:07 ` Rafael J. Wysocki
@ 2026-01-29 10:23 ` Harshvardhan Jha
0 siblings, 0 replies; 44+ messages in thread
From: Harshvardhan Jha @ 2026-01-29 10:23 UTC (permalink / raw)
To: Rafael J. Wysocki, Tomasz Figa
Cc: Sergey Senozhatsky, Christian Loehle, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano
Hi Rafael,
On 15/01/26 1:37 AM, Rafael J. Wysocki wrote:
> On Wed, Jan 14, 2026 at 6:16 AM Tomasz Figa <tfiga@chromium.org> wrote:
>> Hi all,
>>
>> On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky
>> <senozhatsky@chromium.org> wrote:
>>> Cc-ing Tomasz
>>>
>>> On (26/01/14 13:28), Sergey Senozhatsky wrote:
>>>> Hi,
>>>>
>>>> On (26/01/13 15:18), Rafael J. Wysocki wrote:
>>>> [..]
>>>>>>> Bumping this as I discovered this issue on 6.12 stable branch also. The
>>>>>>> reapplication seems inevitable. I shall get back to you with these
>>>>>>> details also.
>>>>>> Yes, please, because I have another reason to restore the reverted commit.
>> Is the performance difference the reporter observed an actual
>> regression, or is it just a return to the level before the
>> optimization was merged into stable branches?
> Good question.
>
> Harshvardhan, which one is the case?
The commit initially brought a performance improvement, and the revert
then returned performance to the baseline.
Harshvardhan
>
>> If the latter, shouldn't avoiding regressions be a priority over further optimizing for other
>> users?
>>
>> If there is a really strong desire to reland this optimization, could
>> it at least be applied selectively to the CPUs that it's known to
>> help, or alternatively, made configurable?
> That wouldn't be easy in practice, but I think that it may be
> compensated by reducing the target residency values of the deepest
> idle states on those systems.
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-28 23:53 ` Doug Smythies
@ 2026-01-29 22:27 ` Doug Smythies
2026-01-30 19:28 ` Rafael J. Wysocki
0 siblings, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-01-29 22:27 UTC (permalink / raw)
To: 'Harshvardhan Jha', 'Christian Loehle'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Rafael J. Wysocki', 'Daniel Lezcano',
Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 6908 bytes --]
On 2026.01.28 15:53 Doug Smythies wrote:
> On 2026.01.27 21:07 Doug Smythies wrote:
>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> ... snip ...
>
>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>> from base to revert.
>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>> before and after the workload is usually fine to work with:
>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>
>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>> states for the test suites before and after tests. The folders with the
>>> name of the test contain two folders good-kernel and bad-kernel
>>> containing two files having the before and after states. Please note
>>> that different machines were used for different test suites due to
>>> compatibility reasons. The jbb test was run using containers.
>
> Please provide the results of the test runs that were done for
> the supplied before and after idle data.
> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
> Is it a test I can run on my test computer?
I see that I have fio installed on my test computer.
>> It is a considerable amount of work to manually extract and summarize the data.
>> I have only done it for the phoronix-sqlite data.
>
> I have done the rest now, see below.
> I have also attached the results, in case the formatting gets screwed up.
>
>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>
>> phoronix-sqlite
>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>> Usage Above Below Above Below
>> state 0 220 0 218 0.00% 99.09%
>> state 1 70212 5213 34602 7.42% 49.28%
>> state 2 30273 5237 1806 17.30% 5.97%
>> state 3 0 0 0 0.00% 0.00%
>> state 4 11824 2120 0 17.93% 0.00%
>>
>> total 112529 12570 36626 43.72% <<< Misses %
>>
>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>> Usage Above Below Above Below
>> state 0 262 0 260 0.00% 99.24%
>> state 1 62751 3985 35588 6.35% 56.71%
>> state 2 24941 7896 1433 31.66% 5.75%
>> state 3 0 0 0 0.00% 0.00%
>> state 4 24489 11543 0 47.14% 0.00%
>>
>> total 112443 23424 37281 53.99% <<< Misses %
>>
>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>
>> I have a template now, and can summarize the other 40 CPU data
>> faster, but I would have to rework the template for the 56 CPU data,
>> and is it a 64 CPU data set at 4 idle states per CPU?
>
> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
> POLL, C1, C1E, C3 (disabled), C6
>
> Good Kernel: Time between samples > 2 hours (estimated)
> Usage Above Below Above Below
> state 0 297550 0 296084 0.00% 99.51%
> state 1 8062854 341043 4962635 4.23% 61.55%
> state 2 56708358 12688379 6252051 22.37% 11.02%
> state 3 0 0 0 0.00% 0.00%
> state 4 54624476 15868752 0 29.05% 0.00%
>
> total 119693238 28898174 11510770 33.76% <<< Misses
>
> Bad Kernel: Time between samples > 2 hours (estimated)
> Usage Above Below Above Below
> state 0 90715 0 75134 0.00% 82.82%
> state 1 8878738 312970 6082180 3.52% 68.50%
> state 2 12048728 2576251 603316 21.38% 5.01%
> state 3 0 0 0 0.00% 0.00%
> state 4 85999424 44723273 0 52.00% 0.00%
>
> total 107017605 47612494 6760630 50.81% <<< Misses
>
> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>
> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>
> fio
> Good Kernel: Time between samples ~ 1 minute (estimated)
> Usage Above Below Above Below
> state 0 3822 0 3818 0.00% 99.90%
> state 1 148640 4406 68956 2.96% 46.39%
> state 2 593455 45344 105675 7.64% 17.81%
> state 3 3209648 807014 0 25.14% 0.00%
>
> total 3955565 856764 178449 26.17% <<< Misses
>
> Bad Kernel: Time between samples ~ 1 minute (estimated)
> Usage Above Below Above Below
> state 0 916 0 756 0.00% 82.53%
> state 1 80230 2028 42791 2.53% 53.34%
> state 2 59231 6888 6791 11.63% 11.47%
> state 3 2455784 564797 0 23.00% 0.00%
>
> total 2596161 573713 50338 24.04% <<< Misses
>
> It is not clear why the number of idle entries differs so much
> between the tests, but there is a bit of a different distribution
> of the workload among the CPUs.
>
> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
> POLL, C1, C1E, C3 (disabled), C6
>
> rds-stress-test
> Good Kernel: Time between samples ~70 Seconds (estimated)
> Usage Above Below Above Below
> state 0 1561 0 1435 0.00% 91.93%
> state 1 13855 899 2410 6.49% 17.39%
> state 2 467998 139254 23679 29.76% 5.06%
> state 3 0 0 0 0.00% 0.00%
> state 4 213132 107417 0 50.40% 0.00%
>
> total 696546 247570 27524 39.49% <<< Misses
>
> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
> Usage Above Below Above Below
> state 0 231 0 231 0.00% 100.00%
> state 1 5413 266 1186 4.91% 21.91%
> state 2 54365 719 3789 1.32% 6.97%
> state 3 0 0 0 0.00% 0.00%
> state 4 267055 148327 0 55.54% 0.00%
>
> total 327064 149312 5206 47.24% <<< Misses
>
> Again, differing numbers of idle entries between tests.
> This time the load distribution between CPUs is more
> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
> In the "Good" case the work is distributed over more CPUs.
> I assume, without proof, that the scheduler is deciding not to migrate
> the next bit of work to another CPU in the one case versus the other.
The above is incorrect. The CPUs involved between the "Good"
and "Bad" tests are very similar, mainly 2 CPUs with a little of
a 3rd and 4th. See the attached graph for more detail / clarity.
All of the tests show higher usage of shallower idle states with
the "Good" versus the "Bad", which was the expectation of the
original patch, as has been mentioned a few times in the emails.
My input is to revert the reversion.
... Doug
[-- Attachment #2: usage-v-cpu-v-state.png --]
[-- Type: image/png, Size: 104684 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-14 4:49 ` Sergey Senozhatsky
2026-01-14 5:15 ` Tomasz Figa
@ 2026-01-29 22:47 ` Doug Smythies
1 sibling, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-01-29 22:47 UTC (permalink / raw)
To: 'Sergey Senozhatsky', 'Tomasz Figa'
Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha',
'Christian Loehle', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano', Doug Smythies
On 2026.01.13 20:50 Sergey Senozhatsky wrote:
> Cc-ing Tomasz
>
> On (26/01/14 13:28), Sergey Senozhatsky wrote:
>> Hi,
>>
>> On (26/01/13 15:18), Rafael J. Wysocki wrote:
> [..]
>>>>> Bumping this as I discovered this issue on 6.12 stable branch also. The
>>>>> reapplication seems inevitable. I shall get back to you with these
>>>>> details also.
>>>>
>>>> Yes, please, because I have another reason to restore the reverted commit.
>>>
>>> Sergey, did you see a performance regression from 85975daeaa4d
>>> ("cpuidle: menu: Avoid discarding useful information") on any
>>> platforms other than the Jasper Lake it was reported for?
>>
>> Let me try to dig it up. I think I saw regressions on a number of
>> devices:
>>
>> ---
>> cpu family : 6
>> model : 122
>> model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 122
>> model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 156
>> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 156
>> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
>> ---
>> cpu family : 6
>> model : 156
>> model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
>>
Those are all 6 watt TDP processors, the same as the earlier emails.
We know from the turbostat data that 6 watts is exceeded in the
test case and that it is likely that power limiting was involved.
I don't think we ever got to the bottom of it for your case.
I still would like to see test results where there is no chance
of throttling being involved via reducing the maximum
CPU frequency.
>>
>> I guess family 6/model 122 is not Jasper Lake?
>>
>> I also saw some where the patch in question seemed to improve the
>> metrics, but regressions are more important, so the revert simply
>> put all of the boards back to the previous state.
In all of our testing we saw some minor regressions in addition to
the improvements. Overall, the patch set was deemed an improvement.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-29 22:27 ` Doug Smythies
@ 2026-01-30 19:28 ` Rafael J. Wysocki
2026-02-01 19:20 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-01-30 19:28 UTC (permalink / raw)
To: Doug Smythies, Christian Loehle
Cc: Harshvardhan Jha, Sasha Levin, Greg Kroah-Hartman, linux-pm,
stable, Rafael J. Wysocki, Daniel Lezcano
On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2026.01.28 15:53 Doug Smythies wrote:
> > On 2026.01.27 21:07 Doug Smythies wrote:
> >> On 2026.01.27 07:45 Harshvardhan Jha wrote:
> >>> On 08/12/25 6:17 PM, Christian Loehle wrote:
> >>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
> >>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
> >>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
> >>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
> > ... snip ...
> >
> >>>> It would be nice to get the idle states here, ideally how the states' usage changed
> >>>> from base to revert.
> >>>> The mentioned thread did this and should show how it can be done, but a dump of
> >>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> >>>> before and after the workload is usually fine to work with:
> >>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> >
> >>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
> >>> states for the test suites before and after tests. The folders with the
> >>> name of the test contain two folders good-kernel and bad-kernel
> >>> containing two files having the before and after states. Please note
> >>> that different machines were used for different test suites due to
> >>> compatibility reasons. The jbb test was run using containers.
> >
> > Please provide the results of the test runs that were done for
> > the supplied before and after idle data.
> > In particular, what is the "fio" test and its results? Its idle data is not very revealing.
> > Is it a test I can run on my test computer?
>
> I see that I have fio installed on my test computer.
>
> >> It is a considerable amount of work to manually extract and summarize the data.
> >> I have only done it for the phoronix-sqlite data.
> >
> > I have done the rest now, see below.
> > I have also attached the results, in case the formatting gets screwed up.
> >
> >> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
> >> I remember seeing a Linux-pm email about why but couldn't find it just now.
> >> Summary (also attached as a PNG file, in case the formatting gets messed up):
> >> The total idle entries (usage) and time seem low to me, which is why the ???.
> >>
> >> phoronix-sqlite
> >> Good Kernel: Time between samples 4 seconds (estimated and ???)
> >> Usage Above Below Above Below
> >> state 0 220 0 218 0.00% 99.09%
> >> state 1 70212 5213 34602 7.42% 49.28%
> >> state 2 30273 5237 1806 17.30% 5.97%
> >> state 3 0 0 0 0.00% 0.00%
> >> state 4 11824 2120 0 17.93% 0.00%
> >>
> >> total 112529 12570 36626 43.72% <<< Misses %
> >>
> >> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
> >> Usage Above Below Above Below
> >> state 0 262 0 260 0.00% 99.24%
> >> state 1 62751 3985 35588 6.35% 56.71%
> >> state 2 24941 7896 1433 31.66% 5.75%
> >> state 3 0 0 0 0.00% 0.00%
> >> state 4 24489 11543 0 47.14% 0.00%
> >>
> >> total 112443 23424 37281 53.99% <<< Misses %
> >>
> >> Observe 2X use of idle state 4 for the "Bad Kernel"
> >>
> >> I have a template now, and can summarize the other 40 CPU data
> >> faster, but I would have to rework the template for the 56 CPU data,
> >> and is it a 64 CPU data set at 4 idle states per CPU?
> >
> > jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
> > POLL, C1, C1E, C3 (disabled), C6
> >
> > Good Kernel: Time between samples > 2 hours (estimated)
> > Usage Above Below Above Below
> > state 0 297550 0 296084 0.00% 99.51%
> > state 1 8062854 341043 4962635 4.23% 61.55%
> > state 2 56708358 12688379 6252051 22.37% 11.02%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 54624476 15868752 0 29.05% 0.00%
> >
> > total 119693238 28898174 11510770 33.76% <<< Misses
> >
> > Bad Kernel: Time between samples > 2 hours (estimated)
> > Usage Above Below Above Below
> > state 0 90715 0 75134 0.00% 82.82%
> > state 1 8878738 312970 6082180 3.52% 68.50%
> > state 2 12048728 2576251 603316 21.38% 5.01%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 85999424 44723273 0 52.00% 0.00%
> >
> > total 107017605 47612494 6760630 50.81% <<< Misses
> >
> > As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
> >
> > fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
> >
> > fio
> > Good Kernel: Time between samples ~ 1 minute (estimated)
> > Usage Above Below Above Below
> > state 0 3822 0 3818 0.00% 99.90%
> > state 1 148640 4406 68956 2.96% 46.39%
> > state 2 593455 45344 105675 7.64% 17.81%
> > state 3 3209648 807014 0 25.14% 0.00%
> >
> > total 3955565 856764 178449 26.17% <<< Misses
> >
> > Bad Kernel: Time between samples ~ 1 minute (estimated)
> > Usage Above Below Above Below
> > state 0 916 0 756 0.00% 82.53%
> > state 1 80230 2028 42791 2.53% 53.34%
> > state 2 59231 6888 6791 11.63% 11.47%
> > state 3 2455784 564797 0 23.00% 0.00%
> >
> > total 2596161 573713 50338 24.04% <<< Misses
> >
> > It is not clear why the number of idle entries differs so much
> > between the tests, but there is a bit of a different distribution
> > of the workload among the CPUs.
> >
> > rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
> > POLL, C1, C1E, C3 (disabled), C6
> >
> > rds-stress-test
> > Good Kernel: Time between samples ~70 Seconds (estimated)
> > Usage Above Below Above Below
> > state 0 1561 0 1435 0.00% 91.93%
> > state 1 13855 899 2410 6.49% 17.39%
> > state 2 467998 139254 23679 29.76% 5.06%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 213132 107417 0 50.40% 0.00%
> >
> > total 696546 247570 27524 39.49% <<< Misses
> >
> > Bad Kernel: Time between samples ~ 70 Seconds (estimated)
> > Usage Above Below Above Below
> > state 0 231 0 231 0.00% 100.00%
> > state 1 5413 266 1186 4.91% 21.91%
> > state 2 54365 719 3789 1.32% 6.97%
> > state 3 0 0 0 0.00% 0.00%
> > state 4 267055 148327 0 55.54% 0.00%
> >
> > total 327064 149312 5206 47.24% <<< Misses
> >
> > Again, differing numbers of idle entries between tests.
> > This time the load distribution between CPUs is more
> > obvious. In the "Bad" case most work is done on 2 or 3 CPU's.
> > In the "Good" case the work is distributed over more CPUs.
> > I assume, without proof, that the scheduler is deciding not to migrate
> > the next bit of work to another CPU in one case versus the other.
>
> The above is incorrect. The CPUs involved between the "Good"
> and "Bad" tests are very similar, mainly 2 CPUs with a little of
> a 3rd and 4th. See the attached graph for more detail / clarity.
>
> All of the tests show higher usage of shallower idle states with
> the "Good" verses the "Bad", which was the expectation of the
> original patch, as has been mentioned a few times in the emails.
>
> My input is to revert the reversion.
OK, noted, thanks!
Christian, what do you think?
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-01-30 19:28 ` Rafael J. Wysocki
@ 2026-02-01 19:20 ` Christian Loehle
2026-02-02 17:31 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-01 19:20 UTC (permalink / raw)
To: Rafael J. Wysocki, Doug Smythies
Cc: Harshvardhan Jha, Sasha Levin, Greg Kroah-Hartman, linux-pm,
stable, Daniel Lezcano, Sergey Senozhatsky
[-- Attachment #1: Type: text/plain, Size: 10013 bytes --]
On 1/30/26 19:28, Rafael J. Wysocki wrote:
> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>
>> On 2026.01.28 15:53 Doug Smythies wrote:
>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>> ... snip ...
>>>
>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>> from base to revert.
>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>> before and after the workload is usually fine to work with:
>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>
>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>> states for the test suites before and after tests. The folders with the
>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>> containing two files having the before and after states. Please note
>>>>> that different machines were used for different test suites due to
>>>>> compatibility reasons. The jbb test was run using containers.
>>>
>>> Please provide the results of the test runs that were done for
>>> the supplied before and after idle data.
>> In particular, what is the "fio" test and its results? Its idle data is not very revealing.
>>> Is it a test I can run on my test computer?
>>
>> I see that I have fio installed on my test computer.
>>
>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>> I have only done it for the phoronix-sqlite data.
>>>
>>> I have done the rest now, see below.
>>> I have also attached the results, in case the formatting gets screwed up.
>>>
>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>
>>>> phoronix-sqlite
>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>> Usage Above Below Above Below
>>>> state 0 220 0 218 0.00% 99.09%
>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>
>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>
>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>> Usage Above Below Above Below
>>>> state 0 262 0 260 0.00% 99.24%
>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>
>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>
>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>
>>>> I have a template now, and can summarize the other 40 CPU data
>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>
>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
>>> POLL, C1, C1E, C3 (disabled), C6
>>>
>>> Good Kernel: Time between samples > 2 hours (estimated)
>>> Usage Above Below Above Below
>>> state 0 297550 0 296084 0.00% 99.51%
>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>
>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>
>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>> Usage Above Below Above Below
>>> state 0 90715 0 75134 0.00% 82.82%
>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>
>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>
>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>
>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>
>>> fio
>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>> Usage Above Below Above Below
>>> state 0 3822 0 3818 0.00% 99.90%
>>> state 1 148640 4406 68956 2.96% 46.39%
>>> state 2 593455 45344 105675 7.64% 17.81%
>>> state 3 3209648 807014 0 25.14% 0.00%
>>>
>>> total 3955565 856764 178449 26.17% <<< Misses
>>>
>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>> Usage Above Below Above Below
>>> state 0 916 0 756 0.00% 82.53%
>>> state 1 80230 2028 42791 2.53% 53.34%
>>> state 2 59231 6888 6791 11.63% 11.47%
>>> state 3 2455784 564797 0 23.00% 0.00%
>>>
>>> total 2596161 573713 50338 24.04% <<< Misses
>>>
>>> It is not clear why the number of idle entries differs so much
>>> between the tests, but there is a bit of a different distribution
>>> of the workload among the CPUs.
>>>
>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>> POLL, C1, C1E, C3 (disabled), C6
>>>
>>> rds-stress-test
>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>> Usage Above Below Above Below
>>> state 0 1561 0 1435 0.00% 91.93%
>>> state 1 13855 899 2410 6.49% 17.39%
>>> state 2 467998 139254 23679 29.76% 5.06%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 213132 107417 0 50.40% 0.00%
>>>
>>> total 696546 247570 27524 39.49% <<< Misses
>>>
>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>> Usage Above Below Above Below
>>> state 0 231 0 231 0.00% 100.00%
>>> state 1 5413 266 1186 4.91% 21.91%
>>> state 2 54365 719 3789 1.32% 6.97%
>>> state 3 0 0 0 0.00% 0.00%
>>> state 4 267055 148327 0 55.54% 0.00%
>>>
>>> total 327064 149312 5206 47.24% <<< Misses
>>>
>>> Again, differing numbers of idle entries between tests.
>>> This time the load distribution between CPUs is more
>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's.
>>> In the "Good" case the work is distributed over more CPUs.
>>> I assume, without proof, that the scheduler is deciding not to migrate
>>> the next bit of work to another CPU in one case versus the other.
>>
>> The above is incorrect. The CPUs involved between the "Good"
>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>
>> All of the tests show higher usage of shallower idle states with
>> the "Good" verses the "Bad", which was the expectation of the
>> original patch, as has been mentioned a few times in the emails.
>>
>> My input is to revert the reversion.
>
> OK, noted, thanks!
>
> Christian, what do you think?
I've attached readable diffs of the values provided; the tl;dr is:
+--------------------+-----------+-----------+
| Workload | Δ above % | Δ below % |
+--------------------+-----------+-----------+
| fio                | -0.44     | +2.57     |
| rds-stress-test    | -10.11    | +2.36     |
| jbb | -20.35 | +3.30 |
| phoronix-sqlite | -9.66 | -0.61 |
+--------------------+-----------+-----------+
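The Δ values are just (good kernel minus bad kernel) of the OVERALL
above%/below% numbers from the attached diffs. A quick illustrative
sketch (helper names are mine, not from any attachment), using the jbb
totals quoted earlier in the thread:

```python
# Recompute a tl;dr row from the OVERALL counters in the attached diffs.
def miss_pct(above, below, usage):
    """above%/below% of all idle entries, as reported in the diff files."""
    return 100.0 * above / usage, 100.0 * below / usage

def delta(good, bad):
    """(good - bad) change in above%/below%, rounded to two decimals."""
    good_above, good_below = miss_pct(*good)
    bad_above, bad_below = miss_pct(*bad)
    return round(good_above - bad_above, 2), round(good_below - bad_below, 2)

good = (28898174, 11510770, 119693238)  # jbb good kernel: above, below, usage
bad = (47612494, 6760630, 107017605)    # jbb bad kernel
print(delta(good, bad))  # (-20.35, 3.3), i.e. the jbb row
```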
I think the overall trend, however, is clear: the commit
85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
improved menu on many systems and workloads, I'd dare to say most.
Even for the regression it reportedly introduced, the cpuidle governor
performed better on paper; system metrics regressed because other
CPUs' P-states weren't available due to them being in shallower idle
states.
https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
(+CC Sergey)
It could be argued that this is a limitation of a per-CPU cpuidle
governor and that a more holistic approach would be needed for such a
platform (i.e. when CPUs share a power/thermal budget and want to use
higher P-states, skew towards deeper cpuidle states).
I also think that the change made sense: for small residency values
with a bit of random noise mixed in, performing the same statistical
test doesn't seem sensible, since the short intervals will look
noisier.
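To illustrate that point with made-up numbers: the same absolute timing
jitter is a far larger relative error on short idle intervals than on
long ones, so a single fixed statistical test cannot treat both alike.

```python
import statistics

# Made-up idle intervals (us) with identical absolute jitter around
# their means; only the interval length differs between the two sets.
short_intervals = [5, 15, 10, 8, 12]             # mean 10 us
long_intervals = [1005, 1015, 1010, 1008, 1012]  # mean 1010 us

def relative_spread(samples):
    """Coefficient of variation: stdev relative to the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Identical noise, but ~38% relative spread on the short intervals vs
# ~0.4% on the long ones.
print(relative_spread(short_intervals), relative_spread(long_intervals))
```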
So the options are:
1. Revert the revert on mainline+stable.
2. Revert the revert on mainline only.
3. Keep the revert and miss out on the improvement for many.
4. Revert the revert only once we have a good solution for platforms
like Sergey's.
I'd lean towards 2, because 4 won't be easy, unless of course a minor
hack like playing with the deep idle states' residency values would
be enough to mitigate the regression.
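For anyone who wants to reproduce the before/after dumps behind these
diffs, a rough sketch (it assumes the standard cpuidle sysfs layout
mentioned earlier in the thread; the helper names are mine):

```python
import glob
import os

def snapshot_cpuidle(root="/sys/devices/system/cpu"):
    """Snapshot usage/above/below/time for every CPU idle state."""
    snap = {}
    for d in glob.glob(os.path.join(root, "cpu[0-9]*/cpuidle/state[0-9]*")):
        counters = {}
        for name in ("usage", "above", "below", "time"):
            path = os.path.join(d, name)
            if os.path.exists(path):
                with open(path) as f:
                    counters[name] = int(f.read())
        snap[d] = counters
    return snap

def diff_totals(before, after):
    """Per-counter delta over the workload, summed across all states."""
    totals = {}
    for state, counters in after.items():
        for name, value in counters.items():
            totals[name] = totals.get(name, 0) + value - before[state][name]
    return totals

# Usage: snapshot before and after the workload, then diff:
#   before = snapshot_cpuidle(); <run workload>; after = snapshot_cpuidle()
#   print(diff_totals(before, after))
```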
[-- Attachment #2: diff_rds_stress-test.txt --]
[-- Type: text/plain, Size: 3573 bytes --]
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 231
usage : 231
time : 4420
avg_time_us_per_usage : 19.134199134199132
above_pct_over_usage : 0.0
below_pct_over_usage : 100.0
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 266
below : 1186
usage : 5413
time : 264365
avg_time_us_per_usage : 48.838906336597084
above_pct_over_usage : 4.914095695547755
below_pct_over_usage : 21.91021614631443
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 719
below : 3789
usage : 54365
time : 11979213
avg_time_us_per_usage : 220.34788926699164
above_pct_over_usage : 1.3225420767037617
below_pct_over_usage : 6.969557619792146
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 148327
below : 0
usage : 267055
time : 3728423124
avg_time_us_per_usage : 13961.255636479376
above_pct_over_usage : 55.54174233772069
below_pct_over_usage : 0.0
OVERALL
above : 149312
below : 5206
usage : 327064
time : 3740671122
avg_time_us_per_usage : 11437.122771078444
above_pct_over_usage : 45.6522270870533
below_pct_over_usage : 1.5917373969620625
good-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 1435
usage : 1561
time : 23997
avg_time_us_per_usage : 15.372837924407431
above_pct_over_usage : 0.0
below_pct_over_usage : 91.92825112107623
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 899
below : 2410
usage : 13855
time : 580895
avg_time_us_per_usage : 41.9267412486467
above_pct_over_usage : 6.4886322627210395
below_pct_over_usage : 17.39444243955251
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 139254
below : 23679
usage : 467998
time : 71044825
avg_time_us_per_usage : 151.80583036679644
above_pct_over_usage : 29.755255364339163
below_pct_over_usage : 5.059637006995756
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 107417
below : 0
usage : 213132
time : 3670745602
avg_time_us_per_usage : 17222.87409680386
above_pct_over_usage : 50.399283073400525
below_pct_over_usage : 0.0
OVERALL
above : 247570
below : 27524
usage : 696546
time : 3742395319
avg_time_us_per_usage : 5372.789907629934
above_pct_over_usage : 35.54251980486573
below_pct_over_usage : 3.9514978192395045
[-- Attachment #3: diff_fio.txt --]
[-- Type: text/plain, Size: 3087 bytes --]
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 756
usage : 916
time : 13304
avg_time_us_per_usage : 14.524017467248909
above_pct_over_usage : 0.0
below_pct_over_usage : 82.53275109170306
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 2028
below : 42791
usage : 80230
time : 14121043
avg_time_us_per_usage : 176.00701732519008
above_pct_over_usage : 2.527732768291163
below_pct_over_usage : 53.33541069425402
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 6888
below : 6791
usage : 59231
time : 9518489
avg_time_us_per_usage : 160.70113622933937
above_pct_over_usage : 11.629045601121035
below_pct_over_usage : 11.465280005402576
index3 C6 latency=92 residency=276 desc='MWAIT 0x20'
above : 564797
below : 0
usage : 2455784
time : 3656232645
avg_time_us_per_usage : 1488.8250127047004
above_pct_over_usage : 22.99864320314816
below_pct_over_usage : 0.0
OVERALL
above : 573713
below : 50338
usage : 2596161
time : 3679885481
avg_time_us_per_usage : 1417.4334646426012
above_pct_over_usage : 22.09851392113201
below_pct_over_usage : 1.9389398423287307
good-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 3818
usage : 3822
time : 84618
avg_time_us_per_usage : 22.139717425431712
above_pct_over_usage : 0.0
below_pct_over_usage : 99.8953427524856
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 4406
below : 68956
usage : 148640
time : 22527422
avg_time_us_per_usage : 151.55692949407967
above_pct_over_usage : 2.9642088266953714
below_pct_over_usage : 46.39128094725511
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 45344
below : 105675
usage : 593455
time : 121006522
avg_time_us_per_usage : 203.9017650874961
above_pct_over_usage : 7.640680422272961
below_pct_over_usage : 17.806741875963638
index3 C6 latency=92 residency=276 desc='MWAIT 0x20'
above : 807014
below : 0
usage : 3209648
time : 3510698554
avg_time_us_per_usage : 1093.7955046783945
above_pct_over_usage : 25.14338020867086
below_pct_over_usage : 0.0
OVERALL
above : 856764
below : 178449
usage : 3955565
time : 3654317116
avg_time_us_per_usage : 923.8420089165518
above_pct_over_usage : 21.65971232933854
below_pct_over_usage : 4.511340352136799
[-- Attachment #4: diff_jbb.txt --]
[-- Type: text/plain, Size: 3681 bytes --]
jbb:
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 75134
usage : 90715
time : 1563844
avg_time_us_per_usage : 17.239089455988534
above_pct_over_usage : 0.0
below_pct_over_usage : 82.82422973047456
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 312970
below : 6082180
usage : 8878738
time : 1015443901
avg_time_us_per_usage : 114.3680443099008
above_pct_over_usage : 3.5249378909480154
below_pct_over_usage : 68.50275343185034
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 2576251
below : 603316
usage : 12048728
time : 1478419300
avg_time_us_per_usage : 122.70335092633844
above_pct_over_usage : 21.38193342898935
below_pct_over_usage : 5.007300355688999
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 44723273
below : 0
usage : 85999424
time : 300384703093
avg_time_us_per_usage : 3492.868778900194
above_pct_over_usage : 52.0041541208462
below_pct_over_usage : 0.0
OVERALL
above : 47612494
below : 6760630
usage : 107017605
time : 302880130138
avg_time_us_per_usage : 2830.1897630581434
above_pct_over_usage : 44.490337828061094
below_pct_over_usage : 6.317306390850365
good-kernel
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 296084
usage : 297550
time : 5996079
avg_time_us_per_usage : 20.151500588136447
above_pct_over_usage : 0.0
below_pct_over_usage : 99.50730969584944
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 341043
below : 4962635
usage : 8062854
time : 994358306
avg_time_us_per_usage : 123.32584789455446
above_pct_over_usage : 4.229804979725541
below_pct_over_usage : 61.549359569204654
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 12688379
below : 6252051
usage : 56708358
time : 9902266327
avg_time_us_per_usage : 174.61740519801333
above_pct_over_usage : 22.37479526386569
below_pct_over_usage : 11.024919818697624
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 15868752
below : 0
usage : 54624476
time : 276236627330
avg_time_us_per_usage : 5057.011939666021
above_pct_over_usage : 29.050625584033064
below_pct_over_usage : 0.0
OVERALL
above : 28898174
below : 11510770
usage : 119693238
time : 287139248042
avg_time_us_per_usage : 2398.9596475115827
above_pct_over_usage : 24.143530982092738
below_pct_over_usage : 9.616892476415417
[-- Attachment #5: diff_phoronix-sqlite.txt --]
[-- Type: text/plain, Size: 3571 bytes --]
bad-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 260
usage : 262
time : 6634
avg_time_us_per_usage : 25.3206106870229
above_pct_over_usage : 0.0
below_pct_over_usage : 99.23664122137404
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 3985
below : 35588
usage : 62751
time : 2972131
avg_time_us_per_usage : 47.36388264728849
above_pct_over_usage : 6.35049640643177
below_pct_over_usage : 56.71304042963459
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 7896
below : 1433
usage : 24941
time : 2185241
avg_time_us_per_usage : 87.61641473878353
above_pct_over_usage : 31.65871456637665
below_pct_over_usage : 5.745559520468305
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 11543
below : 0
usage : 24489
time : 123920172
avg_time_us_per_usage : 5060.238147739801
above_pct_over_usage : 47.135448568745154
below_pct_over_usage : 0.0
OVERALL
above : 23424
below : 37281
usage : 112443
time : 129084178
avg_time_us_per_usage : 1147.996567149578
above_pct_over_usage : 20.831888156666043
below_pct_over_usage : 33.155465435820815
good-kernel:
index0 POLL latency=0 residency=0 desc='CPUIDLE CORE POLL IDLE'
above : 0
below : 218
usage : 220
time : 5595
avg_time_us_per_usage : 25.431818181818183
above_pct_over_usage : 0.0
below_pct_over_usage : 99.0909090909091
index1 C1 latency=2 residency=2 desc='MWAIT 0x00'
above : 5213
below : 34602
usage : 70212
time : 3288043
avg_time_us_per_usage : 46.83021420839742
above_pct_over_usage : 7.424656753831254
below_pct_over_usage : 49.28217398735259
index2 C1E latency=10 residency=20 desc='MWAIT 0x01'
above : 5237
below : 1806
usage : 30273
time : 4825950
avg_time_us_per_usage : 159.41432960063423
above_pct_over_usage : 17.299243550358405
below_pct_over_usage : 5.965712020612427
index3 C3 latency=40 residency=100 desc='MWAIT 0x10'
above : 0
below : 0
usage : 0
time : 0
avg_time_us_per_usage : nan
above_pct_over_usage : nan
below_pct_over_usage : nan
index4 C6 latency=133 residency=400 desc='MWAIT 0x20'
above : 2120
below : 0
usage : 11824
time : 124762813
avg_time_us_per_usage : 10551.658744925575
above_pct_over_usage : 17.929634641407308
below_pct_over_usage : 0.0
OVERALL
above : 12570
below : 36626
usage : 112529
time : 132882401
avg_time_us_per_usage : 1180.8724950901546
above_pct_over_usage : 11.170453838566058
below_pct_over_usage : 32.54805427934132
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-01 19:20 ` Christian Loehle
@ 2026-02-02 17:31 ` Harshvardhan Jha
2026-02-03 9:07 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-02 17:31 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 02/02/26 12:50 AM, Christian Loehle wrote:
> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>> ... snip ...
>>>>
>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>> from base to revert.
>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>> before and after the workload is usually fine to work with:
>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>> states for the test suites before and after tests. The folders with the
>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>> containing two files having the before and after states. Please note
>>>>>> that different machines were used for different test suites due to
>>>>>> compatibility reasons. The jbb test was run using containers.
>>>> Please provide the results of the test runs that were done for
>>>> the supplied before and after idle data.
>>>> In particular, what is the "fio" test and its results? Its idle data is not very revealing.
>>>> Is it a test I can run on my test computer?
>>> I see that I have fio installed on my test computer.
>>>
>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>> I have only done it for the phoronix-sqlite data.
>>>> I have done the rest now, see below.
>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>
>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>
>>>>> phoronix-sqlite
>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>> Usage Above Below Above Below
>>>>> state 0 220 0 218 0.00% 99.09%
>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>
>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>
>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>> Usage Above Below Above Below
>>>>> state 0 262 0 260 0.00% 99.24%
>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>
>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>
>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>
>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled.
>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>
>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 297550 0 296084 0.00% 99.51%
>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>
>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>
>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 90715 0 75134 0.00% 82.82%
>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>
>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>
>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>
>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>
>>>> fio
>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 3822 0 3818 0.00% 99.90%
>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>
>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>
>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 916 0 756 0.00% 82.53%
>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>
>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>
>>>> It is not clear why the number of idle entries differs so much
>>>> between the tests, but there is a bit of a different distribution
>>>> of the workload among the CPUs.
>>>>
>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>
>>>> rds-stress-test
>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 1561 0 1435 0.00% 91.93%
>>>> state 1 13855 899 2410 6.49% 17.39%
>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>
>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>
>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>> Usage Above Below Above Below
>>>> state 0 231 0 231 0.00% 100.00%
>>>> state 1 5413 266 1186 4.91% 21.91%
>>>> state 2 54365 719 3789 1.32% 6.97%
>>>> state 3 0 0 0 0.00% 0.00%
>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>
>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>
>>>> Again, differing numbers of idle entries between tests.
>>>> This time the load distribution between CPUs is more
>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>> In the "Good" case the work is distributed over more CPUs.
>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>> the next bit of work to another CPU in the one case versus the other.
>>> The above is incorrect. The CPUs involved between the "Good"
>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>
>>> All of the tests show higher usage of shallower idle states with
>>> the "Good" verses the "Bad", which was the expectation of the
>>> original patch, as has been mentioned a few times in the emails.
>>>
>>> My input is to revert the reversion.
>> OK, noted, thanks!
>>
>> Christian, what do you think?
> I've attached readable diffs of the values provided; the tl;dr is:
>
> +--------------------+-----------+-----------+
> | Workload | Δ above % | Δ below % |
> +--------------------+-----------+-----------+
> | fio | -10.11 | +2.36 |
> | rds-stress-test | -0.44 | +2.57 |
> | jbb | -20.35 | +3.30 |
> | phoronix-sqlite | -9.66 | -0.61 |
> +--------------------+-----------+-----------+
>
> I think the overall trend, however, is clear: commit
> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
> improved menu on many systems and workloads, I'd dare say most.
>
> Even on the reported regression introduced by it, the cpuidle governor
> performed better on paper; system metrics regressed because higher
> P-states weren't available while other CPUs sat in shallower idle states.
> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> (+CC Sergey)
> It could be argued that this is a limitation of a per-CPU cpuidle
> governor and a more holistic approach would be needed for that platform
> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
> skew towards deeper cpuidle states).
>
> I also think that the change made sense: for small residency values
> with a bit of random noise mixed in, performing the same statistical
> test doesn't seem sensible, as the short intervals will look noisier.
>
> So options are:
> 1. Revert revert on mainline+stable
> 2. Revert revert on mainline only
> 3. Keep revert, miss out on the improvement for many.
> 4. Revert only when we have a good solution for the platforms like
> Sergey's.
>
> I'd lean towards 2 because 4 won't be easy, unless of course a minor
> hack like playing with the deep idle state residency values would
> be enough to mitigate.
Wouldn't it be better to choose option 1, as reverting the revert shows
even more pronounced improvements on older kernels? I've tested this on
6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
Since the revert only benefits Jasper Lake systems, which are relatively
new, isn't reverting the revert even more relevant on stable kernels?
It's more likely that older hardware runs older kernels than newer
hardware, although that's not always the case imo.
Harshvardhan
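As an aside, for anyone wanting to reproduce the above/below "Misses %" summaries Doug posted from before/after dumps of /sys/devices/system/cpu/cpu*/cpuidle/state*/, a short script along these lines works. The grep-style dump format and the helper names here are assumptions, a sketch rather than the exact tooling used in this thread:

```python
import re
from collections import defaultdict

# Parses output in the style of:
#   grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/{usage,above,below}
# where each line looks like:
#   /sys/devices/system/cpu/cpu0/cpuidle/state1/usage:70212
LINE = re.compile(r"cpu(\d+)/cpuidle/state(\d+)/(usage|above|below):(\d+)")

def parse(dump_text):
    """Sum each (state, field) counter across all CPUs."""
    totals = defaultdict(int)
    for m in LINE.finditer(dump_text):
        _cpu, state, field, value = m.groups()
        totals[(int(state), field)] += int(value)
    return totals

def miss_pct(before_text, after_text):
    """Per-state (usage, above, below, miss%) plus overall miss%,
    computed on the counter deltas accumulated during the workload."""
    before, after = parse(before_text), parse(after_text)
    delta = {k: v - before.get(k, 0) for k, v in after.items()}
    rows, tot_u, tot_ab = {}, 0, 0
    for s in sorted({s for s, _ in delta}):
        u = delta.get((s, "usage"), 0)
        a = delta.get((s, "above"), 0)
        b = delta.get((s, "below"), 0)
        rows[s] = (u, a, b, 100.0 * (a + b) / u if u else 0.0)
        tot_u += u
        tot_ab += a + b
    return rows, (100.0 * tot_ab / tot_u if tot_u else 0.0)

# Tiny synthetic example (one CPU, one state):
before = """
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:100
/sys/devices/system/cpu/cpu0/cpuidle/state1/above:10
/sys/devices/system/cpu/cpu0/cpuidle/state1/below:20
"""
after = """
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:300
/sys/devices/system/cpu/cpu0/cpuidle/state1/above:30
/sys/devices/system/cpu/cpu0/cpuidle/state1/below:40
"""
rows, total = miss_pct(before, after)
print(rows[1], total)  # (200, 20, 20, 20.0) 20.0
```

Here a state's miss percentage is (above + below) / usage on the deltas, which is what the "<<< Misses %" totals in the tables express.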
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-02 17:31 ` Harshvardhan Jha
@ 2026-02-03 9:07 ` Christian Loehle
2026-02-03 9:16 ` Harshvardhan Jha
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-03 9:07 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 2/2/26 17:31, Harshvardhan Jha wrote:
>
> On 02/02/26 12:50 AM, Christian Loehle wrote:
>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>> ... snip ...
>>>>>
>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>> from base to revert.
>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>> containing two files having the before and after states. Please note
>>>>>>> that different machines were used for different test suites due to
>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>> Please provide the results of the test runs that were done for
>>>>> the supplied before and after idle data.
>>>>> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
>>>>> Is it a test I can run on my test computer?
>>>> I see that I have fio installed on my test computer.
>>>>
>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>> I have only done it for the phoronix-sqlite data.
>>>>> I have done the rest now, see below.
>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>
>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>
>>>>>> phoronix-sqlite
>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>
>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>
>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 262 0 260 0.00% 99.24%
>>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>>
>>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>>
>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>>
>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>
>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 297550 0 296084 0.00% 99.51%
>>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>>
>>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>>
>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 90715 0 75134 0.00% 82.82%
>>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>>
>>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>>
>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>
>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>
>>>>> fio
>>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 3822 0 3818 0.00% 99.90%
>>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>>
>>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>>
>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 916 0 756 0.00% 82.53%
>>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>>
>>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>>
>>>>> It is not clear why the number of idle entries differs so much
>>>>> between the tests, but there is a bit of a different distribution
>>>>> of the workload among the CPUs.
>>>>>
>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>
>>>>> rds-stress-test
>>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 1561 0 1435 0.00% 91.93%
>>>>> state 1 13855 899 2410 6.49% 17.39%
>>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>>
>>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>>
>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>>> Usage Above Below Above Below
>>>>> state 0 231 0 231 0.00% 100.00%
>>>>> state 1 5413 266 1186 4.91% 21.91%
>>>>> state 2 54365 719 3789 1.32% 6.97%
>>>>> state 3 0 0 0 0.00% 0.00%
>>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>>
>>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>>
>>>>> Again, differing numbers of idle entries between tests.
>>>>> This time the load distribution between CPUs is more
>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>> the next bit of work to another CPU in the one case versus the other.
>>>> The above is incorrect. The CPUs involved between the "Good"
>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>
>>>> All of the tests show higher usage of shallower idle states with
>>>> the "Good" verses the "Bad", which was the expectation of the
>>>> original patch, as has been mentioned a few times in the emails.
>>>>
>>>> My input is to revert the reversion.
>>> OK, noted, thanks!
>>>
>>> Christian, what do you think?
>> I've attached readable diffs of the values provided; the tl;dr is:
>>
>> +--------------------+-----------+-----------+
>> | Workload | Δ above % | Δ below % |
>> +--------------------+-----------+-----------+
>> | fio | -10.11 | +2.36 |
>> | rds-stress-test | -0.44 | +2.57 |
>> | jbb | -20.35 | +3.30 |
>> | phoronix-sqlite | -9.66 | -0.61 |
>> +--------------------+-----------+-----------+
>>
>> I think the overall trend, however, is clear: commit
>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>> improved menu on many systems and workloads, I'd dare say most.
>>
>> Even on the reported regression introduced by it, the cpuidle governor
>> performed better on paper; system metrics regressed because higher
>> P-states weren't available while other CPUs sat in shallower idle states.
>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>> (+CC Sergey)
>> It could be argued that this is a limitation of a per-CPU cpuidle
>> governor and a more holistic approach would be needed for that platform
>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
>> skew towards deeper cpuidle states).
>>
>> I also think that the change made sense: for small residency values
>> with a bit of random noise mixed in, performing the same statistical
>> test doesn't seem sensible, as the short intervals will look noisier.
>>
>> So options are:
>> 1. Revert revert on mainline+stable
>> 2. Revert revert on mainline only
>> 3. Keep revert, miss out on the improvement for many.
>> 4. Revert only when we have a good solution for the platforms like
>> Sergey's.
>>
>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>> hack like playing with the deep idle state residency values would
>> be enough to mitigate.
>
> Wouldn't it be better to choose option 1, as reverting the revert shows
> even more pronounced improvements on older kernels? I've tested this on
> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
> Since the revert only benefits Jasper Lake systems, which are relatively
> new, isn't reverting the revert even more relevant on stable kernels?
> It's more likely that older hardware runs older kernels than newer
> hardware, although that's not always the case imo.
>
FWIW Jasper Lake seems to be supported from 5.6 on, see
b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:07 ` Christian Loehle
@ 2026-02-03 9:16 ` Harshvardhan Jha
2026-02-03 9:31 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-03 9:16 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 03/02/26 2:37 PM, Christian Loehle wrote:
> On 2/2/26 17:31, Harshvardhan Jha wrote:
>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>> ... snip ...
>>>>>>
>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>>> from base to revert.
>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>>> containing two files having the before and after states. Please note
>>>>>>>> that different machines were used for different test suites due to
>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>> Please provide the results of the test runs that were done for
>>>>>> the supplied before and after idle data.
>>>>>> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
>>>>>> Is it a test I can run on my test computer?
>>>>> I see that I have fio installed on my test computer.
>>>>>
>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>> I have done the rest now, see below.
>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>
>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>
>>>>>>> phoronix-sqlite
>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>>
>>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>>
>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 262 0 260 0.00% 99.24%
>>>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>>>
>>>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>>>
>>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>>>
>>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>
>>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 297550 0 296084 0.00% 99.51%
>>>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>>>
>>>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>>>
>>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 90715 0 75134 0.00% 82.82%
>>>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>>>
>>>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>>>
>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>>
>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>>
>>>>>> fio
>>>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 3822 0 3818 0.00% 99.90%
>>>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>>>
>>>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>>>
>>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 916 0 756 0.00% 82.53%
>>>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>>>
>>>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>>>
>>>>>> It is not clear why the number of idle entries differs so much
>>>>>> between the tests, but there is a bit of a different distribution
>>>>>> of the workload among the CPUs.
>>>>>>
>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>
>>>>>> rds-stress-test
>>>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 1561 0 1435 0.00% 91.93%
>>>>>> state 1 13855 899 2410 6.49% 17.39%
>>>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>>>
>>>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>>>
>>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>>>> Usage Above Below Above Below
>>>>>> state 0 231 0 231 0.00% 100.00%
>>>>>> state 1 5413 266 1186 4.91% 21.91%
>>>>>> state 2 54365 719 3789 1.32% 6.97%
>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>>>
>>>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>>>
>>>>>> Again, differing numbers of idle entries between tests.
>>>>>> This time the load distribution between CPUs is more
>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>>> the next bit of work to another CPU in the one case versus the other.
>>>>> The above is incorrect. The CPUs involved between the "Good"
>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>>
>>>>> All of the tests show higher usage of shallower idle states with
>>>>> the "Good" verses the "Bad", which was the expectation of the
>>>>> original patch, as has been mentioned a few times in the emails.
>>>>>
>>>>> My input is to revert the reversion.
>>>> OK, noted, thanks!
>>>>
>>>> Christian, what do you think?
>>> I've attached readable diffs of the values provided; the tl;dr is:
>>>
>>> +--------------------+-----------+-----------+
>>> | Workload | Δ above % | Δ below % |
>>> +--------------------+-----------+-----------+
>>> | fio | -10.11 | +2.36 |
>>> | rds-stress-test | -0.44 | +2.57 |
>>> | jbb | -20.35 | +3.30 |
>>> | phoronix-sqlite | -9.66 | -0.61 |
>>> +--------------------+-----------+-----------+
>>>
>>> I think the overall trend, however, is clear: commit
>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>>> improved menu on many systems and workloads, I'd dare say most.
>>>
>>> Even on the reported regression introduced by it, the cpuidle governor
>>> performed better on paper; system metrics regressed because higher
>>> P-states weren't available while other CPUs sat in shallower idle states.
>>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>> (+CC Sergey)
>>> It could be argued that this is a limitation of a per-CPU cpuidle
>>> governor and a more holistic approach would be needed for that platform
>>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
>>> skew towards deeper cpuidle states).
>>>
>>> I also think that the change made sense: for small residency values
>>> with a bit of random noise mixed in, performing the same statistical
>>> test doesn't seem sensible, as the short intervals will look noisier.
>>>
>>> So options are:
>>> 1. Revert revert on mainline+stable
>>> 2. Revert revert on mainline only
>>> 3. Keep revert, miss out on the improvement for many.
>>> 4. Revert only when we have a good solution for the platforms like
>>> Sergey's.
>>>
>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>> hack like playing with the deep idle state residency values would
>>> be enough to mitigate.
>> Wouldn't it be better to choose option 1, as reverting the revert shows
>> even more pronounced improvements on older kernels? I've tested this on
>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>> Since the revert only benefits Jasper Lake systems, which are relatively
>> new, isn't reverting the revert even more relevant on stable kernels?
>> It's more likely that older hardware runs older kernels than newer
>> hardware, although that's not always the case imo.
>>
> FWIW Jasper Lake seems to be supported from 5.6 on, see
> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
Oh I see, but shouldn't avoiding regressions on established platforms be
a priority over further optimizing for specific newer platforms like
Jasper Lake?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:16 ` Harshvardhan Jha
@ 2026-02-03 9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 16:45 ` Rafael J. Wysocki
0 siblings, 2 replies; 44+ messages in thread
From: Christian Loehle @ 2026-02-03 9:31 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 2/3/26 09:16, Harshvardhan Jha wrote:
>
> On 03/02/26 2:37 PM, Christian Loehle wrote:
>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>> ... snip ...
>>>>>>>
>>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>>>> from base to revert.
>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>>>> containing two files having the before and after states. Please note
>>>>>>>>> that different machines were used for different test suites due to
>>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>>> Please provide the results of the test runs that were done for
>>>>>>> the supplied before and after idle data.
>>>>>>> In particular, what is the "fio" test and what are its results? Its idle data is not very revealing.
>>>>>>> Is it a test I can run on my test computer?
>>>>>> I see that I have fio installed on my test computer.
>>>>>>
>>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>>> I have done the rest now, see below.
>>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>>
>>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>>
>>>>>>>> phoronix-sqlite
>>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>>> Usage Above Below Above Below
>>>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>>>
>>>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>>>
>>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>>>> Usage Above Below Above Below
>>>>>>>> state 0 262 0 260 0.00% 99.24%
>>>>>>>> state 1 62751 3985 35588 6.35% 56.71%
>>>>>>>> state 2 24941 7896 1433 31.66% 5.75%
>>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>>> state 4 24489 11543 0 47.14% 0.00%
>>>>>>>>
>>>>>>>> total 112443 23424 37281 53.99% <<< Misses %
>>>>>>>>
>>>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel"
>>>>>>>>
>>>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>
>>>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 297550 0 296084 0.00% 99.51%
>>>>>>> state 1 8062854 341043 4962635 4.23% 61.55%
>>>>>>> state 2 56708358 12688379 6252051 22.37% 11.02%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 54624476 15868752 0 29.05% 0.00%
>>>>>>>
>>>>>>> total 119693238 28898174 11510770 33.76% <<< Misses
>>>>>>>
>>>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 90715 0 75134 0.00% 82.82%
>>>>>>> state 1 8878738 312970 6082180 3.52% 68.50%
>>>>>>> state 2 12048728 2576251 603316 21.38% 5.01%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 85999424 44723273 0 52.00% 0.00%
>>>>>>>
>>>>>>> total 107017605 47612494 6760630 50.81% <<< Misses
>>>>>>>
>>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>>>
>>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>>>
>>>>>>> fio
>>>>>>> Good Kernel: Time between samples ~ 1 minute (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 3822 0 3818 0.00% 99.90%
>>>>>>> state 1 148640 4406 68956 2.96% 46.39%
>>>>>>> state 2 593455 45344 105675 7.64% 17.81%
>>>>>>> state 3 3209648 807014 0 25.14% 0.00%
>>>>>>>
>>>>>>> total 3955565 856764 178449 26.17% <<< Misses
>>>>>>>
>>>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 916 0 756 0.00% 82.53%
>>>>>>> state 1 80230 2028 42791 2.53% 53.34%
>>>>>>> state 2 59231 6888 6791 11.63% 11.47%
>>>>>>> state 3 2455784 564797 0 23.00% 0.00%
>>>>>>>
>>>>>>> total 2596161 573713 50338 24.04% <<< Misses
>>>>>>>
>>>>>>> It is not clear why the number of idle entries differs so much
>>>>>>> between the tests, but there is a bit of a different distribution
>>>>>>> of the workload among the CPUs.
>>>>>>>
>>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>
>>>>>>> rds-stress-test
>>>>>>> Good Kernel: Time between samples ~70 Seconds (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 1561 0 1435 0.00% 91.93%
>>>>>>> state 1 13855 899 2410 6.49% 17.39%
>>>>>>> state 2 467998 139254 23679 29.76% 5.06%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 213132 107417 0 50.40% 0.00%
>>>>>>>
>>>>>>> total 696546 247570 27524 39.49% <<< Misses
>>>>>>>
>>>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated)
>>>>>>> Usage Above Below Above Below
>>>>>>> state 0 231 0 231 0.00% 100.00%
>>>>>>> state 1 5413 266 1186 4.91% 21.91%
>>>>>>> state 2 54365 719 3789 1.32% 6.97%
>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>> state 4 267055 148327 0 55.54% 0.00%
>>>>>>>
>>>>>>> total 327064 149312 5206 47.24% <<< Misses
>>>>>>>
>>>>>>> Again, differing numbers of idle entries between tests.
>>>>>>> This time the load distribution between CPUs is more
>>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's.
>>>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>>>> the next bit of work to another CPU in one case versus the other.
>>>>>> The above is incorrect. The CPUs involved between the "Good"
>>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>>>
>>>>>> All of the tests show higher usage of shallower idle states with
>>>>>> the "Good" versus the "Bad" kernel, which was the expectation of the
>>>>>> original patch, as has been mentioned a few times in the emails.
>>>>>>
>>>>>> My input is to revert the reversion.
>>>>> OK, noted, thanks!
>>>>>
>>>>> Christian, what do you think?
>>>> I've attached readable diffs of the values provided; the tl;dr is:
>>>>
>>>> +--------------------+-----------+-----------+
>>>> | Workload | Δ above % | Δ below % |
>>>> +--------------------+-----------+-----------+
>>>> | fio | -10.11 | +2.36 |
>>>> | rds-stress-test | -0.44 | +2.57 |
>>>> | jbb | -20.35 | +3.30 |
>>>> | phoronix-sqlite | -9.66 | -0.61 |
>>>> +--------------------+-----------+-----------+
>>>>
>>>> I think the overall trend, however, is clear: the commit
>>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>>>> improved menu on many systems and workloads, I'd dare to say most.
>>>>
>>>> Even on the reported regression introduced by it, the cpuidle governor
>>>> performed better on paper; system metrics regressed because higher
>>>> P-states weren't available while other CPUs were in shallower idle states.
>>>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>> (+CC Sergey)
>>>> It could be argued that this is a limitation of a per-CPU cpuidle
>>>> governor and a more holistic approach would be needed for that platform
>>>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
>>>> skew towards deeper cpuidle states).
>>>>
>>>> I also think that the change made sense: for small residency values
>>>> with a bit of random noise mixed in, performing the same statistical
>>>> test doesn't seem sensible, as the short intervals will look noisier.
>>>>
>>>> So options are:
>>>> 1. Revert revert on mainline+stable
>>>> 2. Revert revert on mainline only
>>>> 3. Keep revert, miss out on the improvement for many.
>>>> 4. Revert only when we have a good solution for the platforms like
>>>> Sergey's.
>>>>
>>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>>> hack like playing with the deep idle state residency values would
>>>> be enough to mitigate.
>>> Wouldn't it be better to choose option 1, as reverting the revert has
>>> even more pronounced improvements on older kernels? I've tested this on
>>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>>> Since the revert only benefits specific newer systems like Jasper Lake,
>>> isn't reverting the revert more relevant on stable kernels? Older
>>> hardware is more likely to run older kernels than newer hardware is,
>>> although that's not always the case, IMO.
>>>
>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>
> Oh I see, but shouldn't avoiding regressions on established platforms be
> a priority over further optimizing for specific newer platforms like
> Jasper Lake?
>
Well, avoiding regressions on established platforms is what led to
10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
being applied and backported.
The expectation for stable is that we avoid regressions and potentially
miss out on improvements. If you want the latest greatest performance you
should probably run a latest greatest kernel.
The original
85975daeaa4d cpuidle: menu: Avoid discarding useful information
was seen as a fix and overall improvement, that's why it was backported,
but Sergey's regression report contradicted that.
What is "established" and "newer" for a stable kernel is quite handwavy
IMO, but even here Sergey's regression report is a clear data point...
Your report is only about restoring 5.15 (and the others) to 5.15
upstream-ish performance levels, which is within the expectations of
running a stable kernel. No doubt it's frustrating either way!
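The "Above"/"Below" columns and total "Misses %" in the tables quoted above are straight arithmetic on deltas of the cpuidle sysfs counters: per state, above% = above/usage and below% = below/usage, and the total misses% is (above + below)/usage summed over all states. A minimal sketch of that arithmetic, assuming the two snapshots have already been parsed into nested dicts keyed by CPU, state, and field (the dict layout here is illustrative, not something from the thread):

```python
def state_delta(before, after, state, field):
    """Sum the per-CPU delta of one cpuidle counter for one idle state."""
    return sum(after[cpu][state][field] - before[cpu][state][field]
               for cpu in after)

def summarize(before, after, n_states):
    """Return per-state rows (state, usage, above, below, above%, below%)
    and the total miss percentage.

    'above' counts exits where the chosen state was too deep for the
    actual idle period, 'below' where it was too shallow.
    """
    rows = []
    tot = {"usage": 0, "above": 0, "below": 0}
    for s in range(n_states):
        d = {f: state_delta(before, after, s, f) for f in tot}
        rows.append((s, d["usage"], d["above"], d["below"],
                     100.0 * d["above"] / d["usage"] if d["usage"] else 0.0,
                     100.0 * d["below"] / d["usage"] if d["usage"] else 0.0))
        for f in tot:
            tot[f] += d[f]
    miss_pct = 100.0 * (tot["above"] + tot["below"]) / tot["usage"]
    return rows, miss_pct
```

Checking against the phoronix-sqlite "Good Kernel" totals above: (12570 + 36626) / 112529 is about 43.72%, matching the quoted Misses %.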
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:31 ` Christian Loehle
@ 2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 10:30 ` Christian Loehle
2026-02-03 16:45 ` Rafael J. Wysocki
1 sibling, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-03 10:22 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 03/02/26 3:01 PM, Christian Loehle wrote:
> On 2/3/26 09:16, Harshvardhan Jha wrote:
>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>>> ... snip ...
>>>>>>>>
>>>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>>>>>>> from base to revert.
>>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>>>>>>> states for the test suites before and after tests. The folders with the
>>>>>>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>>>>>>> containing two files having the before and after states. Please note
>>>>>>>>>> that different machines were used for different test suites due to
>>>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>>>> Please provide the results of the test runs that were done for
>>>>>>>> the supplied before and after idle data.
>>>>>>>> In particular, what is the "fio" test and it results. Its idle data is not very revealing.
>>>>>>>> Is it a test I can run on my test computer?
>>>>>>> I see that I have fio installed on my test computer.
>>>>>>>
>>>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>>>> I have done the rest now, see below.
>>>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>>>
>>>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now.
>>>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>>>
>>>>>>>>> phoronix-sqlite
>>>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>>>> Usage Above Below Above Below
>>>>>>>>> state 0 220 0 218 0.00% 99.09%
>>>>>>>>> state 1 70212 5213 34602 7.42% 49.28%
>>>>>>>>> state 2 30273 5237 1806 17.30% 5.97%
>>>>>>>>> state 3 0 0 0 0.00% 0.00%
>>>>>>>>> state 4 11824 2120 0 17.93% 0.00%
>>>>>>>>>
>>>>>>>>> total 112529 12570 36626 43.72% <<< Misses %
>>>>>>>>>
>>>>>>>>> ... snip ...
>> Oh I see, but shouldn't avoiding regressions on established platforms be
>> a priority over further optimizing for specific newer platforms like
>> Jasper Lake?
>>
> Well, avoiding regressions on established platforms is what led to
> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
> being applied and backported.
> The expectation for stable is that we avoid regressions and potentially
> miss out on improvements. If you want the latest greatest performance you
> should probably run a latest greatest kernel.
> The original
> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
> was seen as a fix and overall improvement, that's why it was backported,
> but Sergey's regression report contradicted that.
> What is "established" and "newer" for a stable kernel is quite handwavy
> IMO, but even here Sergey's regression report is a clear data point...
> Your report is only about restoring 5.15 (and the others) to 5.15
> upstream-ish performance levels, which is within the expectations of
> running a stable kernel. No doubt it's frustrating either way!
Ah, but we see a regression on 5.15, 5.4 and 6.12 stable-based kernels
with the revert, and reapplying the original change recovers performance
across many benchmarks. Hence the previous suggestion.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 10:22 ` Harshvardhan Jha
@ 2026-02-03 10:30 ` Christian Loehle
0 siblings, 0 replies; 44+ messages in thread
From: Christian Loehle @ 2026-02-03 10:30 UTC (permalink / raw)
To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies
Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On 2/3/26 10:22, Harshvardhan Jha wrote:
>
> On 03/02/26 3:01 PM, Christian Loehle wrote:
>> On 2/3/26 09:16, Harshvardhan Jha wrote:
>>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>>>> On 02/02/26 12:50 AM, Christian Loehle wrote:
>>>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>>>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>>>>>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>>>> ... snip ...
> Ah, but we see a regression on 5.15, 5.4 and 6.12 stable-based kernels
> with the revert, and reapplying the original change recovers performance
> across many benchmarks. Hence the previous suggestion.
Right, but in the case of 5.15 (and similarly for the others) we have:
v5.15.196 5666bcc3c00f Revert "cpuidle: menu: Avoid discarding useful information"
v5.15.185 14790abc8779 cpuidle: menu: Avoid discarding useful information
So the regression you see on v5.15.196 only takes performance back to
pre-v5.15.185 levels; you now miss out on an improvement that wasn't
originally part of v5.15.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
@ 2026-02-03 16:45 ` Rafael J. Wysocki
2026-02-05 0:45 ` Doug Smythies
1 sibling, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-03 16:45 UTC (permalink / raw)
To: Christian Loehle
Cc: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies, Sasha Levin,
Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
Sergey Senozhatsky
On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 2/3/26 09:16, Harshvardhan Jha wrote:
> >
> > On 03/02/26 2:37 PM, Christian Loehle wrote:
> >> On 2/2/26 17:31, Harshvardhan Jha wrote:
[cut]
> >> FWIW Jasper Lake seems to be supported from 5.6 on, see
> >> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
> >
> > Oh I see, but shouldn't avoiding regressions on established platforms be
> > a priority over further optimizing for specific newer platforms like
> > Jasper Lake?
> >
>
> Well avoiding regressions on established platforms is what led to
> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
> being applied and backported.
> The expectation for stable is that we avoid regressions and potentially
> miss out on improvements. If you want the latest greatest performance you
> should probably run a latest greatest kernel.
> The original
> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
> was seen as a fix and overall improvement,
Note, however, that commit 85975daeaa4d carries no Fixes: tag and no
Cc: stable. It was picked up into stable kernels for another reason.
> that's why it was backported, but Sergey's regression report contradicted that.
Exactly.
> What is "established" and "newer" for a stable kernel is quite handwavy
> IMO but even here Sergey's regression report is a clear data point...
Which wasn't known at the time commit 85975daeaa4d went in.
> Your report is only restoring 5.15 (and others) performance to 5.15
> upstream-ish levels which is within the expectations of running a stable
> kernel. No doubt it's frustrating either way!
That is a consequence of the time it takes for mainline changes to
propagate to distributions (Chrome OS in this particular case) at
which point they get tested on a wider range of systems. Until that
happens, it is not really guaranteed that the given change will stay
in.
In this particular case, restoring commit 85975daeaa4d would cause the
same problems on the systems adversely affected by it to become
visible again and I don't think it would be fair to say "Too bad" to
the users of those systems. IMV, it cannot be restored without a way
to at least limit the adverse effect on performance.
I have an idea to test, but getting something workable out of it may
be a challenge, even if it turns out to be a good one.
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-03 16:45 ` Rafael J. Wysocki
@ 2026-02-05 0:45 ` Doug Smythies
2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:02 ` Doug Smythies
0 siblings, 2 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-05 0:45 UTC (permalink / raw)
To: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sergey Senozhatsky'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano', Doug Smythies
On 2026.02.03 08:46 Rafael J. Wysocki wrote:
> On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle wrote:
>> On 2/3/26 09:16, Harshvardhan Jha wrote:
>>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>
> [cut]
>
>>>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>>>
>>> Oh I see, but shouldn't avoiding regressions on established platforms be
>>> a priority over further optimizing for specific newer platforms like
>>> Jasper Lake?
>>>
>> Well avoiding regressions on established platforms is what led to
>> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
>> being applied and backported.
>> The expectation for stable is that we avoid regressions and potentially
>> miss out on improvements. If you want the latest greatest performance you
>> should probably run a latest greatest kernel.
>> The original
>> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
>> was seen as a fix and overall improvement,
>
> Note, however, that commit 85975daeaa4d carries no Fixes: tag and no
> Cc: stable. It was picked up into stable kernels for another reason.
>
>> that's why it was backported, but Sergey's regression report contradicted that.
>
> Exactly.
>
>> What is "established" and "newer" for a stable kernel is quite handwavy
>> IMO but even here Sergey's regression report is a clear data point...
>
> Which wasn't known at the time commit 85975daeaa4d went in.
>
>> Your report is only restoring 5.15 (and others) performance to 5.15
>> upstream-ish levels which is within the expectations of running a stable
>> kernel. No doubt it's frustrating either way!
>
> That is a consequence of the time it takes for mainline changes to
> propagate to distributions (Chrome OS in this particular case) at
> which point they get tested on a wider range of systems. Until that
> happens, it is not really guaranteed that the given change will stay
> in.
>
> In this particular case, restoring commit 85975daeaa4d would cause the
> same problems on the systems adversely affected by it to become
> visible again and I don't think it would be fair to say "Too bad" to
> the users of those systems. IMV, it cannot be restored without a way
> to at least limit the adverse effect on performance.
I have been going over the old emails and the turbostat data again and again
and again.
I still do not understand how to break down Sergey's results into their
component contributions. I am certain there is power limit throttling
during the test, but have no idea how much or how little it contributes to the
differing results.
I think more work is needed to fully understand Sergey's test results from October.
I struggle with the dramatic test results difference of base=84.5 and revert=59.5
as being due to only the idle code changes.
That is why I keep asking for a test to be done with the CPU clock frequency limited
such that power limit throttling cannot occur. I don't know what limit to use, but suggest
2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
@Sergey: are you willing to do the test?
>
> I have an idea to test, but getting something workable out of it may
> be a challenge, even if it turns out to be a good one.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 0:45 ` Doug Smythies
@ 2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:18 ` Doug Smythies
2026-02-05 7:15 ` Christian Loehle
2026-02-05 5:02 ` Doug Smythies
1 sibling, 2 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-05 2:37 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sergey Senozhatsky',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano'
On (26/02/04 16:45), Doug Smythies wrote:
> >> What is "established" and "newer" for a stable kernel is quite handwavy
> >> IMO but even here Sergey's regression report is a clear data point...
> >
> > Which wasn't known at the time commit 85975daeaa4d went in.
> >
> >> Your report is only restoring 5.15 (and others) performance to 5.15
> >> upstream-ish levels which is within the expectations of running a stable
> >> kernel. No doubt it's frustrating either way!
> >
> > That is a consequence of the time it takes for mainline changes to
> > propagate to distributions (Chrome OS in this particular case) at
> > which point they get tested on a wider range of systems. Until that
> > happens, it is not really guaranteed that the given change will stay
> > in.
> >
> > In this particular case, restoring commit 85975daeaa4d would cause the
> > same problems on the systems adversely affected by it to become
> > visible again and I don't think it would be fair to say "Too bad" to
> > the users of those systems. IMV, it cannot be restored without a way
> > to at least limit the adverse effect on performance.
>
> I have been going over the old emails and the turbostat data again and again
> and again.
>
> I still do not understand how to break down Sergey's results into their
> component contributions. I am certain there is power limit throttling
> during the test, but have no idea how much or how little it contributes to the
> differing results.
>
> I think more work is needed to fully understand Sergey's test results from October.
> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
> as being due to only the idle code changes.
>
> That is why I keep asking for a test to be done with the CPU clock frequency limited
> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
> @Sergey: are you willing to do the test?
I can run tests, not immediately, though, but within some reasonable
time frame.
(I'll need some help with instructions/etc.)
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 0:45 ` Doug Smythies
2026-02-05 2:37 ` Sergey Senozhatsky
@ 2026-02-05 5:02 ` Doug Smythies
2026-02-10 9:33 ` Xueqin Luo
1 sibling, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-02-05 5:02 UTC (permalink / raw)
To: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sergey Senozhatsky'
Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano', Doug Smythies
[-- Attachment #1: Type: text/plain, Size: 3851 bytes --]
On 2026.02.04 16:45 Doug Smythies wrote:
> On 2026.02.03 08:46 Rafael J. Wysocki wrote:
>> On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle wrote:
>>> On 2/3/26 09:16, Harshvardhan Jha wrote:
>>>> On 03/02/26 2:37 PM, Christian Loehle wrote:
>>>>> On 2/2/26 17:31, Harshvardhan Jha wrote:
>>
>> [cut]
>>
>>>>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>>>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>>>>
>>>> Oh I see, but shouldn't avoiding regressions on established platforms be
>>>> a priority over further optimizing for specific newer platforms like
>>>> Jasper Lake?
>>>>
>>> Well avoiding regressions on established platforms is what led to
>>> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
>>> being applied and backported.
>>> The expectation for stable is that we avoid regressions and potentially
>>> miss out on improvements. If you want the latest greatest performance you
>>> should probably run a latest greatest kernel.
>>> The original
>>> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
>>> was seen as a fix and overall improvement,
>>
>> Note, however, that commit 85975daeaa4d carries no Fixes: tag and no
>> Cc: stable. It was picked up into stable kernels for another reason.
>>
>>> that's why it was backported, but Sergey's regression report contradicted that.
>>
>> Exactly.
>>
>>> What is "established" and "newer" for a stable kernel is quite handwavy
>>> IMO but even here Sergey's regression report is a clear data point...
>>
>> Which wasn't known at the time commit 85975daeaa4d went in.
>>
>>> Your report is only restoring 5.15 (and others) performance to 5.15
>>> upstream-ish levels which is within the expectations of running a stable
>>> kernel. No doubt it's frustrating either way!
>>
>> That is a consequence of the time it takes for mainline changes to
>> propagate to distributions (Chrome OS in this particular case) at
>> which point they get tested on a wider range of systems. Until that
>> happens, it is not really guaranteed that the given change will stay
>> in.
>>
>> In this particular case, restoring commit 85975daeaa4d would cause the
>> same problems on the systems adversely affected by it to become
>> visible again and I don't think it would be fair to say "Too bad" to
>> the users of those systems. IMV, it cannot be restored without a way
>> to at least limit the adverse effect on performance.
>
> I have been going over the old emails and the turbostat data again and again
> and again.
>
> I still do not understand how to break down Sergey's results into their
> component contributions. I am certain there is power limit throttling
> during the test, but have no idea how much or how little it contributes to the
> differing results.
>
> I think more work is needed to fully understand Sergey's test results from October.
> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
> as being due to only the idle code changes.
>
> That is why I keep asking for a test to be done with the CPU clock frequency limited
> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
> @Sergey: are you willing to do the test?
Further to my earlier email, there is an interesting area in Sergey's turbostat data from
October. It is not near any throttling threshold, yet the CPU frequencies are considerably
higher for the "revert" case versus "base", with everything seemingly similar. I do not
understand this area. An annotated graph is attached.
>> I have an idea to test, but getting something workable out of it may
>> be a challenge, even if it turns out to be a good one.
[-- Attachment #2: interesting-area.png --]
[-- Type: image/png, Size: 237347 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 2:37 ` Sergey Senozhatsky
@ 2026-02-05 5:18 ` Doug Smythies
2026-02-10 9:17 ` Sergey Senozhatsky
2026-02-05 7:15 ` Christian Loehle
1 sibling, 1 reply; 44+ messages in thread
From: Doug Smythies @ 2026-02-05 5:18 UTC (permalink / raw)
To: 'Sergey Senozhatsky'
Cc: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano', Doug Smythies
On 2026.02.04 18:37 Sergey Senozhatsky wrote:
> On (26/02/04 16:45), Doug Smythies wrote:
>>>> What is "established" and "newer" for a stable kernel is quite handwavy
>>>> IMO but even here Sergey's regression report is a clear data point...
>>>
>>> Which wasn't known at the time commit 85975daeaa4d went in.
>>>
>>>> Your report is only restoring 5.15 (and others) performance to 5.15
>>>> upstream-ish levels which is within the expectations of running a stable
>>>> kernel. No doubt it's frustrating either way!
>>>
>>> That is a consequence of the time it takes for mainline changes to
>>> propagate to distributions (Chrome OS in this particular case) at
>>> which point they get tested on a wider range of systems. Until that
>>> happens, it is not really guaranteed that the given change will stay
>>> in.
>>>
>>> In this particular case, restoring commit 85975daeaa4d would cause the
>>> same problems on the systems adversely affected by it to become
>>> visible again and I don't think it would be fair to say "Too bad" to
>>> the users of those systems. IMV, it cannot be restored without a way
>>> to at least limit the adverse effect on performance.
>>
>> I have been going over the old emails and the turbostat data again and again
>> and again.
>>
>> I still do not understand how to break down Sergey's results into their
>> component contributions. I am certain there is power limit throttling
>> during the test, but have no idea how much or how little it contributes to the
>> differing results.
>>
>> I think more work is needed to fully understand Sergey's test results from October.
>> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
>> as being due to only the idle code changes.
>>
>> That is why I keep asking for a test to be done with the CPU clock frequency limited
>> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
>> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
>
>> @Sergey: are you willing to do the test?
>
> I can run tests, not immediately, though, but within some reasonable
> time frame.
Thanks.
> (I'll need some help with instructions/etc.)
From your turbostat data from October, you are using the intel_pstate
CPU frequency scaling driver and the powersave CPU frequency scaling governor,
with HWP enabled. Also, your maximum CPU frequency is 3,300 MHz.
To limit the maximum CPU frequency to around 2,200 MHz do:
echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
Then run the tests acquiring turbostat logs the same way you did in October.
To restore the maximum CPU frequency afterwards do:
echo 100 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
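For reference, the value 67 follows from the ratio of the desired cap to the package's maximum frequency; a minimal sketch of that arithmetic (the sysfs read is shown as a comment and assumes the standard cpufreq layout):

```shell
# Derive the max_perf_pct value for a desired frequency cap.
# On the target system, max_khz would come from sysfs, e.g.:
#   max_khz=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
max_khz=3300000      # 3,300 MHz, as reported in the October turbostat data
target_khz=2200000   # desired ~2,200 MHz cap
# Integer percentage, rounded up so the cap is not undershot.
pct=$(( (target_khz * 100 + max_khz - 1) / max_khz ))
echo "$pct"          # 67
```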
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:18 ` Doug Smythies
@ 2026-02-05 7:15 ` Christian Loehle
2026-02-10 8:02 ` Sergey Senozhatsky
1 sibling, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-05 7:15 UTC (permalink / raw)
To: Sergey Senozhatsky, Doug Smythies
Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano'
On 2/5/26 02:37, Sergey Senozhatsky wrote:
> On (26/02/04 16:45), Doug Smythies wrote:
>>>> What is "established" and "newer" for a stable kernel is quite handwavy
>>>> IMO but even here Sergey's regression report is a clear data point...
>>>
>>> Which wasn't known at the time commit 85975daeaa4d went in.
>>>
>>>> Your report is only restoring 5.15 (and others) performance to 5.15
>>>> upstream-ish levels which is within the expectations of running a stable
>>>> kernel. No doubt it's frustrating either way!
>>>
>>> That is a consequence of the time it takes for mainline changes to
>>> propagate to distributions (Chrome OS in this particular case) at
>>> which point they get tested on a wider range of systems. Until that
>>> happens, it is not really guaranteed that the given change will stay
>>> in.
>>>
>>> In this particular case, restoring commit 85975daeaa4d would cause the
>>> same problems on the systems adversely affected by it to become
>>> visible again and I don't think it would be fair to say "Too bad" to
>>> the users of those systems. IMV, it cannot be restored without a way
>>> to at least limit the adverse effect on performance.
>>
>> I have been going over the old emails and the turbostat data again and again
>> and again.
>>
>> I still do not understand how to break down Sergey's results into their
>> component contributions. I am certain there is power limit throttling
>> during the test, but have no idea how much or how little it contributes to the
>> differing results.
>>
>> I think more work is needed to fully understand Sergey's test results from October.
>> I struggle with the dramatic test results difference of base=84.5 and revert=59.5
>> as being due to only the idle code changes.
>>
>> That is why I keep asking for a test to be done with the CPU clock frequency limited
>> such that power limit throttling cannot occur. I don't know what limit to use, but suggest
>> 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results.
>
>
>> @Sergey: are you willing to do the test?
>
> I can run tests, not immediately, though, but within some reasonable
> time frame.
>
> (I'll need some help with instructions/etc.)
>
@Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
29.6% decrease in system performance in a traditional throughput sense.
The "benchmark" might be measuring dropped frames, user input latency or what have you.
Nonetheless @Sergey do feel free to expand.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 7:15 ` Christian Loehle
@ 2026-02-10 8:02 ` Sergey Senozhatsky
2026-02-10 8:57 ` Christian Loehle
0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 8:02 UTC (permalink / raw)
To: Christian Loehle
Cc: Sergey Senozhatsky, Doug Smythies, 'Rafael J. Wysocki',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano'
On (26/02/05 07:15), Christian Loehle wrote:
[..]
> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
> 29.6% decrease in system performance in a traditional throughput sense.
> The "benchmark" might be measuring dropped frames, user input latency or what have you.
> Nonetheless @Sergey do feel free to expand.
I'm not on the performance team and I don't define those metrics, so
I can't really comment. But frame drops during Google Docs scrolling,
for instance, or typing is a user visible regression, that people tend
to notice.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 8:02 ` Sergey Senozhatsky
@ 2026-02-10 8:57 ` Christian Loehle
2026-02-11 4:27 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-10 8:57 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Doug Smythies, 'Rafael J. Wysocki',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano'
On 2/10/26 08:02, Sergey Senozhatsky wrote:
> On (26/02/05 07:15), Christian Loehle wrote:
> [..]
>> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
>> 29.6% decrease in system performance in a traditional throughput sense.
>> The "benchmark" might be measuring dropped frames, user input latency or what have you.
>> Nonetheless @Sergey do feel free to expand.
>
> I'm not on the performance team and I don't define those metrics, so
> I can't really comment. But frame drops during Google Docs scrolling,
> for instance, or typing is a user visible regression, that people tend
> to notice.
Yeah I guess that was my point already, i.e. it isn't implausible that
e.g. a frequency reduction from 2.2GHz to 2.0GHz (-10%) might result in
double the number of dropped frames (= score reduction of 50%).
Everything here is just an example, but don't be thrown off by the 29.6% reduction in
score and go looking for a matching -29.6% in CPU frequency (as you would expect
for many purely CPU-bound benchmarks).
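A toy calculation with made-up frame times (purely hypothetical, not from Sergey's data) shows how a ~10% slowdown can multiply dropped frames when render times cluster just under the 60 Hz deadline:

```shell
# Hypothetical numbers only: per-frame render times (ms) at the higher
# clock, clustered just under the ~16.67 ms (60 Hz) deadline.
frames="15.0 15.5 16.0 16.5 17.0 14.0 16.2 16.6 15.8 16.4"
count_dropped() {  # $1 = slowdown factor applied to each frame time
    echo "$frames" | awk -v s="$1" '
        { for (i = 1; i <= NF; i++) if ($i * s > 1000/60) n++ }
        END { print n + 0 }'
}
base=$(count_dropped 1.0)      # at 2.2 GHz
slowed=$(count_dropped 1.1)    # at 2.0 GHz (times scale by 2.2/2.0)
echo "base=$base slowed=$slowed"   # base=1 slowed=8
```

With these numbers, a 10% slowdown turns 1 dropped frame into 8, so a frame-drop-based score can fall far more steeply than the frequency did.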
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 5:18 ` Doug Smythies
@ 2026-02-10 9:17 ` Sergey Senozhatsky
2026-02-11 4:27 ` Doug Smythies
0 siblings, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 9:17 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Sergey Senozhatsky', 'Rafael J. Wysocki',
'Christian Loehle', 'Harshvardhan Jha',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano'
[-- Attachment #1: Type: text/plain, Size: 774 bytes --]
On (26/02/04 21:18), Doug Smythies wrote:
>
> > (I'll need some help with instructions/etc.)
>
> From your turbostat data from October you are using the intel_pstate
> CPU frequency scaling driver and the powersave CPU frequency scaling governor
> and HWP enabled. Also your maximum CPU frequency is 3,300 MHz.
>
> To limit the maximum CPU frequency to around 2,200 MHz do:
>
> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
So I only set max_perf_pct
echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
and re-ran the test:
revert: 72.5
base: 82.5
// as before "revert" means revert of "cpuidle: menu: Avoid discarding useful
// information", while base means that the patch is applied.
Please find turbostat logs attached.
[-- Attachment #2: turbostat-base.gz --]
[-- Type: application/gzip, Size: 40773 bytes --]
[-- Attachment #3: turbostat-revert.gz --]
[-- Type: application/gzip, Size: 39194 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-05 5:02 ` Doug Smythies
@ 2026-02-10 9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:20 ` Rafael J. Wysocki
0 siblings, 2 replies; 44+ messages in thread
From: Xueqin Luo @ 2026-02-10 9:33 UTC (permalink / raw)
To: dsmythies
Cc: christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha,
linux-pm, rafael, sashal, senozhatsky, stable, Xueqin Luo
Hi Doug, Rafael, and all,
I would like to share an additional data point from a different
platform that also shows a power regression associated with commit
85975daeaa4d ("cpuidle: menu: Avoid discarding useful information").
The test platform is a ZHAOXIN KaiXian KX-7000 processor. The test
scenario is system idle power measurement.
Below are the cpuidle statistics for CPU1. Other CPU cores show the
same trend.
With commit 85975daeaa4d applied (base):
State Ratio Avg(ms) Count Over-pred Under-pred
-----------------------------------------------------------------
POLL 0.0% 0.10 68 0.0% (0) 100.0% (68)
C1 0.05% 0.82 187 0.0% (0) 61.5% (115)
C2 0.0% 0.50 23 30.4% (7) 69.6% (16)
C3 0.01% 0.59 35 37.1% (13) 62.9% (22)
C4 0.24% 0.65 1092 46.9% (512) 50.0% (546)
C5 81.88% 1.45 169745 52.7% (89450) 0.0% (0)
After reverting the commit (revert):
State Ratio Avg(ms) Count Over-pred Under-pred
-----------------------------------------------------------------
POLL 0.0% 0.09 24 0.0% (0) 100.0% (24)
C1 0.03% 0.44 222 0.0% (0) 57.2% (127)
C2 0.01% 0.44 50 20.0% (10) 74.0% (37)
C3 0.01% 0.49 43 25.6% (11) 60.5% (26)
C4 0.15% 0.52 875 45.1% (395) 41.4% (362)
C5 97.1% 5.30 55099 13.9% (7645) 0.0% (0)
With commit 85975daeaa4d present:
* C5 entry count is very high
* C5 average residency is short (~1.45 ms)
* Over-prediction ratio for C5 is around 50%
After reverting the commit:
* C5 residency ratio exceeds 90%
* C5 average residency increases to ~5 ms
* Entry count drops significantly
* Over-prediction ratio is greatly reduced
This indicates that the commit makes idle residency more fragmented,
leading to more frequent C-state transitions.
In addition to the cpuidle statistics, measured system idle power is
about 2W higher when this commit is applied.
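For reference, ratios like the Over-pred/Under-pred columns above can be derived from the kernel's standard per-state counters (`usage`, `above`, `below` under each cpuidle state's sysfs directory on reasonably recent kernels); whether the table was generated exactly this way is an assumption. A minimal sketch using the C5 "base" row as literal input:

```shell
# Sketch: over-/under-prediction ratios for one cpuidle state, computed
# from the kernel's per-state counters ('above': the state chosen was too
# deep, i.e. the idle duration was over-predicted; 'below': too shallow).
# On a live system the counters come from sysfs, e.g.:
#   cd /sys/devices/system/cpu/cpu1/cpuidle/state5
#   usage=$(cat usage) above=$(cat above) below=$(cat below)
# Here, the C5 "base" row from the table above is used as literal input.
usage=169745 above=89450 below=0
over=$(awk -v a="$above" -v u="$usage" 'BEGIN { printf "%.1f", 100 * a / u }')
under=$(awk -v b="$below" -v u="$usage" 'BEGIN { printf "%.1f", 100 * b / u }')
echo "over=${over}% under=${under}%"   # over=52.7% under=0.0%
```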
Thanks,
Xueqin Luo
--
2.43.0
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 9:33 ` Xueqin Luo
@ 2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-10 14:20 ` Rafael J. Wysocki
1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 10:04 UTC (permalink / raw)
To: Xueqin Luo
Cc: dsmythies, christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable
On (26/02/10 17:33), Xueqin Luo wrote:
>
> In addition to the cpuidle statistics, measured system idle power is
> about 2W higher when this commit is applied.
>
We also noticed shorter battery life on some of the affected laptops.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
@ 2026-02-10 14:20 ` Rafael J. Wysocki
1 sibling, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-10 14:20 UTC (permalink / raw)
To: Xueqin Luo
Cc: dsmythies, christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable
On Tue, Feb 10, 2026 at 10:33 AM Xueqin Luo <luoxueqin@kylinos.cn> wrote:
>
> Hi Doug, Rafael, and all,
>
> I would like to share an additional data point from a different
> platform that also shows a power regression associated with commit
> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information").
>
> The test platform is a ZHAOXIN KaiXian KX-7000 processor. The test
> scenario is system idle power measurement.
>
> Below are the cpuidle statistics for CPU1. Other CPU cores show the
> same trend.
>
> With commit 85975daeaa4d applied (base):
>
> State Ratio Avg(ms) Count Over-pred Under-pred
> -----------------------------------------------------------------
> POLL 0.0% 0.10 68 0.0% (0) 100.0% (68)
> C1 0.05% 0.82 187 0.0% (0) 61.5% (115)
> C2 0.0% 0.50 23 30.4% (7) 69.6% (16)
> C3 0.01% 0.59 35 37.1% (13) 62.9% (22)
> C4 0.24% 0.65 1092 46.9% (512) 50.0% (546)
> C5 81.88% 1.45 169745 52.7% (89450) 0.0% (0)
>
> After reverting the commit (revert):
>
> State Ratio Avg(ms) Count Over-pred Under-pred
> -----------------------------------------------------------------
> POLL 0.0% 0.09 24 0.0% (0) 100.0% (24)
> C1 0.03% 0.44 222 0.0% (0) 57.2% (127)
> C2 0.01% 0.44 50 20.0% (10) 74.0% (37)
> C3 0.01% 0.49 43 25.6% (11) 60.5% (26)
> C4 0.15% 0.52 875 45.1% (395) 41.4% (362)
> C5 97.1% 5.30 55099 13.9% (7645) 0.0% (0)
>
> With commit 85975daeaa4d present:
>
> * C5 entry count is very high
> * C5 average residency is short (~1.45 ms)
> * Over-prediction ratio for C5 is around 50%
>
> After reverting the commit:
>
> * C5 residency ratio exceeds 90%
> * C5 average residency increases to ~5 ms
> * Entry count drops significantly
> * Over-prediction ratio is greatly reduced
>
> This indicates that the commit makes idle residency more fragmented,
> leading to more frequent C-state transitions.
Thanks for the data!
> In addition to the cpuidle statistics, measured system idle power is
> about 2W higher when this commit is applied.
Well, 2W of a difference is not good. I'm wondering what the idle
power with the commit in question reverted is.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 10:04 ` Sergey Senozhatsky
@ 2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11 1:34 ` Sergey Senozhatsky
2026-02-11 8:58 ` Xueqin Luo
0 siblings, 2 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-10 14:24 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Xueqin Luo, dsmythies, christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, rafael, sashal, stable
On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (26/02/10 17:33), Xueqin Luo wrote:
> >
> > In addition to the cpuidle statistics, measured system idle power is
> > about 2W higher when this commit is applied.
> >
>
> We also noticed shorter battery life on some of the affected laptops.
Was the difference significant?
Overall, this clearly is a "help some - hurt some" situation and I am
not at all convinced that restoring the commit in question is a good
idea (with all due respect to everyone who thinks otherwise or got
better results when it was there).
Honestly, I'd rather stop tweaking the menu governor at this point and
kindly ask people who want to sacrifice some energy for more
performance to try the teo governor instead.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 14:24 ` Rafael J. Wysocki
@ 2026-02-11 1:34 ` Sergey Senozhatsky
2026-02-11 4:17 ` Doug Smythies
2026-02-11 8:58 ` Xueqin Luo
1 sibling, 1 reply; 44+ messages in thread
From: Sergey Senozhatsky @ 2026-02-11 1:34 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Sergey Senozhatsky, Xueqin Luo, dsmythies, christian.loehle,
daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, sashal,
stable
On (26/02/10 15:24), Rafael J. Wysocki wrote:
> On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > On (26/02/10 17:33), Xueqin Luo wrote:
> > >
> > > In addition to the cpuidle statistics, measured system idle power is
> > > about 2W higher when this commit is applied.
> > >
> >
> > We also noticed shorter battery life on some of the affected laptops.
>
> Was the difference significant?
I think I saw up to "5.16% regression in perf.minutes_battery_life"
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-11 1:34 ` Sergey Senozhatsky
@ 2026-02-11 4:17 ` Doug Smythies
0 siblings, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-11 4:17 UTC (permalink / raw)
To: 'Sergey Senozhatsky', 'Rafael J. Wysocki'
Cc: 'Xueqin Luo', christian.loehle, daniel.lezcano, gregkh,
harshvardhan.j.jha, linux-pm, sashal, stable, Doug Smythies
On 2026.02.10 17:34 Sergey Senozhatsky wrote:
> On (26/02/10 15:24), Rafael J. Wysocki wrote:
>> On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky wrote:
>>> On (26/02/10 17:33), Xueqin Luo wrote:
>>>>
>>>> In addition to the cpuidle statistics, measured system idle power is
>>>> about 2W higher when this commit is applied.
>>>>
>>>
>>> We also noticed shorter battery life on some of the affected laptops.
>>
>> Was the difference significant?
>
> I think I saw up to "5.16% regression in perf.minutes_battery_life"
Note: I have been getting a fair bit of noise on my idle tests recently.
For what it's worth: On my test computer I got:
kernel 6.19-rc8, with and without the reapply.
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
CPU frequency scaling driver: intel_pstate.
CPU frequency scaling governor: powersave.
HWP: enabled.
idle governor menu.
Git Branch "doug"
3856f38e5bb9 (HEAD -> doug) Reapply "cpuidle: menu: Avoid discarding useful information"
18f7fcd5e69a (tag: v6.19-rc8) Linux 6.19-rc8
Processor Package Power:
average, reapply (140 minutes): 1.783028571 watts
average, rc8 (512 minutes): 1.656212891 watts
7.65% more power
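The percentage above follows directly from the two averages (assuming they are package power in watts; the ratio itself is unit-independent):

```python
# Relative package-power increase from the two averages above.
reapply_w = 1.783028571   # average with the reapply, over 140 minutes
rc8_w = 1.656212891       # average on plain 6.19-rc8, over 512 minutes
increase_pct = (reapply_w - rc8_w) / rc8_w * 100
print(f"{increase_pct:.2f}% more power")  # ~7.66%, 7.65% as quoted after truncation
```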
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 9:17 ` Sergey Senozhatsky
@ 2026-02-11 4:27 ` Doug Smythies
0 siblings, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-11 4:27 UTC (permalink / raw)
To: 'Sergey Senozhatsky'
Cc: 'Rafael J. Wysocki', 'Christian Loehle',
'Harshvardhan Jha', 'Sasha Levin',
'Greg Kroah-Hartman', linux-pm, stable,
'Daniel Lezcano', Doug Smythies
On 2026.02.10 01:17 Sergey Senozhatsky wrote:
> On (26/02/04 21:18), Doug Smythies wrote:
>>
>>> (I'll need some help with instructions/etc.)
>>
>> From your turbostat data from October you are using the intel_pstate
>> CPU frequency scaling driver and the powersave CPU frequency scaling governor
>> and HWP enabled. Also your maximum CPU frequency is 3,300 MHz.
>>
>> To limit the maximum CPU frequency to around 2,200 MHz do:
>>
>> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
>
> So I only set max_perf_pct
>
> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
>
> and re-ran the test:
>
> revert: 72.5
> base: 82.5
>
> // as before "revert" means revert of "cpuidle: menu: Avoid discarding useful
> // information", while base means that the patch is applied.
>
> Please find turbostat logs attached.
Thank you for doing the test.
So now the turbostat data shows that the processor never gets anywhere
near any throttling threshold, eliminating throttling from consideration.
... Doug
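The 67% figure used in this test can be sanity-checked against the maximum frequency mentioned earlier in the thread: intel_pstate's max_perf_pct caps performance as a percentage of the maximum frequency, so 67% of 3,300 MHz lands at roughly the requested 2,200 MHz. A sketch of the arithmetic (values from the thread):

```python
# max_perf_pct is a percentage of the maximum CPU frequency, so the
# value to write for a desired cap is target/max, rounded to a percent.
max_mhz = 3300      # maximum CPU frequency reported in the thread
target_mhz = 2200   # desired frequency cap
pct = round(100 * target_mhz / max_mhz)
capped_mhz = max_mhz * pct / 100
print(pct, capped_mhz)  # -> 67 2211.0
```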
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 8:57 ` Christian Loehle
@ 2026-02-11 4:27 ` Doug Smythies
0 siblings, 0 replies; 44+ messages in thread
From: Doug Smythies @ 2026-02-11 4:27 UTC (permalink / raw)
To: 'Christian Loehle', 'Sergey Senozhatsky'
Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha',
'Sasha Levin', 'Greg Kroah-Hartman', linux-pm,
stable, 'Daniel Lezcano', Doug Smythies
On 2026.02.10 00:57 Christian Loehle wrote:
> On 2/10/26 08:02, Sergey Senozhatsky wrote:
>> On (26/02/05 07:15), Christian Loehle wrote:
>> [..]
>>> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean
>>> 29.6% decrease in system performance in a traditional throughput sense.
>>> The "benchmark" might me measuring dropped frames, user input latency or what have you.
>>> Nonetheless @Sergey do feel free to expand.
>>
>> I'm not on the performance team and I don't define those metrics, so
>> I can't really comment. But frame drops during Google Docs scrolling,
>> for instance, or typing is a user visible regression, that people tend
>> to notice.
>
> Yeah I guess that was my point already, i.e. it isn't implausible that
> e.g. a frequency reduction from 2.2GHz to 2.0GHz (-10%) might result in
> double the number of dropped frames (= score reduction of 50%).
> Everything just an example but don't be thrown off by the 29.6% reduction in
> score and expect to go looking for -29.6% cpu frequency (like you would expect
> for many purely cpubound benchmarks).
Thanks for the inputs. Agreed.
In my defense, I was just attempting to extract whatever I could from
the very limited data we had.
... Doug
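Christian's point above, that a modest frequency reduction can produce an outsized score change when the metric counts missed deadlines, can be illustrated with a toy model (all numbers hypothetical):

```python
# Toy model (hypothetical numbers): a frame is dropped when its render
# time exceeds a 60 Hz budget. Render time scales inversely with CPU
# frequency, so if times cluster just under the budget, a small
# slowdown multiplies the number of dropped frames.
budget_ms = 16.7
# Per-frame render times at the 2.2 GHz baseline, clustered near budget.
times = [15.0, 15.5, 16.0, 16.3, 16.5, 16.6, 14.0, 13.5, 16.4, 15.8]

def dropped(freq_ghz, base_ghz=2.2):
    scale = base_ghz / freq_ghz  # slower clock -> longer render times
    return sum(1 for t in times if t * scale > budget_ms)

print(dropped(2.2), dropped(2.0))  # -> 0 7
```

A ~10% frequency drop takes this model from zero dropped frames to seven, which is the kind of non-linearity that makes a latency-sensitive score move much more than the underlying frequency change.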
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11 1:34 ` Sergey Senozhatsky
@ 2026-02-11 8:58 ` Xueqin Luo
1 sibling, 0 replies; 44+ messages in thread
From: Xueqin Luo @ 2026-02-11 8:58 UTC (permalink / raw)
To: rafael
Cc: christian.loehle, daniel.lezcano, dsmythies, gregkh,
harshvardhan.j.jha, linux-pm, luoxueqin, sashal, senozhatsky,
stable
On this platform (ZHAOXIN KaiXian KX-7000), we evaluated the impact
of commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
under a screen-on idle scenario. During testing, the cpufreq driver
was acpi-cpufreq and the scaling governor was set to ondemand.
With this commit applied, measured system idle power increases by
approximately 2W compared to the revert case. In addition, battery life
testing on the same system shows a reduction of roughly 80 minutes when
this commit is present.
These results were consistently reproduced across multiple test runs
under identical conditions.
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2026-02-11 8:59 UTC | newest]
Thread overview: 44+ messages
2025-12-03 16:18 Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS Harshvardhan Jha
2025-12-03 16:44 ` Christian Loehle
2025-12-03 22:30 ` Doug Smythies
2025-12-08 11:33 ` Harshvardhan Jha
2025-12-08 12:47 ` Christian Loehle
2026-01-13 7:06 ` Harshvardhan Jha
2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-13 14:18 ` Rafael J. Wysocki
2026-01-14 4:28 ` Sergey Senozhatsky
2026-01-14 4:49 ` Sergey Senozhatsky
2026-01-14 5:15 ` Tomasz Figa
2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 10:23 ` Harshvardhan Jha
2026-01-29 22:47 ` Doug Smythies
2026-01-27 15:45 ` Harshvardhan Jha
2026-01-28 5:06 ` Doug Smythies
2026-01-28 23:53 ` Doug Smythies
2026-01-29 22:27 ` Doug Smythies
2026-01-30 19:28 ` Rafael J. Wysocki
2026-02-01 19:20 ` Christian Loehle
2026-02-02 17:31 ` Harshvardhan Jha
2026-02-03 9:07 ` Christian Loehle
2026-02-03 9:16 ` Harshvardhan Jha
2026-02-03 9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 10:30 ` Christian Loehle
2026-02-03 16:45 ` Rafael J. Wysocki
2026-02-05 0:45 ` Doug Smythies
2026-02-05 2:37 ` Sergey Senozhatsky
2026-02-05 5:18 ` Doug Smythies
2026-02-10 9:17 ` Sergey Senozhatsky
2026-02-11 4:27 ` Doug Smythies
2026-02-05 7:15 ` Christian Loehle
2026-02-10 8:02 ` Sergey Senozhatsky
2026-02-10 8:57 ` Christian Loehle
2026-02-11 4:27 ` Doug Smythies
2026-02-05 5:02 ` Doug Smythies
2026-02-10 9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11 1:34 ` Sergey Senozhatsky
2026-02-11 4:17 ` Doug Smythies
2026-02-11 8:58 ` Xueqin Luo
2026-02-10 14:20 ` Rafael J. Wysocki