* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
@ 2025-12-03 16:18 Harshvardhan Jha
  2025-12-03 16:44 ` Christian Loehle
  0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2025-12-03 16:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Daniel Lezcano
  Cc: Sasha Levin, Christian Loehle, Greg Kroah-Hartman,
      linux-pm@vger.kernel.org, stable@vger.kernel.org

Hi there,

While running performance benchmarks for the 5.15.196 LTS tag, it was
observed that several regressions across different benchmarks were
introduced when compared to the previous 5.15.193 kernel tag. Running an
automated bisect between the two tags narrowed the culprit down to:
- 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
  information" for 5.15

Regressions on 5.15.196 include:
-9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
-6.3% : Phoronix system/sqlite on OnPrem X6-2
-18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
thread & 1M buffer size on OnPrem X6-2
-4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
1 thread & 1M buffer size on OnPrem X6-2
Up to -30% : Some Netpipe metrics on OnPrem X5-2

The culprit commit's message mentions that the revert was done due to
performance regressions introduced on Intel Jasper Lake systems, but the
revert is unfortunately causing issues on other systems. I wanted to ask
the maintainers' opinion on how we should proceed in order to fix this:
if we reapply the reverted commit, it will bring back the previous
regressions on Jasper Lake systems, and if we keep the revert, we are
stuck with the current regressions. If this problem has been reported
before and a fix is in the works, please let me know and I shall follow
developments in that mail thread.

Thanks & Regards,
Harshvardhan

^ permalink raw reply [flat|nested] 44+ messages in thread
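For readers reproducing this kind of triage: the automated bisect described above can be sketched with `git bisect run` and a small good/bad classifier. The baseline value, tolerance, and `run-benchmark.sh` harness below are hypothetical placeholders, not the reporter's actual setup.

```shell
# classify: exit 0 ("good") if MEASURED is within PCT percent of BASE,
# exit 1 ("bad") if it regressed by more than PCT percent.
# Assumes a higher-is-better metric (e.g. throughput); invert for times.
classify() { # classify BASE MEASURED PCT
  awk -v b="$1" -v m="$2" -v p="$3" \
      'BEGIN { exit (m < b * (1 - p / 100)) ? 1 : 0 }'
}

# Typical driver (not run here; build target and benchmark are placeholders):
#   git bisect start v5.15.196 v5.15.193
#   git bisect run sh -c '
#     make -j"$(nproc)" olddefconfig bzImage || exit 125  # 125 = skip commit
#     classify 100 "$(./run-benchmark.sh)" 5'
```

`git bisect run` treats exit 0 as good, 1-124 as bad, and 125 as "skip this commit", which is why build failures map to 125 in the sketch.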
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2025-12-03 16:18 Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS Harshvardhan Jha @ 2025-12-03 16:44 ` Christian Loehle 2025-12-03 22:30 ` Doug Smythies 0 siblings, 1 reply; 44+ messages in thread From: Christian Loehle @ 2025-12-03 16:44 UTC (permalink / raw) To: Harshvardhan Jha, Rafael J. Wysocki, Daniel Lezcano Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm@vger.kernel.org, stable@vger.kernel.org On 12/3/25 16:18, Harshvardhan Jha wrote: > Hi there, > > While running performance benchmarks for the 5.15.196 LTS tags , it was > observed that several regressions across different benchmarks is being > introduced when compared to the previous 5.15.193 kernel tag. Running an > automated bisect on both of them narrowed down the culprit commit to: > - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful > information" for 5.15 > > Regressions on 5.15.196 include: > -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 > -6.3% : Phoronix system/sqlite on OnPrem X6-2 > -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 > thread & 1M buffer size on OnPrem X6-2 > -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & > 1 thread & 1M buffer size on OnPrem X6-2 > Up to -30% : Some Netpipe metrics on OnPrem X5-2 > > The culprit commits' messages mention that these reverts were done due > to performance regressions introduced in Intel Jasper Lake systems but > this revert is causing issues in other systems unfortunately. I wanted > to know the maintainers' opinion on how we should proceed in order to > fix this. If we reapply it'll bring back the previous regressions on > Jasper Lake systems and if we don't revert it then it's stuck with > current regressions. 
If this problem has been reported before and a fix
> is in the works then please let me know I shall follow developments to
> that mail thread.

The discussion regarding this can be found here:
https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
and we explored an alternative to the full revert here:
https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
Unfortunately that didn't lead anywhere useful, so Rafael went with the
full revert you're seeing now.

Ultimately it seems to me that the right "aggressiveness" on deep idle
tradeoffs will depend highly on your platform, but also on your
workload. Jasper Lake in particular seems to favor deep idle states even
when they don't seem to be a 'good' choice from a purely cpuidle
(governor) perspective, so we're kind of stuck with that.

For teo we've discussed a tunable knob in the past, which comes
naturally with the logic; for menu there's nothing obvious that would be
comparable. But such a knob for teo didn't generate any further interest
(so far).

That's the status, unless I missed anything?

^ permalink raw reply [flat|nested] 44+ messages in thread
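As a side note for anyone reproducing the menu-vs-teo comparison discussed here: which cpuidle governor is active, and which states are exposed, can be read from sysfs. A minimal sketch, with the sysfs root parameterized purely for illustration (so it could also be pointed at a saved copy of `/sys`); the standard layout under `devices/system/cpu` is assumed.

```shell
# Print the active cpuidle governor and the idle state names of cpu0
# from a given sysfs root (normally /sys).
show_cpuidle() { # show_cpuidle SYSFS_ROOT
  root="$1/devices/system/cpu/cpuidle"
  [ -r "$root/current_governor_ro" ] && \
    echo "governor: $(cat "$root/current_governor_ro")"
  for s in "$1"/devices/system/cpu/cpu0/cpuidle/state*; do
    [ -r "$s/name" ] && echo "$(basename "$s"): $(cat "$s/name")"
  done
}
# show_cpuidle /sys
```

Whether `current_governor` is also writable (to switch governors at runtime) depends on kernel version and configuration, so the sketch reads only the `_ro` variant.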
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2025-12-03 16:44 ` Christian Loehle @ 2025-12-03 22:30 ` Doug Smythies 2025-12-08 11:33 ` Harshvardhan Jha 0 siblings, 1 reply; 44+ messages in thread From: Doug Smythies @ 2025-12-03 22:30 UTC (permalink / raw) To: 'Harshvardhan Jha' Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, Doug Smythies, 'Christian Loehle', 'Rafael J. Wysocki', 'Daniel Lezcano' [-- Attachment #1: Type: text/plain, Size: 3917 bytes --] On 2025.12.03 08:45 Christian Loehle wrote: > On 12/3/25 16:18, Harshvardhan Jha wrote: >> Hi there, >> >> While running performance benchmarks for the 5.15.196 LTS tags , it was >> observed that several regressions across different benchmarks is being >> introduced when compared to the previous 5.15.193 kernel tag. Running an >> automated bisect on both of them narrowed down the culprit commit to: >> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful >> information" for 5.15 >> >> Regressions on 5.15.196 include: >> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 >> -6.3% : Phoronix system/sqlite on OnPrem X6-2 >> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 >> thread & 1M buffer size on OnPrem X6-2 >> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & >> 1 thread & 1M buffer size on OnPrem X6-2 >> Up to -30% : Some Netpipe metrics on OnPrem X5-2 >> >> The culprit commits' messages mention that these reverts were done due >> to performance regressions introduced in Intel Jasper Lake systems but >> this revert is causing issues in other systems unfortunately. I wanted >> to know the maintainers' opinion on how we should proceed in order to >> fix this. If we reapply it'll bring back the previous regressions on >> Jasper Lake systems and if we don't revert it then it's stuck with >> current regressions. 
If this problem has been reported before and a fix
>> is in the works then please let me know I shall follow developments to
>> that mail thread.
>
> The discussion regarding this can be found here:
> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> we explored an alternative to the full revert here:
> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
> unfortunately that didn't lead anywhere useful, so Rafael went with the
> full revert you're seeing now.
>
> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> will highly depend on your platform, but also your workload, Jasper Lake
> in particular seems to favor deep idle states even when they don't seem
> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> we're kind of stuck with that.
>
> For teo we've discussed a tunable knob in the past, which comes naturally with
> the logic, for menu there's nothing obvious that would be comparable.
> But for teo such a knob didn't generate any further interest (so far).
>
> That's the status, unless I missed anything?

By reading everything in the links Christian provided, you can see
that we had difficulties repeating test results on other platforms.

Of the tests listed herein, the only one that was easy to repeat on my
test server was the "Phoronix pts/sqlite" one.
I got (summary: no difference):

                  Kernel 6.18                      Reverted
pts/sqlite-2.3.0  menu rc4        menu rc1        menu rc1        menu rc3
                  performance     performance     performance     performance
test  what        ave             ave             ave             ave
1     T/C 1       2.147  -0.2%    2.143   0.0%    2.16   -0.8%    2.156  -0.6%
2     T/C 2       3.468   0.1%    3.473   0.0%    3.486  -0.4%    3.478  -0.1%
3     T/C 4       4.336   0.3%    4.35    0.0%    4.355  -0.1%    4.354  -0.1%
4     T/C 8       5.438  -0.1%    5.434   0.0%    5.456  -0.4%    5.45   -0.3%
5     T/C 12      6.314  -0.2%    6.299   0.0%    6.307  -0.1%    6.29    0.1%

Where:
T/C means: Threads / Copies
performance means: intel_pstate CPU frequency scaling driver and the
performance CPU frequency scaling governor.
Data points are in seconds.
Ave means the average test result. The number of runs per test was
increased from the default of 3 to 10.
The reversion was manually applied to kernel 6.18-rc1 for that test.
The reversion was included in kernel 6.18-rc3.
Kernel 6.18-rc4 had another code change to menu.c.

In case the formatting gets messed up, the table is also attached.

Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
HWP: Enabled.

... Doug

[-- Attachment #2: sqlite-test-table.png --]
[-- Type: image/png, Size: 21984 bytes --]

^ permalink raw reply [flat|nested] 44+ messages in thread
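The percentage columns in Doug's table above can be reproduced with a one-line helper. This is a sketch of the arithmetic only, not Doug's actual tooling: data points are seconds, so a larger measurement than the reference shows up as a negative (slower) percentage.

```shell
# pct_delta REF MEASURED: percent difference of MEASURED vs REF for a
# lower-is-better metric (seconds), e.g. pct_delta 2.143 2.16 -> -0.8%
pct_delta() {
  awk -v r="$1" -v m="$2" 'BEGIN { printf "%+.1f%%\n", (r - m) / r * 100 }'
}
```

For example, the "menu rc1 reverted" column for one thread/copy is `pct_delta 2.143 2.16`, matching the table's -0.8%.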
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2025-12-03 22:30 ` Doug Smythies @ 2025-12-08 11:33 ` Harshvardhan Jha 2025-12-08 12:47 ` Christian Loehle 0 siblings, 1 reply; 44+ messages in thread From: Harshvardhan Jha @ 2025-12-08 11:33 UTC (permalink / raw) To: Doug Smythies Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Christian Loehle', 'Rafael J. Wysocki', 'Daniel Lezcano' Hi Doug, On 04/12/25 4:00 AM, Doug Smythies wrote: > On 2025.12.03 08:45 Christian Loehle wrote: >> On 12/3/25 16:18, Harshvardhan Jha wrote: >>> Hi there, >>> >>> While running performance benchmarks for the 5.15.196 LTS tags , it was >>> observed that several regressions across different benchmarks is being >>> introduced when compared to the previous 5.15.193 kernel tag. Running an >>> automated bisect on both of them narrowed down the culprit commit to: >>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful >>> information" for 5.15 >>> >>> Regressions on 5.15.196 include: >>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 >>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 >>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 >>> thread & 1M buffer size on OnPrem X6-2 >>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & >>> 1 thread & 1M buffer size on OnPrem X6-2 >>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 >>> >>> The culprit commits' messages mention that these reverts were done due >>> to performance regressions introduced in Intel Jasper Lake systems but >>> this revert is causing issues in other systems unfortunately. I wanted >>> to know the maintainers' opinion on how we should proceed in order to >>> fix this. If we reapply it'll bring back the previous regressions on >>> Jasper Lake systems and if we don't revert it then it's stuck with >>> current regressions. 
If this problem has been reported before and a fix >>> is in the works then please let me know I shall follow developments to >>> that mail thread. >> The discussion regarding this can be found here: >> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$ >> we explored an alternative to the full revert here: >> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$ >> unfortunately that didn't lead anywhere useful, so Rafael went with the >> full revert you're seeing now. >> >> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs >> will highly depend on your platform, but also your workload, Jasper Lake >> in particular seems to favor deep idle states even when they don't seem >> to be a 'good' choice from a purely cpuidle (governor) perspective, so >> we're kind of stuck with that. >> >> For teo we've discussed a tunable knob in the past, which comes naturally with >> the logic, for menu there's nothing obvious that would be comparable. >> But for teo such a knob didn't generate any further interest (so far). >> >> That's the status, unless I missed anything? > By reading everything in the links Chrsitian provided, you can see > that we had difficulties repeating test results on other platforms. > > Of the tests listed herein, the only one that was easy to repeat on my > test server, was the " Phoronix pts/sqlite" one. 
I got (summary: no difference): > > Kernel 6.18 Reverted > pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3 > performance performance performance performance > test what ave ave ave ave > 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6% > 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1% > 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1% > 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3% > 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1% > > Where: > T/C means: Threads / Copies > performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor. > Data points are in Seconds. > Ave means the average test result. The number of runs per test was increased from the default of 3 to 10. > The reversion was manually applied to kernel 6.18-rc1 for that test. > The reversion was included in kernel 6.18-rc3. > Kernel 6.18-rc4 had another code change to menu.c > > In case the formatting gets messed up, the table is also attached. > > Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs. > HWP: Enabled. I was able to recover performance on 5.15 and 5.4 LTS based kernels after reapplying the revert on X6-2 systems. 
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             46 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      56
  On-line CPU(s) list:       0-55
Vendor ID:                   GenuineIntel
  Model name:                Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
    CPU family:              6
    Model:                   79
    Thread(s) per core:      2
    Core(s) per socket:      14
    Socket(s):               2
    Stepping:                1
    CPU(s) scaling MHz:      98%
    CPU max MHz:             2600.0000
    CPU min MHz:             1200.0000
    BogoMIPS:                5188.26
    Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep
                             mtrr pge mca cmov pat pse36 clflush dts acpi
                             mmx fxsr sse sse2 ss ht tm pbe syscall nx
                             pdpe1gb rdtscp lm constant_tsc arch_perfmon
                             pebs bts rep_good nopl xtopology nonstop_tsc
                             cpuid aperfmperf pni pclmulqdq dtes64 monitor
                             ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16
                             xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
                             popcnt tsc_deadline_timer aes xsave avx f16c
                             rdrand lahf_lm abm 3dnowprefetch cpuid_fault
                             epb cat_l3 cdp_l3 pti intel_ppin ssbd ibrs
                             ibpb stibp tpr_shadow flexpriority ept vpid
                             ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep
                             bmi2 erms invpcid rtm cqm rdt_a rdseed adx
                             smap intel_pt xsaveopt cqm_llc cqm_occup_llc
                             cqm_mbm_total cqm_mbm_local dtherm arat pln
                             pts vnmi md_clear flush_l1d
Virtualization features:
  Virtualization:            VT-x
Caches (sum of all):
  L1d:                       896 KiB (28 instances)
  L1i:                       896 KiB (28 instances)
  L2:                        7 MiB (28 instances)
  L3:                        70 MiB (2 instances)
NUMA:
  NUMA node(s):              2
  NUMA node0 CPU(s):         0-13,28-41
  NUMA node1 CPU(s):         14-27,42-55
Vulnerabilities:
  Gather data sampling:      Not affected
  Indirect target selection: Not affected
  Itlb multihit:             KVM: Mitigation: Split huge pages
  L1tf:                      Mitigation; PTE Inversion; VMX conditional
                             cache flushes, SMT vulnerable
  Mds:                       Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:                  Mitigation; PTI
  Mmio stale data:           Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling:    Not affected
  Retbleed:                  Not affected
  Spec rstack overflow:      Not affected
  Spec store bypass:         Mitigation; Speculative Store Bypass disabled
                             via prctl
  Spectre v1:                Mitigation; usercopy/swapgs barriers and
                             __user pointer sanitization
  Spectre v2:                Mitigation; Retpolines; IBPB conditional;
                             IBRS_FW; STIBP conditional; RSB filling;
                             PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                     Not affected
  Tsa:                       Not affected
  Tsx async abort:           Mitigation; Clear CPU buffers; SMT vulnerable
  Vmscape:                   Mitigation; IBPB before exit to userspace

Thanks & Regards,
Harshvardhan

>
> ... Doug
>

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2025-12-08 11:33 ` Harshvardhan Jha @ 2025-12-08 12:47 ` Christian Loehle 2026-01-13 7:06 ` Harshvardhan Jha 2026-01-27 15:45 ` Harshvardhan Jha 0 siblings, 2 replies; 44+ messages in thread From: Christian Loehle @ 2025-12-08 12:47 UTC (permalink / raw) To: Harshvardhan Jha, Doug Smythies Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Rafael J. Wysocki', 'Daniel Lezcano' On 12/8/25 11:33, Harshvardhan Jha wrote: > Hi Doug, > > On 04/12/25 4:00 AM, Doug Smythies wrote: >> On 2025.12.03 08:45 Christian Loehle wrote: >>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>> Hi there, >>>> >>>> While running performance benchmarks for the 5.15.196 LTS tags , it was >>>> observed that several regressions across different benchmarks is being >>>> introduced when compared to the previous 5.15.193 kernel tag. Running an >>>> automated bisect on both of them narrowed down the culprit commit to: >>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful >>>> information" for 5.15 >>>> >>>> Regressions on 5.15.196 include: >>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 >>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 >>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 >>>> thread & 1M buffer size on OnPrem X6-2 >>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & >>>> 1 thread & 1M buffer size on OnPrem X6-2 >>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 >>>> >>>> The culprit commits' messages mention that these reverts were done due >>>> to performance regressions introduced in Intel Jasper Lake systems but >>>> this revert is causing issues in other systems unfortunately. I wanted >>>> to know the maintainers' opinion on how we should proceed in order to >>>> fix this. 
If we reapply it'll bring back the previous regressions on >>>> Jasper Lake systems and if we don't revert it then it's stuck with >>>> current regressions. If this problem has been reported before and a fix >>>> is in the works then please let me know I shall follow developments to >>>> that mail thread. >>> The discussion regarding this can be found here: >>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$ >>> we explored an alternative to the full revert here: >>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$ >>> unfortunately that didn't lead anywhere useful, so Rafael went with the >>> full revert you're seeing now. >>> >>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs >>> will highly depend on your platform, but also your workload, Jasper Lake >>> in particular seems to favor deep idle states even when they don't seem >>> to be a 'good' choice from a purely cpuidle (governor) perspective, so >>> we're kind of stuck with that. >>> >>> For teo we've discussed a tunable knob in the past, which comes naturally with >>> the logic, for menu there's nothing obvious that would be comparable. >>> But for teo such a knob didn't generate any further interest (so far). >>> >>> That's the status, unless I missed anything? >> By reading everything in the links Chrsitian provided, you can see >> that we had difficulties repeating test results on other platforms. >> >> Of the tests listed herein, the only one that was easy to repeat on my >> test server, was the " Phoronix pts/sqlite" one. 
I got (summary: no difference): >> >> Kernel 6.18 Reverted >> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3 >> performance performance performance performance >> test what ave ave ave ave >> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6% >> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1% >> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1% >> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3% >> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1% >> >> Where: >> T/C means: Threads / Copies >> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor. >> Data points are in Seconds. >> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10. >> The reversion was manually applied to kernel 6.18-rc1 for that test. >> The reversion was included in kernel 6.18-rc3. >> Kernel 6.18-rc4 had another code change to menu.c >> >> In case the formatting gets messed up, the table is also attached. >> >> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs. >> HWP: Enabled. > > I was able to recover performance on 5.15 and 5.4 LTS based kernels > after reapplying the revert on X6-2 systems. 
> > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 46 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 56 > On-line CPU(s) list: 0-55 > Vendor ID: GenuineIntel > Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz > CPU family: 6 > Model: 79 > Thread(s) per core: 2 > Core(s) per socket: 14 > Socket(s): 2 > Stepping: 1 > CPU(s) scaling MHz: 98% > CPU max MHz: 2600.0000 > CPU min MHz: 1200.0000 > BogoMIPS: 5188.26 > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep > mtrr pg > e mca cmov pat pse36 clflush dts acpi mmx > fxsr sse > sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp > lm cons > tant_tsc arch_perfmon pebs bts rep_good > nopl xtopol > ogy nonstop_tsc cpuid aperfmperf pni > pclmulqdq dtes > 64 monitor ds_cpl vmx smx est tm2 ssse3 > sdbg fma cx > 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic > movbe po > pcnt tsc_deadline_timer aes xsave avx f16c > rdrand l > ahf_lm abm 3dnowprefetch cpuid_fault epb > cat_l3 cdp > _l3 pti intel_ppin ssbd ibrs ibpb stibp > tpr_shadow > flexpriority ept vpid ept_ad fsgsbase > tsc_adjust bm > i1 hle avx2 smep bmi2 erms invpcid rtm cqm > rdt_a rd > seed adx smap intel_pt xsaveopt cqm_llc > cqm_occup_l > lc cqm_mbm_total cqm_mbm_local dtherm arat > pln pts > vnmi md_clear flush_l1d > Virtualization features: > Virtualization: VT-x > Caches (sum of all): > L1d: 896 KiB (28 instances) > L1i: 896 KiB (28 instances) > L2: 7 MiB (28 instances) > L3: 70 MiB (2 instances) > NUMA: > NUMA node(s): 2 > NUMA node0 CPU(s): 0-13,28-41 > NUMA node1 CPU(s): 14-27,42-55 > Vulnerabilities: > Gather data sampling: Not affected > Indirect target selection: Not affected > Itlb multihit: KVM: Mitigation: Split huge pages > L1tf: Mitigation; PTE Inversion; VMX conditional > cache fl > ushes, SMT vulnerable > Mds: Mitigation; Clear CPU buffers; SMT vulnerable > Meltdown: Mitigation; PTI > Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable > Reg file data sampling: Not affected > Retbleed: Not affected 
> Spec rstack overflow: Not affected > Spec store bypass: Mitigation; Speculative Store Bypass > disabled via p > rctl > Spectre v1: Mitigation; usercopy/swapgs barriers and > __user poi > nter sanitization > Spectre v2: Mitigation; Retpolines; IBPB conditional; > IBRS_FW; > STIBP conditional; RSB filling; PBRSB-eIBRS > Not aff > ected; BHI Not affected > Srbds: Not affected > Tsa: Not affected > Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable > Vmscape: Mitigation; IBPB before exit to userspace > It would be nice to get the idle states here, ideally how the states' usage changed from base to revert. The mentioned thread did this and should show how it can be done, but a dump of cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* before and after the workload is usually fine to work with: https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/ ^ permalink raw reply [flat|nested] 44+ messages in thread
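A minimal sketch of the before/after dump Christian suggests here. The sysfs paths exist only on a real Linux machine with cpuidle enabled; the file names `before.txt`/`after.txt` are placeholders, and the delta helper is plain text processing.

```shell
# Snapshot per-CPU cpuidle state usage count and residency (microseconds).
snapshot() {
  for d in /sys/devices/system/cpu/cpu*/cpuidle/state*; do
    printf '%s %s %s\n' "$d" "$(cat "$d/usage")" "$(cat "$d/time")"
  done
}

# delta BEFORE AFTER: print how each state's usage count and residency
# grew during the workload, per state line dumped by snapshot.
delta() {
  awk 'NR == FNR { u[$1] = $2; t[$1] = $3; next }
       { printf "%s usage+=%d time_us+=%d\n", $1, $2 - u[$1], $3 - t[$1] }' \
      "$1" "$2"
}

# Usage (not run here):
#   snapshot > before.txt; <run workload>; snapshot > after.txt
#   delta before.txt after.txt
```

Comparing the resulting per-state deltas between the base and reverted kernels shows which idle states the menu governor shifted residency into.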
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2025-12-08 12:47 ` Christian Loehle @ 2026-01-13 7:06 ` Harshvardhan Jha 2026-01-13 14:13 ` Rafael J. Wysocki 2026-01-27 15:45 ` Harshvardhan Jha 1 sibling, 1 reply; 44+ messages in thread From: Harshvardhan Jha @ 2026-01-13 7:06 UTC (permalink / raw) To: Christian Loehle, Doug Smythies Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Rafael J. Wysocki', 'Daniel Lezcano' Hi Crhistian, On 08/12/25 6:17 PM, Christian Loehle wrote: > On 12/8/25 11:33, Harshvardhan Jha wrote: >> Hi Doug, >> >> On 04/12/25 4:00 AM, Doug Smythies wrote: >>> On 2025.12.03 08:45 Christian Loehle wrote: >>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>> Hi there, >>>>> >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was >>>>> observed that several regressions across different benchmarks is being >>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an >>>>> automated bisect on both of them narrowed down the culprit commit to: >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful >>>>> information" for 5.15 >>>>> >>>>> Regressions on 5.15.196 include: >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 >>>>> thread & 1M buffer size on OnPrem X6-2 >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & >>>>> 1 thread & 1M buffer size on OnPrem X6-2 >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 >>>>> >>>>> The culprit commits' messages mention that these reverts were done due >>>>> to performance regressions introduced in Intel Jasper Lake systems but >>>>> this revert is causing issues in other systems unfortunately. I wanted >>>>> to know the maintainers' opinion on how we should proceed in order to >>>>> fix this. 
If we reapply it'll bring back the previous regressions on >>>>> Jasper Lake systems and if we don't revert it then it's stuck with >>>>> current regressions. If this problem has been reported before and a fix >>>>> is in the works then please let me know I shall follow developments to >>>>> that mail thread. >>>> The discussion regarding this can be found here: >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$ >>>> we explored an alternative to the full revert here: >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$ >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the >>>> full revert you're seeing now. >>>> >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs >>>> will highly depend on your platform, but also your workload, Jasper Lake >>>> in particular seems to favor deep idle states even when they don't seem >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so >>>> we're kind of stuck with that. >>>> >>>> For teo we've discussed a tunable knob in the past, which comes naturally with >>>> the logic, for menu there's nothing obvious that would be comparable. >>>> But for teo such a knob didn't generate any further interest (so far). >>>> >>>> That's the status, unless I missed anything? >>> By reading everything in the links Chrsitian provided, you can see >>> that we had difficulties repeating test results on other platforms. >>> >>> Of the tests listed herein, the only one that was easy to repeat on my >>> test server, was the " Phoronix pts/sqlite" one. 
I got (summary: no difference): >>> >>> Kernel 6.18 Reverted >>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3 >>> performance performance performance performance >>> test what ave ave ave ave >>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6% >>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1% >>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1% >>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3% >>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1% >>> >>> Where: >>> T/C means: Threads / Copies >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor. >>> Data points are in Seconds. >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10. >>> The reversion was manually applied to kernel 6.18-rc1 for that test. >>> The reversion was included in kernel 6.18-rc3. >>> Kernel 6.18-rc4 had another code change to menu.c >>> >>> In case the formatting gets messed up, the table is also attached. >>> >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs. >>> HWP: Enabled. >> I was able to recover performance on 5.15 and 5.4 LTS based kernels >> after reapplying the revert on X6-2 systems. 
>> >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Address sizes: 46 bits physical, 48 bits virtual >> Byte Order: Little Endian >> CPU(s): 56 >> On-line CPU(s) list: 0-55 >> Vendor ID: GenuineIntel >> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz >> CPU family: 6 >> Model: 79 >> Thread(s) per core: 2 >> Core(s) per socket: 14 >> Socket(s): 2 >> Stepping: 1 >> CPU(s) scaling MHz: 98% >> CPU max MHz: 2600.0000 >> CPU min MHz: 1200.0000 >> BogoMIPS: 5188.26 >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep >> mtrr pg >> e mca cmov pat pse36 clflush dts acpi mmx >> fxsr sse >> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp >> lm cons >> tant_tsc arch_perfmon pebs bts rep_good >> nopl xtopol >> ogy nonstop_tsc cpuid aperfmperf pni >> pclmulqdq dtes >> 64 monitor ds_cpl vmx smx est tm2 ssse3 >> sdbg fma cx >> 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic >> movbe po >> pcnt tsc_deadline_timer aes xsave avx f16c >> rdrand l >> ahf_lm abm 3dnowprefetch cpuid_fault epb >> cat_l3 cdp >> _l3 pti intel_ppin ssbd ibrs ibpb stibp >> tpr_shadow >> flexpriority ept vpid ept_ad fsgsbase >> tsc_adjust bm >> i1 hle avx2 smep bmi2 erms invpcid rtm cqm >> rdt_a rd >> seed adx smap intel_pt xsaveopt cqm_llc >> cqm_occup_l >> lc cqm_mbm_total cqm_mbm_local dtherm arat >> pln pts >> vnmi md_clear flush_l1d >> Virtualization features: >> Virtualization: VT-x >> Caches (sum of all): >> L1d: 896 KiB (28 instances) >> L1i: 896 KiB (28 instances) >> L2: 7 MiB (28 instances) >> L3: 70 MiB (2 instances) >> NUMA: >> NUMA node(s): 2 >> NUMA node0 CPU(s): 0-13,28-41 >> NUMA node1 CPU(s): 14-27,42-55 >> Vulnerabilities: >> Gather data sampling: Not affected >> Indirect target selection: Not affected >> Itlb multihit: KVM: Mitigation: Split huge pages >> L1tf: Mitigation; PTE Inversion; VMX conditional >> cache fl >> ushes, SMT vulnerable >> Mds: Mitigation; Clear CPU buffers; SMT vulnerable >> Meltdown: Mitigation; PTI >> Mmio stale data: Mitigation; Clear CPU buffers; SMT 
vulnerable >> Reg file data sampling: Not affected >> Retbleed: Not affected >> Spec rstack overflow: Not affected >> Spec store bypass: Mitigation; Speculative Store Bypass >> disabled via p >> rctl >> Spectre v1: Mitigation; usercopy/swapgs barriers and >> __user poi >> nter sanitization >> Spectre v2: Mitigation; Retpolines; IBPB conditional; >> IBRS_FW; >> STIBP conditional; RSB filling; PBRSB-eIBRS >> Not aff >> ected; BHI Not affected >> Srbds: Not affected >> Tsa: Not affected >> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable >> Vmscape: Mitigation; IBPB before exit to userspace >> > It would be nice to get the idle states here, ideally how the states' usage changed > from base to revert. > The mentioned thread did this and should show how it can be done, but a dump of > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* > before and after the workload is usually fine to work with: > https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ Bumping this as I discovered this issue on 6.12 stable branch also. The reapplication seems inevitable. I shall get back to you with these details also. Thanks & Regards, Harshvardhan ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-13 7:06 ` Harshvardhan Jha @ 2026-01-13 14:13 ` Rafael J. Wysocki 2026-01-13 14:18 ` Rafael J. Wysocki 0 siblings, 1 reply; 44+ messages in thread From: Rafael J. Wysocki @ 2026-01-13 14:13 UTC (permalink / raw) To: Harshvardhan Jha Cc: Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Rafael J. Wysocki, Daniel Lezcano On Tue, Jan 13, 2026 at 8:06 AM Harshvardhan Jha <harshvardhan.j.jha@oracle.com> wrote: > > Hi Crhistian, > > On 08/12/25 6:17 PM, Christian Loehle wrote: > > On 12/8/25 11:33, Harshvardhan Jha wrote: > >> Hi Doug, > >> > >> On 04/12/25 4:00 AM, Doug Smythies wrote: > >>> On 2025.12.03 08:45 Christian Loehle wrote: > >>>> On 12/3/25 16:18, Harshvardhan Jha wrote: > >>>>> Hi there, > >>>>> > >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was > >>>>> observed that several regressions across different benchmarks is being > >>>>> introduced when compared to the previous 5.15.193 kernel tag. 
Running an > >>>>> automated bisect on both of them narrowed down the culprit commit to: > >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful > >>>>> information" for 5.15 > >>>>> > >>>>> Regressions on 5.15.196 include: > >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 > >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 > >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 > >>>>> thread & 1M buffer size on OnPrem X6-2 > >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & > >>>>> 1 thread & 1M buffer size on OnPrem X6-2 > >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 > >>>>> > >>>>> The culprit commits' messages mention that these reverts were done due > >>>>> to performance regressions introduced in Intel Jasper Lake systems but > >>>>> this revert is causing issues in other systems unfortunately. I wanted > >>>>> to know the maintainers' opinion on how we should proceed in order to > >>>>> fix this. If we reapply it'll bring back the previous regressions on > >>>>> Jasper Lake systems and if we don't revert it then it's stuck with > >>>>> current regressions. If this problem has been reported before and a fix > >>>>> is in the works then please let me know I shall follow developments to > >>>>> that mail thread. 
> >>>> The discussion regarding this can be found here:
> >>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> >>>> we explored an alternative to the full revert here:
> >>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
> >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
> >>>> full revert you're seeing now.
> >>>>
> >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> >>>> will highly depend on your platform, but also your workload, Jasper Lake
> >>>> in particular seems to favor deep idle states even when they don't seem
> >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> >>>> we're kind of stuck with that.
> >>>>
> >>>> For teo we've discussed a tunable knob in the past, which comes naturally with
> >>>> the logic, for menu there's nothing obvious that would be comparable.
> >>>> But for teo such a knob didn't generate any further interest (so far).
> >>>>
> >>>> That's the status, unless I missed anything?
> >>> By reading everything in the links Christian provided, you can see
> >>> that we had difficulties repeating test results on other platforms.
> >>>
> >>> Of the tests listed herein, the only one that was easy to repeat on my
> >>> test server, was the "Phoronix pts/sqlite" one. I got (summary: no difference):
> >>>
> >>> Kernel            6.18                            Reverted
> >>> pts/sqlite-2.3.0  menu rc4       menu rc1       menu rc1       menu rc3
> >>>                   performance    performance    performance    performance
> >>> test  what        ave            ave            ave            ave
> >>> 1     T/C 1       2.147  -0.2%   2.143   0.0%   2.16   -0.8%   2.156  -0.6%
> >>> 2     T/C 2       3.468   0.1%   3.473   0.0%   3.486  -0.4%   3.478  -0.1%
> >>> 3     T/C 4       4.336   0.3%   4.35    0.0%   4.355  -0.1%   4.354  -0.1%
> >>> 4     T/C 8       5.438  -0.1%   5.434   0.0%   5.456  -0.4%   5.45   -0.3%
> >>> 5     T/C 12      6.314  -0.2%   6.299   0.0%   6.307  -0.1%   6.29    0.1%
> >>>
> >>> Where:
> >>> T/C means: Threads / Copies
> >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
> >>> Data points are in Seconds.
> >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> >>> The reversion was manually applied to kernel 6.18-rc1 for that test.
> >>> The reversion was included in kernel 6.18-rc3.
> >>> Kernel 6.18-rc4 had another code change to menu.c
> >>>
> >>> In case the formatting gets messed up, the table is also attached.
> >>>
> >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> >>> HWP: Enabled.
>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>> after reapplying the revert on X6-2 systems.
> >>
> >> [.. lscpu output snipped; identical to the dump quoted earlier in the thread ..]
> >>
> > It would be nice to get the idle states here, ideally how the states' usage changed
> > from base to revert.
> > The mentioned thread did this and should show how it can be done, but a dump of
> > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> > before and after the workload is usually fine to work with:
> > https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> >
> > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > reapplication seems inevitable. I shall get back to you with these
> > details also.

Yes, please, because I have another reason to restore the reverted commit.

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-13 14:13 ` Rafael J. Wysocki @ 2026-01-13 14:18 ` Rafael J. Wysocki 2026-01-14 4:28 ` Sergey Senozhatsky 0 siblings, 1 reply; 44+ messages in thread From: Rafael J. Wysocki @ 2026-01-13 14:18 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Harshvardhan Jha, Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Rafael J. Wysocki, Daniel Lezcano On Tue, Jan 13, 2026 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Jan 13, 2026 at 8:06 AM Harshvardhan Jha > <harshvardhan.j.jha@oracle.com> wrote: > > > > Hi Crhistian, > > > > On 08/12/25 6:17 PM, Christian Loehle wrote: > > > On 12/8/25 11:33, Harshvardhan Jha wrote: > > >> Hi Doug, > > >> > > >> On 04/12/25 4:00 AM, Doug Smythies wrote: > > >>> On 2025.12.03 08:45 Christian Loehle wrote: > > >>>> On 12/3/25 16:18, Harshvardhan Jha wrote: > > >>>>> Hi there, > > >>>>> > > >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was > > >>>>> observed that several regressions across different benchmarks is being > > >>>>> introduced when compared to the previous 5.15.193 kernel tag. 
Running an > > >>>>> automated bisect on both of them narrowed down the culprit commit to: > > >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful > > >>>>> information" for 5.15 > > >>>>> > > >>>>> Regressions on 5.15.196 include: > > >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 > > >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 > > >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 > > >>>>> thread & 1M buffer size on OnPrem X6-2 > > >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & > > >>>>> 1 thread & 1M buffer size on OnPrem X6-2 > > >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 > > >>>>> > > >>>>> The culprit commits' messages mention that these reverts were done due > > >>>>> to performance regressions introduced in Intel Jasper Lake systems but > > >>>>> this revert is causing issues in other systems unfortunately. I wanted > > >>>>> to know the maintainers' opinion on how we should proceed in order to > > >>>>> fix this. If we reapply it'll bring back the previous regressions on > > >>>>> Jasper Lake systems and if we don't revert it then it's stuck with > > >>>>> current regressions. If this problem has been reported before and a fix > > >>>>> is in the works then please let me know I shall follow developments to > > >>>>> that mail thread. 
> > >>>> The discussion regarding this can be found here:
> > >>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> > >>>> we explored an alternative to the full revert here:
> > >>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
> > >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
> > >>>> full revert you're seeing now.
> > >>>>
> > >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
> > >>>> will highly depend on your platform, but also your workload, Jasper Lake
> > >>>> in particular seems to favor deep idle states even when they don't seem
> > >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
> > >>>> we're kind of stuck with that.
> > >>>>
> > >>>> For teo we've discussed a tunable knob in the past, which comes naturally with
> > >>>> the logic, for menu there's nothing obvious that would be comparable.
> > >>>> But for teo such a knob didn't generate any further interest (so far).
> > >>>>
> > >>>> That's the status, unless I missed anything?
> > >>> By reading everything in the links Christian provided, you can see
> > >>> that we had difficulties repeating test results on other platforms.
> > >>>
> > >>> Of the tests listed herein, the only one that was easy to repeat on my
> > >>> test server, was the "Phoronix pts/sqlite" one. I got (summary: no difference):
> > >>>
> > >>> Kernel            6.18                            Reverted
> > >>> pts/sqlite-2.3.0  menu rc4       menu rc1       menu rc1       menu rc3
> > >>>                   performance    performance    performance    performance
> > >>> test  what        ave            ave            ave            ave
> > >>> 1     T/C 1       2.147  -0.2%   2.143   0.0%   2.16   -0.8%   2.156  -0.6%
> > >>> 2     T/C 2       3.468   0.1%   3.473   0.0%   3.486  -0.4%   3.478  -0.1%
> > >>> 3     T/C 4       4.336   0.3%   4.35    0.0%   4.355  -0.1%   4.354  -0.1%
> > >>> 4     T/C 8       5.438  -0.1%   5.434   0.0%   5.456  -0.4%   5.45   -0.3%
> > >>> 5     T/C 12      6.314  -0.2%   6.299   0.0%   6.307  -0.1%   6.29    0.1%
> > >>>
> > >>> Where:
> > >>> T/C means: Threads / Copies
> > >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
> > >>> Data points are in Seconds.
> > >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
> > >>> The reversion was manually applied to kernel 6.18-rc1 for that test.
> > >>> The reversion was included in kernel 6.18-rc3.
> > >>> Kernel 6.18-rc4 had another code change to menu.c
> > >>>
> > >>> In case the formatting gets messed up, the table is also attached.
> > >>>
> > >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
> > >>> HWP: Enabled.
> >> I was able to recover performance on 5.15 and 5.4 LTS based kernels
> >> after reapplying the revert on X6-2 systems.
> > >>
> > >> [.. lscpu output snipped; identical to the dump quoted earlier in the thread ..]
> > >>
> > > It would be nice to get the idle states here, ideally how the states' usage changed
> > > from base to revert.
> > > The mentioned thread did this and should show how it can be done, but a dump of
> > > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
> > > before and after the workload is usually fine to work with:
> > > https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> > >
> > > Bumping this as I discovered this issue on 6.12 stable branch also. The
> > > reapplication seems inevitable. I shall get back to you with these
> > > details also.
>
> Yes, please, because I have another reason to restore the reverted commit.

Sergey, did you see a performance regression from 85975daeaa4d
("cpuidle: menu: Avoid discarding useful information") on any
platforms other than the Jasper Lake it was reported for?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-13 14:18 ` Rafael J. Wysocki @ 2026-01-14 4:28 ` Sergey Senozhatsky 2026-01-14 4:49 ` Sergey Senozhatsky 0 siblings, 1 reply; 44+ messages in thread From: Sergey Senozhatsky @ 2026-01-14 4:28 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Sergey Senozhatsky, Harshvardhan Jha, Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano Hi, On (26/01/13 15:18), Rafael J. Wysocki wrote: [..] > > > Bumping this as I discovered this issue on 6.12 stable branch also. The > > > reapplication seems inevitable. I shall get back to you with these > > > details also. > > > > Yes, please, because I have another reason to restore the reverted commit. > > Sergey, did you see a performance regression from 85975daeaa4d > ("cpuidle: menu: Avoid discarding useful information") on any > platforms other than the Jasper Lake it was reported for? Let me try to dig it up. I think I saw regressions on a number of devices: --- cpu family : 6 model : 122 model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz --- cpu family : 6 model : 122 model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz --- cpu family : 6 model : 156 model name : Intel(R) Celeron(R) N4500 @ 1.10GHz --- cpu family : 6 model : 156 model name : Intel(R) Celeron(R) N4500 @ 1.10GHz --- cpu family : 6 model : 156 model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz I guess family 6/model 122 is not Jasper Lake? I also saw some where the patch in question seemed to improve the metrics, but regressions are more important, so the revert simply put all of the boards back to the previous state. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-14 4:28 ` Sergey Senozhatsky @ 2026-01-14 4:49 ` Sergey Senozhatsky 2026-01-14 5:15 ` Tomasz Figa 2026-01-29 22:47 ` Doug Smythies 0 siblings, 2 replies; 44+ messages in thread From: Sergey Senozhatsky @ 2026-01-14 4:49 UTC (permalink / raw) To: Tomasz Figa Cc: Rafael J. Wysocki, Harshvardhan Jha, Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky Cc-ing Tomasz On (26/01/14 13:28), Sergey Senozhatsky wrote: > Hi, > > On (26/01/13 15:18), Rafael J. Wysocki wrote: > [..] > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The > > > > reapplication seems inevitable. I shall get back to you with these > > > > details also. > > > > > > Yes, please, because I have another reason to restore the reverted commit. > > > > Sergey, did you see a performance regression from 85975daeaa4d > > ("cpuidle: menu: Avoid discarding useful information") on any > > platforms other than the Jasper Lake it was reported for? > > Let me try to dig it up. I think I saw regressions on a number of > devices: > > --- > cpu family : 6 > model : 122 > model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz > --- > cpu family : 6 > model : 122 > model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz > --- > cpu family : 6 > model : 156 > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz > --- > cpu family : 6 > model : 156 > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz > --- > cpu family : 6 > model : 156 > model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz > > > I guess family 6/model 122 is not Jasper Lake? > > I also saw some where the patch in question seemed to improve the > metrics, but regressions are more important, so the revert simply > put all of the boards back to the previous state. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-14 4:49 ` Sergey Senozhatsky @ 2026-01-14 5:15 ` Tomasz Figa 2026-01-14 20:07 ` Rafael J. Wysocki 2026-01-29 22:47 ` Doug Smythies 1 sibling, 1 reply; 44+ messages in thread From: Tomasz Figa @ 2026-01-14 5:15 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Rafael J. Wysocki, Harshvardhan Jha, Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano Hi all, On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > Cc-ing Tomasz > > On (26/01/14 13:28), Sergey Senozhatsky wrote: > > Hi, > > > > On (26/01/13 15:18), Rafael J. Wysocki wrote: > > [..] > > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The > > > > > reapplication seems inevitable. I shall get back to you with these > > > > > details also. > > > > > > > > Yes, please, because I have another reason to restore the reverted commit. Is the performance difference the reporter observed an actual regression, or is it just a return to the level before the optimization was merged into stable branches? If the latter, shouldn't avoiding regressions be a priority over further optimizing for other users? If there is a really strong desire to reland this optimization, could it at least be applied selectively to the CPUs that it's known to help, or alternatively, made configurable? Best, Tomasz > > > > > > Sergey, did you see a performance regression from 85975daeaa4d > > > ("cpuidle: menu: Avoid discarding useful information") on any > > > platforms other than the Jasper Lake it was reported for? > > > > Let me try to dig it up. 
I think I saw regressions on a number of > > devices: > > > > --- > > cpu family : 6 > > model : 122 > > model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz > > --- > > cpu family : 6 > > model : 122 > > model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz > > --- > > cpu family : 6 > > model : 156 > > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz > > --- > > cpu family : 6 > > model : 156 > > model name : Intel(R) Celeron(R) N4500 @ 1.10GHz > > --- > > cpu family : 6 > > model : 156 > > model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz > > > > > > I guess family 6/model 122 is not Jasper Lake? > > > > I also saw some where the patch in question seemed to improve the > > metrics, but regressions are more important, so the revert simply > > put all of the boards back to the previous state. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-14 5:15 ` Tomasz Figa @ 2026-01-14 20:07 ` Rafael J. Wysocki 2026-01-29 10:23 ` Harshvardhan Jha 0 siblings, 1 reply; 44+ messages in thread From: Rafael J. Wysocki @ 2026-01-14 20:07 UTC (permalink / raw) To: Tomasz Figa, Harshvardhan Jha Cc: Sergey Senozhatsky, Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano On Wed, Jan 14, 2026 at 6:16 AM Tomasz Figa <tfiga@chromium.org> wrote: > > Hi all, > > On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky > <senozhatsky@chromium.org> wrote: > > > > Cc-ing Tomasz > > > > On (26/01/14 13:28), Sergey Senozhatsky wrote: > > > Hi, > > > > > > On (26/01/13 15:18), Rafael J. Wysocki wrote: > > > [..] > > > > > > Bumping this as I discovered this issue on 6.12 stable branch also. The > > > > > > reapplication seems inevitable. I shall get back to you with these > > > > > > details also. > > > > > > > > > > Yes, please, because I have another reason to restore the reverted commit. > > Is the performance difference the reporter observed an actual > regression, or is it just a return to the level before the > optimization was merged into stable branches? Good question. Harshvardhan, which one is the case? > If the latter, shouldn't avoiding regressions be a priority over further optimizing for other > users? > > If there is a really strong desire to reland this optimization, could > it at least be applied selectively to the CPUs that it's known to > help, or alternatively, made configurable? That wouldn't be easy in practice, but I think that it may be compensated by reducing the target residency values of the deepest idle states on those systems. ^ permalink raw reply [flat|nested] 44+ messages in thread
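Rafael's suggestion concerns the driver-provided target residency values, which are not directly tunable from user space; the closest user-space experiment is the per-state `disable` knob under sysfs, which lets one rule the deepest state in or out for an A/B benchmark run. A hedged sketch of driving that knob (helper names are ours, not from the thread; writing requires root):

```python
import glob
import os

def pick_deepest(state_dirs):
    """From a list of .../stateN directories, return the highest-numbered one."""
    if not state_dirs:
        return None
    return max(state_dirs, key=lambda p: int(p.rsplit("state", 1)[1]))

def deepest_disable_knobs(base="/sys/devices/system/cpu"):
    """The 'disable' file of the deepest idle state of each CPU."""
    knobs = []
    for cpuidle_dir in sorted(glob.glob(os.path.join(base, "cpu[0-9]*", "cpuidle"))):
        deepest = pick_deepest(glob.glob(os.path.join(cpuidle_dir, "state[0-9]*")))
        if deepest is not None:
            knobs.append(os.path.join(deepest, "disable"))
    return knobs

def disable_deepest_states(dry_run=True):
    """Equivalent of `echo 1 > .../stateN/disable` for every CPU's deepest state."""
    for knob in deepest_disable_knobs():
        if dry_run:
            print("would write 1 >", knob)
        else:
            with open(knob, "w") as fh:
                fh.write("1")
```

The setting does not persist across reboots, which makes it convenient for comparing a workload with and without the deepest state available.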
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-14 20:07 ` Rafael J. Wysocki @ 2026-01-29 10:23 ` Harshvardhan Jha 0 siblings, 0 replies; 44+ messages in thread From: Harshvardhan Jha @ 2026-01-29 10:23 UTC (permalink / raw) To: Rafael J. Wysocki, Tomasz Figa Cc: Sergey Senozhatsky, Christian Loehle, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano Hi Rafael, On 15/01/26 1:37 AM, Rafael J. Wysocki wrote: > On Wed, Jan 14, 2026 at 6:16 AM Tomasz Figa <tfiga@chromium.org> wrote: >> Hi all, >> >> On Wed, Jan 14, 2026 at 1:49 PM Sergey Senozhatsky >> <senozhatsky@chromium.org> wrote: >>> Cc-ing Tomasz >>> >>> On (26/01/14 13:28), Sergey Senozhatsky wrote: >>>> Hi, >>>> >>>> On (26/01/13 15:18), Rafael J. Wysocki wrote: >>>> [..] >>>>>>> Bumping this as I discovered this issue on 6.12 stable branch also. The >>>>>>> reapplication seems inevitable. I shall get back to you with these >>>>>>> details also. >>>>>> Yes, please, because I have another reason to restore the reverted commit. >> Is the performance difference the reporter observed an actual >> regression, or is it just a return to the level before the >> optimization was merged into stable branches? > Good question. > > Harshvardhan, which one is the case? The commit first introduced a performance improvement and then with the revert it returned back to the baseline. Harshvardhan > >> If the latter, shouldn't avoiding regressions be a priority over further optimizing for other >> users? >> >> If there is a really strong desire to reland this optimization, could >> it at least be applied selectively to the CPUs that it's known to >> help, or alternatively, made configurable? > That wouldn't be easy in practice, but I think that it may be > compensated by reducing the target residency values of the deepest > idle states on those systems. ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-14 4:49 ` Sergey Senozhatsky 2026-01-14 5:15 ` Tomasz Figa @ 2026-01-29 22:47 ` Doug Smythies 1 sibling, 0 replies; 44+ messages in thread From: Doug Smythies @ 2026-01-29 22:47 UTC (permalink / raw) To: 'Sergey Senozhatsky', 'Tomasz Figa' Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha', 'Christian Loehle', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano', Doug Smythies On 2026.01.13 20:50 Sergey Senozhatsky wrote: > Cc-ing Tomasz > > On (26/01/14 13:28), Sergey Senozhatsky wrote: >> Hi, >> >> On (26/01/13 15:18), Rafael J. Wysocki wrote: > [..] >>>>> Bumping this as I discovered this issue on 6.12 stable branch also. The >>>>> reapplication seems inevitable. I shall get back to you with these >>>>> details also. >>>> >>> > Yes, please, because I have another reason to restore the reverted commit. >>> >>> Sergey, did you see a performance regression from 85975daeaa4d >>> ("cpuidle: menu: Avoid discarding useful information") on any >>> platforms other than the Jasper Lake it was reported for? >> >> Let me try to dig it up. I think I saw regressions on a number of >> devices: >> >> --- >> cpu family : 6 >> model : 122 >> model name : Intel(R) Pentium(R) Silver N5000 CPU @ 1.10GHz >> --- >> cpu family : 6 >> model : 122 >> model name : Intel(R) Celeron(R) N4100 CPU @ 1.10GHz >> --- >> cpu family : 6 >> model : 156 >> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz >> --- >> cpu family : 6 >> model : 156 >> model name : Intel(R) Celeron(R) N4500 @ 1.10GHz >> --- >> cpu family : 6 >> model : 156 >> model name : Intel(R) Pentium(R) Silver N6000 @ 1.10GHz >> Those are all 6 watt TDP processors, the same as the earlier emails. We know from the turbostat data that 6 watts is exceeded in the test case and that it is likely that power limiting was involved. I don't think we ever got to the bottom of it for your case. 
I still would like to see test results where there is no chance of throttling being involved via reducing the maximum CPU frequency. >> >> I guess family 6/model 122 is not Jasper Lake? >> >> I also saw some where the patch in question seemed to improve the >> metrics, but regressions are more important, so the revert simply >> put all of the boards back to the previous state. In all of our testing we saw some minor regressions in addition to the improvements. Overall, the patch set it was deemed an improvement. ^ permalink raw reply [flat|nested] 44+ messages in thread
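Doug's throttling check, rerunning the benchmark with a reduced frequency ceiling, can be done through the cpufreq policy interface. A sketch under the assumption that the driver exposes the usual `scaling_max_freq`/`cpuinfo_*_freq` files (the helper names are ours):

```python
import glob

def capped_khz(min_khz, max_khz, fraction=0.5):
    """A frequency cap at some fraction of max, clamped to the hardware minimum."""
    return max(min_khz, int(max_khz * fraction))

def cap_all_policies(fraction=0.5, dry_run=True):
    """Lower scaling_max_freq on every cpufreq policy to rule out power limiting."""
    for policy in glob.glob("/sys/devices/system/cpu/cpufreq/policy[0-9]*"):
        with open(policy + "/cpuinfo_min_freq") as fh:
            lo = int(fh.read())
        with open(policy + "/cpuinfo_max_freq") as fh:
            hi = int(fh.read())
        cap = capped_khz(lo, hi, fraction)
        if dry_run:
            print(policy, "->", cap, "kHz")
        else:  # needs root
            with open(policy + "/scaling_max_freq", "w") as fh:
                fh.write(str(cap))
```

For the E5-2690 v4 quoted earlier in the thread (1.2-2.6 GHz), a fraction of 0.5 would pin the ceiling at 1.3 GHz, well away from any package power limit.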
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2025-12-08 12:47 ` Christian Loehle 2026-01-13 7:06 ` Harshvardhan Jha @ 2026-01-27 15:45 ` Harshvardhan Jha 2026-01-28 5:06 ` Doug Smythies 1 sibling, 1 reply; 44+ messages in thread From: Harshvardhan Jha @ 2026-01-27 15:45 UTC (permalink / raw) To: Christian Loehle, Doug Smythies Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Rafael J. Wysocki', 'Daniel Lezcano' [-- Attachment #1: Type: text/plain, Size: 10297 bytes --] Hi Christian, On 08/12/25 6:17 PM, Christian Loehle wrote: > On 12/8/25 11:33, Harshvardhan Jha wrote: >> Hi Doug, >> >> On 04/12/25 4:00 AM, Doug Smythies wrote: >>> On 2025.12.03 08:45 Christian Loehle wrote: >>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>> Hi there, >>>>> >>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was >>>>> observed that several regressions across different benchmarks is being >>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an >>>>> automated bisect on both of them narrowed down the culprit commit to: >>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful >>>>> information" for 5.15 >>>>> >>>>> Regressions on 5.15.196 include: >>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 >>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 >>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 >>>>> thread & 1M buffer size on OnPrem X6-2 >>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & >>>>> 1 thread & 1M buffer size on OnPrem X6-2 >>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 >>>>> >>>>> The culprit commits' messages mention that these reverts were done due >>>>> to performance regressions introduced in Intel Jasper Lake systems but >>>>> this revert is causing issues in other systems unfortunately. 
I wanted >>>>> to know the maintainers' opinion on how we should proceed in order to >>>>> fix this. If we reapply it'll bring back the previous regressions on >>>>> Jasper Lake systems and if we don't revert it then it's stuck with >>>>> current regressions. If this problem has been reported before and a fix >>>>> is in the works then please let me know I shall follow developments to >>>>> that mail thread. >>>> The discussion regarding this can be found here: >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$ >>>> we explored an alternative to the full revert here: >>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$ >>>> unfortunately that didn't lead anywhere useful, so Rafael went with the >>>> full revert you're seeing now. >>>> >>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs >>>> will highly depend on your platform, but also your workload, Jasper Lake >>>> in particular seems to favor deep idle states even when they don't seem >>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so >>>> we're kind of stuck with that. >>>> >>>> For teo we've discussed a tunable knob in the past, which comes naturally with >>>> the logic, for menu there's nothing obvious that would be comparable. >>>> But for teo such a knob didn't generate any further interest (so far). >>>> >>>> That's the status, unless I missed anything? >>> By reading everything in the links Chrsitian provided, you can see >>> that we had difficulties repeating test results on other platforms. 
>>> >>> Of the tests listed herein, the only one that was easy to repeat on my >>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference): >>> >>> Kernel 6.18 Reverted >>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3 >>> performance performance performance performance >>> test what ave ave ave ave >>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6% >>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1% >>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1% >>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3% >>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1% >>> >>> Where: >>> T/C means: Threads / Copies >>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor. >>> Data points are in Seconds. >>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10. >>> The reversion was manually applied to kernel 6.18-rc1 for that test. >>> The reversion was included in kernel 6.18-rc3. >>> Kernel 6.18-rc4 had another code change to menu.c >>> >>> In case the formatting gets messed up, the table is also attached. >>> >>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs. >>> HWP: Enabled. >> I was able to recover performance on 5.15 and 5.4 LTS based kernels >> after reapplying the revert on X6-2 systems. 
>> >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Address sizes: 46 bits physical, 48 bits virtual >> Byte Order: Little Endian >> CPU(s): 56 >> On-line CPU(s) list: 0-55 >> Vendor ID: GenuineIntel >> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz >> CPU family: 6 >> Model: 79 >> Thread(s) per core: 2 >> Core(s) per socket: 14 >> Socket(s): 2 >> Stepping: 1 >> CPU(s) scaling MHz: 98% >> CPU max MHz: 2600.0000 >> CPU min MHz: 1200.0000 >> BogoMIPS: 5188.26 >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep >> mtrr pg >> e mca cmov pat pse36 clflush dts acpi mmx >> fxsr sse >> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp >> lm cons >> tant_tsc arch_perfmon pebs bts rep_good >> nopl xtopol >> ogy nonstop_tsc cpuid aperfmperf pni >> pclmulqdq dtes >> 64 monitor ds_cpl vmx smx est tm2 ssse3 >> sdbg fma cx >> 16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic >> movbe po >> pcnt tsc_deadline_timer aes xsave avx f16c >> rdrand l >> ahf_lm abm 3dnowprefetch cpuid_fault epb >> cat_l3 cdp >> _l3 pti intel_ppin ssbd ibrs ibpb stibp >> tpr_shadow >> flexpriority ept vpid ept_ad fsgsbase >> tsc_adjust bm >> i1 hle avx2 smep bmi2 erms invpcid rtm cqm >> rdt_a rd >> seed adx smap intel_pt xsaveopt cqm_llc >> cqm_occup_l >> lc cqm_mbm_total cqm_mbm_local dtherm arat >> pln pts >> vnmi md_clear flush_l1d >> Virtualization features: >> Virtualization: VT-x >> Caches (sum of all): >> L1d: 896 KiB (28 instances) >> L1i: 896 KiB (28 instances) >> L2: 7 MiB (28 instances) >> L3: 70 MiB (2 instances) >> NUMA: >> NUMA node(s): 2 >> NUMA node0 CPU(s): 0-13,28-41 >> NUMA node1 CPU(s): 14-27,42-55 >> Vulnerabilities: >> Gather data sampling: Not affected >> Indirect target selection: Not affected >> Itlb multihit: KVM: Mitigation: Split huge pages >> L1tf: Mitigation; PTE Inversion; VMX conditional >> cache fl >> ushes, SMT vulnerable >> Mds: Mitigation; Clear CPU buffers; SMT vulnerable >> Meltdown: Mitigation; PTI >> Mmio stale data: Mitigation; Clear CPU buffers; SMT 
vulnerable >> Reg file data sampling: Not affected >> Retbleed: Not affected >> Spec rstack overflow: Not affected >> Spec store bypass: Mitigation; Speculative Store Bypass >> disabled via p >> rctl >> Spectre v1: Mitigation; usercopy/swapgs barriers and >> __user poi >> nter sanitization >> Spectre v2: Mitigation; Retpolines; IBPB conditional; >> IBRS_FW; >> STIBP conditional; RSB filling; PBRSB-eIBRS >> Not aff >> ected; BHI Not affected >> Srbds: Not affected >> Tsa: Not affected >> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable >> Vmscape: Mitigation; IBPB before exit to userspace >> > It would be nice to get the idle states here, ideally how the states' usage changed > from base to revert. > The mentioned thread did this and should show how it can be done, but a dump of > cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* > before and after the workload is usually fine to work with: > https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ Apologies for the late reply, I'm attaching a tar ball which has the cpu states for the test suites before and after tests. The folders with the name of the test contain two folders good-kernel and bad-kernel containing two files having the before and after states. Please note that different machines were used for different test suites due to compatibility reasons. The jbb test was run using containers. Thanks & Regards, Harshvardhan [-- Attachment #2: cpuidle.tar.gz --] [-- Type: application/x-gzip, Size: 36882 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
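The before/after idle-state dumps requested above can be collected and diffed with a small script. This is a sketch under assumed conventions (the `snapshot` and `delta` helpers are invented for illustration): `snapshot()` walks the standard cpuidle sysfs layout on a live Linux system, while `delta()` works on any two snapshot dicts, so saved dumps can also be analyzed offline:

```python
# Sketch: collect per-CPU cpuidle counters before/after a workload and
# compute the per-state deltas (usage, time, above, below).
import glob
import os

FIELDS = ("usage", "time", "above", "below")

def snapshot(root="/sys/devices/system/cpu"):
    """Return {(cpu, state): {field: int}} read from the cpuidle sysfs tree."""
    snap = {}
    for d in glob.glob(os.path.join(root, "cpu[0-9]*", "cpuidle", "state[0-9]*")):
        parts = d.split(os.sep)
        cpu = int(parts[-3][3:])        # "cpu12"  -> 12
        state = int(parts[-1][5:])      # "state4" -> 4
        vals = {}
        for f in FIELDS:
            path = os.path.join(d, f)
            if os.path.exists(path):    # "above"/"below" need a recent kernel
                with open(path) as fh:
                    vals[f] = int(fh.read())
        snap[(cpu, state)] = vals
    return snap

def delta(before, after):
    """Per-(cpu, state) difference of the counters present in both snapshots."""
    return {
        key: {f: after[key][f] - before[key][f]
              for f in after[key] if f in before.get(key, {})}
        for key in after
    }
```

Taking one snapshot before and one after the workload and feeding both to `delta()` yields exactly the per-state usage changes discussed in this thread.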
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-27 15:45 ` Harshvardhan Jha @ 2026-01-28 5:06 ` Doug Smythies 2026-01-28 23:53 ` Doug Smythies 0 siblings, 1 reply; 44+ messages in thread From: Doug Smythies @ 2026-01-28 5:06 UTC (permalink / raw) To: 'Harshvardhan Jha', 'Christian Loehle' Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Rafael J. Wysocki', 'Daniel Lezcano', Doug Smythies [-- Attachment #1: Type: text/plain, Size: 8076 bytes --] On 2026.01.27 07:45 Harshvardhan Jha wrote: >On 08/12/25 6:17 PM, Christian Loehle wrote: >> On 12/8/25 11:33, Harshvardhan Jha wrote: >>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>>> >>>>>> While running performance benchmarks for the 5.15.196 LTS tags , it was >>>>>> observed that several regressions across different benchmarks is being >>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an >>>>>> automated bisect on both of them narrowed down the culprit commit to: >>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful >>>>>> information" for 5.15 >>>>>> >>>>>> Regressions on 5.15.196 include: >>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2 >>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2 >>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1 >>>>>> thread & 1M buffer size on OnPrem X6-2 >>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth & >>>>>> 1 thread & 1M buffer size on OnPrem X6-2 >>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2 >>>>>> >>>>>> The culprit commits' messages mention that these reverts were done due >>>>>> to performance regressions introduced in Intel Jasper Lake systems but >>>>>> this revert is causing issues in other systems unfortunately. 
I wanted >>>>>> to know the maintainers' opinion on how we should proceed in order to >>>>>> fix this. If we reapply it'll bring back the previous regressions on >>>>>> Jasper Lake systems and if we don't revert it then it's stuck with >>>>>> current regressions. If this problem has been reported before and a fix >>>>>> is in the works then please let me know I shall follow developments to >>>>>> that mail thread. >>>>> The discussion regarding this can be found here: >>>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA-b9PW7hw$ >>>>> we explored an alternative to the full revert here: >>>>> https://urldefense.com/v3/__https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/__;!!ACWV5N9M2RV99hQ!MWXEz_wRbaLyJxDign2EXci2qNzAPpCyhi8qIORMdReh0g_yIVIt-Oqov23KT23A_rGBRRxJ4bHb_e6UQA9PSf_uMQ$ >>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the >>>>> full revert you're seeing now. >>>>> >>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs >>>>> will highly depend on your platform, but also your workload, Jasper Lake >>>>> in particular seems to favor deep idle states even when they don't seem >>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so >>>>> we're kind of stuck with that. >>>>> >>>>> For teo we've discussed a tunable knob in the past, which comes naturally with >>>>> the logic, for menu there's nothing obvious that would be comparable. >>>>> But for teo such a knob didn't generate any further interest (so far). >>>>> >>>>> That's the status, unless I missed anything? >>>> By reading everything in the links Chrsitian provided, you can see >>>> that we had difficulties repeating test results on other platforms. 
>>>> >>>> Of the tests listed herein, the only one that was easy to repeat on my >>>> test server, was the " Phoronix pts/sqlite" one. I got (summary: no difference): >>>> >>>> Kernel 6.18 Reverted >>>> pts/sqlite-2.3.0 menu rc4 menu rc1 menu rc1 menu rc3 >>>> performance performance performance performance >>>> test what ave ave ave ave >>>> 1 T/C 1 2.147 -0.2% 2.143 0.0% 2.16 -0.8% 2.156 -0.6% >>>> 2 T/C 2 3.468 0.1% 3.473 0.0% 3.486 -0.4% 3.478 -0.1% >>>> 3 T/C 4 4.336 0.3% 4.35 0.0% 4.355 -0.1% 4.354 -0.1% >>>> 4 T/C 8 5.438 -0.1% 5.434 0.0% 5.456 -0.4% 5.45 -0.3% >>>> 5 T/C 12 6.314 -0.2% 6.299 0.0% 6.307 -0.1% 6.29 0.1% >>>> >>>> Where: >>>> T/C means: Threads / Copies >>>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequencay scaling governor. >>>> Data points are in Seconds. >>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10. >>>> The reversion was manually applied to kernel 6.18-rc1 for that test. >>>> The reversion was included in kernel 6.18-rc3. >>>> Kernel 6.18-rc4 had another code change to menu.c >>>> >>>> In case the formatting gets messed up, the table is also attached. >>>> >>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs. >>>> HWP: Enabled. >>> I was able to recover performance on 5.15 and 5.4 LTS based kernels >>> after reapplying the revert on X6-2 systems. >>> >>> Architecture: x86_64 >>> CPU op-mode(s): 32-bit, 64-bit >>> Address sizes: 46 bits physical, 48 bits virtual >>> Byte Order: Little Endian >>> CPU(s): 56 >>> On-line CPU(s) list: 0-55 >>> Vendor ID: GenuineIntel >>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz >>> CPU family: 6 >>> Model: 79 >>> Thread(s) per core: 2 >>> Core(s) per socket: 14 >>> Socket(s): 2 >>> Stepping: 1 >>> CPU(s) scaling MHz: 98% >>> CPU max MHz: 2600.0000 >>> CPU min MHz: 1200.0000 >>> BogoMIPS: 5188.26 ... snip ... 
>> It would be nice to get the idle states here, ideally how the states' usage changed >> from base to revert. >> The mentioned thread did this and should show how it can be done, but a dump of >> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* >> before and after the workload is usually fine to work with: >> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ > Apologies for the late reply, I'm attaching a tar ball which has the cpu > states for the test suites before and after tests. The folders with the > name of the test contain two folders good-kernel and bad-kernel > containing two files having the before and after states. Please note > that different machines were used for different test suites due to > compatibility reasons. The jbb test was run using containers. It is a considerable amount of work to manually extract and summarize the data. I have only done it for the phoronix-sqlite data. There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled. I remember seeing a Linux-pm email about why but couldn't find it just now. Summary (also attached as a PNG file, in case the formatting gets messed up): The total idle entries (usage) and time seem low to me, which is why the ???. phoronix-sqlite Good Kernel: Time between samples 4 seconds (estimated and ???) Usage Above Below Above Below state 0 220 0 218 0.00% 99.09% state 1 70212 5213 34602 7.42% 49.28% state 2 30273 5237 1806 17.30% 5.97% state 3 0 0 0 0.00% 0.00% state 4 11824 2120 0 17.93% 0.00% total 112529 12570 36626 43.72% <<< Misses % Bad Kernel: Time between samples 3.8 seconds (estimated and ???) 
Usage Above Below Above Below state 0 262 0 260 0.00% 99.24% state 1 62751 3985 35588 6.35% 56.71% state 2 24941 7896 1433 31.66% 5.75% state 3 0 0 0 0.00% 0.00% state 4 24489 11543 0 47.14% 0.00% total 112443 23424 37281 53.99% <<< Misses % Observe 2X use of idle state 4 for the "Bad Kernel" I have a template now, and can summarize the other 40 CPU data faster, but I would have to rework the template for the 56 CPU data, and is it a 64 CPU data set at 4 idle states per CPU? ... Doug [-- Attachment #2: sqlite-summary.png --] [-- Type: image/png, Size: 37171 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
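The per-state percentages in the tables above can be reproduced mechanically from the summed usage/above/below counters. A sketch (the `miss_stats` function name is mine) that computes above%/below% per state and the overall miss rate, checked against the good-kernel phoronix-sqlite numbers from this thread:

```python
# Sketch: turn summed per-state (usage, above, below) counters into
# percentages: above/usage and below/usage per state, plus
# (sum(above) + sum(below)) / sum(usage) as the total miss rate.
def miss_stats(states):
    """states: {name: (usage, above, below)} -> (per-state %, total miss %)."""
    rows = {}
    for name, (usage, above, below) in states.items():
        rows[name] = (
            100.0 * above / usage if usage else 0.0,
            100.0 * below / usage if usage else 0.0,
        )
    total_usage = sum(u for u, _, _ in states.values())
    total_miss = sum(a + b for _, a, b in states.values())
    return rows, (100.0 * total_miss / total_usage if total_usage else 0.0)

# The good-kernel phoronix-sqlite table reported in this thread:
good = {
    "state0": (220, 0, 218),
    "state1": (70212, 5213, 34602),
    "state2": (30273, 5237, 1806),
    "state3": (0, 0, 0),
    "state4": (11824, 2120, 0),
}
rows, total = miss_stats(good)
# round(total, 2) -> 43.72, matching the "<<< Misses %" row
```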
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-28 5:06 ` Doug Smythies @ 2026-01-28 23:53 ` Doug Smythies 2026-01-29 22:27 ` Doug Smythies 0 siblings, 1 reply; 44+ messages in thread From: Doug Smythies @ 2026-01-28 23:53 UTC (permalink / raw) To: 'Harshvardhan Jha', 'Christian Loehle' Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Rafael J. Wysocki', 'Daniel Lezcano', Doug Smythies [-- Attachment #1: Type: text/plain, Size: 6136 bytes --] On 2026.01.27 21:07 Doug Smythies wrote: > On 2026.01.27 07:45 Harshvardhan Jha wrote: >> On 08/12/25 6:17 PM, Christian Loehle wrote: >>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: ... snip ... >>> It would be nice to get the idle states here, ideally how the states' usage changed >>> from base to revert. >>> The mentioned thread did this and should show how it can be done, but a dump of >>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* >>> before and after the workload is usually fine to work with: >>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ >> Apologies for the late reply, I'm attaching a tar ball which has the cpu >> states for the test suites before and after tests. The folders with the >> name of the test contain two folders good-kernel and bad-kernel >> containing two files having the before and after states. Please note >> that different machines were used for different test suites due to >> compatibility reasons. The jbb test was run using containers. Please provide the results of the test runs that were done for the supplied before and after idle data. 
In particular, what is the "fio" test, and what are its results? Its idle data is not very revealing. Is it a test I can run on my test computer? > It is a considerable amount of work to manually extract and summarize the data. > I have only done it for the phoronix-sqlite data. I have done the rest now, see below. I have also attached the results, in case the formatting gets screwed up. > There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled. > I remember seeing a Linux-pm email about why but couldn't find it just now. > Summary (also attached as a PNG file, in case the formatting gets messed up): > The total idle entries (usage) and time seem low to me, which is why the ???. > > phoronix-sqlite > Good Kernel: Time between samples 4 seconds (estimated and ???) > Usage Above Below Above Below > state 0 220 0 218 0.00% 99.09% > state 1 70212 5213 34602 7.42% 49.28% > state 2 30273 5237 1806 17.30% 5.97% > state 3 0 0 0 0.00% 0.00% > state 4 11824 2120 0 17.93% 0.00% > > total 112529 12570 36626 43.72% <<< Misses % > > Bad Kernel: Time between samples 3.8 seconds (estimated and ???) > Usage Above Below Above Below > state 0 262 0 260 0.00% 99.24% > state 1 62751 3985 35588 6.35% 56.71% > state 2 24941 7896 1433 31.66% 5.75% > state 3 0 0 0 0.00% 0.00% > state 4 24489 11543 0 47.14% 0.00% > > total 112443 23424 37281 53.99% <<< Misses % > > Observe 2X use of idle state 4 for the "Bad Kernel" > > I have a template now, and can summarize the other 40 CPU data > faster, but I would have to rework the template for the 56 CPU data, > and is it a 64 CPU data set at 4 idle states per CPU? jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
POLL, C1, C1E, C3 (disabled), C6 Good Kernel: Time between samples > 2 hours (estimated) Usage Above Below Above Below state 0 297550 0 296084 0.00% 99.51% state 1 8062854 341043 4962635 4.23% 61.55% state 2 56708358 12688379 6252051 22.37% 11.02% state 3 0 0 0 0.00% 0.00% state 4 54624476 15868752 0 29.05% 0.00% total 119693238 28898174 11510770 33.76% <<< Misses Bad Kernel: Time between samples > 2 hours (estimated) Usage Above Below Above Below state 0 90715 0 75134 0.00% 82.82% state 1 8878738 312970 6082180 3.52% 68.50% state 2 12048728 2576251 603316 21.38% 5.01% state 3 0 0 0 0.00% 0.00% state 4 85999424 44723273 0 52.00% 0.00% total 107017605 47612494 6760630 50.81% <<< Misses As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel" fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6. fio Good Kernel: Time between samples ~ 1 minute (estimated) Usage Above Below Above Below state 0 3822 0 3818 0.00% 99.90% state 1 148640 4406 68956 2.96% 46.39% state 2 593455 45344 105675 7.64% 17.81% state 3 3209648 807014 0 25.14% 0.00% total 3955565 856764 178449 26.17% <<< Misses Bad Kernel: Time between samples ~ 1 minute (estimated) Usage Above Below Above Below state 0 916 0 756 0.00% 82.53% state 1 80230 2028 42791 2.53% 53.34% state 2 59231 6888 6791 11.63% 11.47% state 3 2455784 564797 0 23.00% 0.00% total 2596161 573713 50338 24.04% <<< Misses It is not clear why the number of idle entries differs so much between the tests, but there is a bit of a different distribution of the workload among the CPUs. rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled. 
POLL, C1, C1E, C3 (disabled), C6 rds-stress-test Good Kernel: Time between samples ~70 Seconds (estimated) Usage Above Below Above Below state 0 1561 0 1435 0.00% 91.93% state 1 13855 899 2410 6.49% 17.39% state 2 467998 139254 23679 29.76% 5.06% state 3 0 0 0 0.00% 0.00% state 4 213132 107417 0 50.40% 0.00% total 696546 247570 27524 39.49% <<< Misses Bad Kernel: Time between samples ~ 70 Seconds (estimated) Usage Above Below Above Below state 0 231 0 231 0.00% 100.00% state 1 5413 266 1186 4.91% 21.91% state 2 54365 719 3789 1.32% 6.97% state 3 0 0 0 0.00% 0.00% state 4 267055 148327 0 55.54% 0.00% total 327064 149312 5206 47.24% <<< Misses Again, differing numbers of idle entries between tests. This time the load distribution between CPUs is more obvious. In the "Bad" case most work is done on 2 or 3 CPU's. In the "Good" case the work is distributed over more CPUs. I assume without proof, that the scheduler is deciding not to migrate the next bit of work to another CPU in the one case verses the other. ... Doug [-- Attachment #2: sqlite-summary.png --] [-- Type: image/png, Size: 37171 bytes --] [-- Attachment #3: jbb-summary.png --] [-- Type: image/png, Size: 38778 bytes --] [-- Attachment #4: fio-summary.png --] [-- Type: image/png, Size: 34474 bytes --] [-- Attachment #5: rds-stress-summary.png --] [-- Type: image/png, Size: 37087 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-28 23:53 ` Doug Smythies @ 2026-01-29 22:27 ` Doug Smythies 2026-01-30 19:28 ` Rafael J. Wysocki 0 siblings, 1 reply; 44+ messages in thread From: Doug Smythies @ 2026-01-29 22:27 UTC (permalink / raw) To: 'Harshvardhan Jha', 'Christian Loehle' Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Rafael J. Wysocki', 'Daniel Lezcano', Doug Smythies [-- Attachment #1: Type: text/plain, Size: 6908 bytes --] On 2026.01.28 15:53 Doug Smythies wrote: > On 2026.01.27 21:07 Doug Smythies wrote: >> On 2026.01.27 07:45 Harshvardhan Jha wrote: >>> On 08/12/25 6:17 PM, Christian Loehle wrote: >>>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: > ... snip ... > >>>> It would be nice to get the idle states here, ideally how the states' usage changed >>>> from base to revert. >>>> The mentioned thread did this and should show how it can be done, but a dump of >>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* >>>> before and after the workload is usually fine to work with: >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ > >>> Apologies for the late reply, I'm attaching a tar ball which has the cpu >>> states for the test suites before and after tests. The folders with the >>> name of the test contain two folders good-kernel and bad-kernel >>> containing two files having the before and after states. Please note >>> that different machines were used for different test suites due to >>> compatibility reasons. The jbb test was run using containers. 
> > Please provide the results of the test runs that were done for > the supplied before and after idle data. > In particular, what is the "fio" test and it results. Its idle data is not very revealing. > Is it a test I can run on my test computer? I see that I have fio installed on my test computer. >> It is a considerable amount of work to manually extract and summarize the data. >> I have only done it for the phoronix-sqlite data. > > I have done the rest now, see below. > I have also attached the results, in case the formatting gets screwed up. > >> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled. >> I remember seeing a Linux-pm email about why but couldn't find it just now. >> Summary (also attached as a PNG file, in case the formatting gets messed up): >> The total idle entries (usage) and time seem low to me, which is why the ???. >> >> phoronix-sqlite >> Good Kernel: Time between samples 4 seconds (estimated and ???) >> Usage Above Below Above Below >> state 0 220 0 218 0.00% 99.09% >> state 1 70212 5213 34602 7.42% 49.28% >> state 2 30273 5237 1806 17.30% 5.97% >> state 3 0 0 0 0.00% 0.00% >> state 4 11824 2120 0 17.93% 0.00% >> >> total 112529 12570 36626 43.72% <<< Misses % >> >> Bad Kernel: Time between samples 3.8 seconds (estimated and ???) >> Usage Above Below Above Below >> state 0 262 0 260 0.00% 99.24% >> state 1 62751 3985 35588 6.35% 56.71% >> state 2 24941 7896 1433 31.66% 5.75% >> state 3 0 0 0 0.00% 0.00% >> state 4 24489 11543 0 47.14% 0.00% >> >> total 112443 23424 37281 53.99% <<< Misses % >> >> Observe 2X use of idle state 4 for the "Bad Kernel" >> >> I have a template now, and can summarize the other 40 CPU data >> faster, but I would have to rework the template for the 56 CPU data, >> and is it a 64 CPU data set at 4 idle states per CPU? > > jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled. 
> POLL, C1, C1E, C3 (disabled), C6 > > Good Kernel: Time between samples > 2 hours (estimated) > Usage Above Below Above Below > state 0 297550 0 296084 0.00% 99.51% > state 1 8062854 341043 4962635 4.23% 61.55% > state 2 56708358 12688379 6252051 22.37% 11.02% > state 3 0 0 0 0.00% 0.00% > state 4 54624476 15868752 0 29.05% 0.00% > > total 119693238 28898174 11510770 33.76% <<< Misses > > Bad Kernel: Time between samples > 2 hours (estimated) > Usage Above Below Above Below > state 0 90715 0 75134 0.00% 82.82% > state 1 8878738 312970 6082180 3.52% 68.50% > state 2 12048728 2576251 603316 21.38% 5.01% > state 3 0 0 0 0.00% 0.00% > state 4 85999424 44723273 0 52.00% 0.00% > > total 107017605 47612494 6760630 50.81% <<< Misses > > As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel" > > fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6. > > fio > Good Kernel: Time between samples ~ 1 minute (estimated) > Usage Above Below Above Below > state 0 3822 0 3818 0.00% 99.90% > state 1 148640 4406 68956 2.96% 46.39% > state 2 593455 45344 105675 7.64% 17.81% > state 3 3209648 807014 0 25.14% 0.00% > > total 3955565 856764 178449 26.17% <<< Misses > > Bad Kernel: Time between samples ~ 1 minute (estimated) > Usage Above Below Above Below > state 0 916 0 756 0.00% 82.53% > state 1 80230 2028 42791 2.53% 53.34% > state 2 59231 6888 6791 11.63% 11.47% > state 3 2455784 564797 0 23.00% 0.00% > > total 2596161 573713 50338 24.04% <<< Misses > > It is not clear why the number of idle entries differs so much > between the tests, but there is a bit of a different distribution > of the workload among the CPUs. > > rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled. 
> POLL, C1, C1E, C3 (disabled), C6 > > rds-stress-test > Good Kernel: Time between samples ~70 Seconds (estimated) > Usage Above Below Above Below > state 0 1561 0 1435 0.00% 91.93% > state 1 13855 899 2410 6.49% 17.39% > state 2 467998 139254 23679 29.76% 5.06% > state 3 0 0 0 0.00% 0.00% > state 4 213132 107417 0 50.40% 0.00% > > total 696546 247570 27524 39.49% <<< Misses > > Bad Kernel: Time between samples ~ 70 Seconds (estimated) > Usage Above Below Above Below > state 0 231 0 231 0.00% 100.00% > state 1 5413 266 1186 4.91% 21.91% > state 2 54365 719 3789 1.32% 6.97% > state 3 0 0 0 0.00% 0.00% > state 4 267055 148327 0 55.54% 0.00% > > total 327064 149312 5206 47.24% <<< Misses > > Again, differing numbers of idle entries between tests. > This time the load distribution between CPUs is more > obvious. In the "Bad" case most work is done on 2 or 3 CPU's. > In the "Good" case the work is distributed over more CPUs. > I assume without proof, that the scheduler is deciding not to migrate > the next bit of work to another CPU in the one case verses the other. The above is incorrect. The CPUs involved between the "Good" and "Bad" tests are very similar, mainly 2 CPUs with a little of a 3rd and 4th. See the attached graph for more detail / clarity. All of the tests show higher usage of shallower idle states with the "Good" versus the "Bad", which was the expectation of the original patch, as has been mentioned a few times in the emails. My input is to revert the reversion. ... Doug [-- Attachment #2: usage-v-cpu-v-state.png --] [-- Type: image/png, Size: 104684 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
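The per-CPU distribution check discussed above (how concentrated the idle entries are on a few CPUs) can also be scripted. A sketch, with an invented `top_cpu_share` helper, that sums the usage deltas per CPU and reports the share handled by the busiest N CPUs:

```python
# Sketch: given {(cpu, state): usage_delta}, compute what fraction of all
# idle entries landed on the top-N busiest CPUs -- a quick proxy for how
# widely the scheduler spread the workload between two test runs.
from collections import defaultdict

def top_cpu_share(usage_by_cpu_state, n=2):
    per_cpu = defaultdict(int)
    for (cpu, _state), usage in usage_by_cpu_state.items():
        per_cpu[cpu] += usage
    total = sum(per_cpu.values())
    if total == 0:
        return 0.0
    top = sum(sorted(per_cpu.values(), reverse=True)[:n])
    return top / total

# Synthetic example: work concentrated on CPUs 0 and 1.
sample = {(0, 1): 800, (0, 2): 100, (1, 2): 900, (2, 1): 100, (3, 1): 100}
# top_cpu_share(sample, n=2) -> 0.9
```

A share near 1.0 for small N would match the "most work is done on 2 or 3 CPUs" pattern described in this thread; a lower share indicates the work was spread over more CPUs.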
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-01-29 22:27 ` Doug Smythies @ 2026-01-30 19:28 ` Rafael J. Wysocki 2026-02-01 19:20 ` Christian Loehle 0 siblings, 1 reply; 44+ messages in thread From: Rafael J. Wysocki @ 2026-01-30 19:28 UTC (permalink / raw) To: Doug Smythies, Christian Loehle Cc: Harshvardhan Jha, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Rafael J. Wysocki, Daniel Lezcano On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote: > > On 2026.01.28 15:53 Doug Smythies wrote: > > On 2026.01.27 21:07 Doug Smythies wrote: > >> On 2026.01.27 07:45 Harshvardhan Jha wrote: > >>> On 08/12/25 6:17 PM, Christian Loehle wrote: > >>>> On 12/8/25 11:33, Harshvardhan Jha wrote: > >>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: > >>>>>> On 2025.12.03 08:45 Christian Loehle wrote: > >>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: > > ... snip ... > > > >>>> It would be nice to get the idle states here, ideally how the states' usage changed > >>>> from base to revert. > >>>> The mentioned thread did this and should show how it can be done, but a dump of > >>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* > >>>> before and after the workload is usually fine to work with: > >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ > > > >>> Apologies for the late reply, I'm attaching a tar ball which has the cpu > >>> states for the test suites before and after tests. The folders with the > >>> name of the test contain two folders good-kernel and bad-kernel > >>> containing two files having the before and after states. Please note > >>> that different machines were used for different test suites due to > >>> compatibility reasons. The jbb test was run using containers. 
> >
> > Please provide the results of the test runs that were done for
> > the supplied before and after idle data.
> > In particular, what is the "fio" test and its results? Its idle data is not very revealing.
> > Is it a test I can run on my test computer?
>
> I see that I have fio installed on my test computer.
>
> >> It is a considerable amount of work to manually extract and summarize the data.
> >> I have only done it for the phoronix-sqlite data.
> >
> > I have done the rest now, see below.
> > I have also attached the results, in case the formatting gets screwed up.
> >
> >> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
> >> I remember seeing a Linux-pm email about why but couldn't find it just now.
> >> Summary (also attached as a PNG file, in case the formatting gets messed up):
> >> The total idle entries (usage) and time seem low to me, which is why the ???.
> >>
> >> phoronix-sqlite
> >> Good Kernel: Time between samples 4 seconds (estimated and ???)
> >>            Usage   Above   Below    Above    Below
> >> state 0      220       0     218    0.00%   99.09%
> >> state 1    70212    5213   34602    7.42%   49.28%
> >> state 2    30273    5237    1806   17.30%    5.97%
> >> state 3        0       0       0    0.00%    0.00%
> >> state 4    11824    2120       0   17.93%    0.00%
> >>
> >> total     112529   12570   36626   43.72%  <<< Misses %
> >>
> >> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
> >>            Usage   Above   Below    Above    Below
> >> state 0      262       0     260    0.00%   99.24%
> >> state 1    62751    3985   35588    6.35%   56.71%
> >> state 2    24941    7896    1433   31.66%    5.75%
> >> state 3        0       0       0    0.00%    0.00%
> >> state 4    24489   11543       0   47.14%    0.00%
> >>
> >> total     112443   23424   37281   53.99%  <<< Misses %
> >>
> >> Observe 2X use of idle state 4 for the "Bad Kernel"
> >>
> >> I have a template now, and can summarize the other 40 CPU data
> >> faster, but I would have to rework the template for the 56 CPU data,
> >> and is it a 64 CPU data set at 4 idle states per CPU?
> >
> > jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
> > POLL, C1, C1E, C3 (disabled), C6
> >
> > Good Kernel: Time between samples > 2 hours (estimated)
> >                Usage      Above      Below    Above    Below
> > state 0       297550          0     296084    0.00%   99.51%
> > state 1      8062854     341043    4962635    4.23%   61.55%
> > state 2     56708358   12688379    6252051   22.37%   11.02%
> > state 3            0          0          0    0.00%    0.00%
> > state 4     54624476   15868752          0   29.05%    0.00%
> >
> > total      119693238   28898174   11510770   33.76%  <<< Misses
> >
> > Bad Kernel: Time between samples > 2 hours (estimated)
> >                Usage      Above      Below    Above    Below
> > state 0        90715          0      75134    0.00%   82.82%
> > state 1      8878738     312970    6082180    3.52%   68.50%
> > state 2     12048728    2576251     603316   21.38%    5.01%
> > state 3            0          0          0    0.00%    0.00%
> > state 4     85999424   44723273          0   52.00%    0.00%
> >
> > total      107017605   47612494    6760630   50.81%  <<< Misses
> >
> > As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
> >
> > fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
> >
> > fio
> > Good Kernel: Time between samples ~ 1 minute (estimated)
> >               Usage    Above    Below    Above    Below
> > state 0        3822        0     3818    0.00%   99.90%
> > state 1      148640     4406    68956    2.96%   46.39%
> > state 2      593455    45344   105675    7.64%   17.81%
> > state 3     3209648   807014        0   25.14%    0.00%
> >
> > total       3955565   856764   178449   26.17%  <<< Misses
> >
> > Bad Kernel: Time between samples ~ 1 minute (estimated)
> >               Usage    Above    Below    Above    Below
> > state 0         916        0      756    0.00%   82.53%
> > state 1       80230     2028    42791    2.53%   53.34%
> > state 2       59231     6888     6791   11.63%   11.47%
> > state 3     2455784   564797        0   23.00%    0.00%
> >
> > total       2596161   573713    50338   24.04%  <<< Misses
> >
> > It is not clear why the number of idle entries differs so much
> > between the tests, but there is a bit of a different distribution
> > of the workload among the CPUs.
> >
> > rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
> > POLL, C1, C1E, C3 (disabled), C6
> >
> > rds-stress-test
> > Good Kernel: Time between samples ~70 Seconds (estimated)
> >              Usage    Above    Below    Above    Below
> > state 0       1561        0     1435    0.00%   91.93%
> > state 1      13855      899     2410    6.49%   17.39%
> > state 2     467998   139254    23679   29.76%    5.06%
> > state 3          0        0        0    0.00%    0.00%
> > state 4     213132   107417        0   50.40%    0.00%
> >
> > total       696546   247570    27524   39.49%  <<< Misses
> >
> > Bad Kernel: Time between samples ~ 70 Seconds (estimated)
> >              Usage    Above    Below    Above    Below
> > state 0        231        0      231    0.00%  100.00%
> > state 1       5413      266     1186    4.91%   21.91%
> > state 2      54365      719     3789    1.32%    6.97%
> > state 3          0        0        0    0.00%    0.00%
> > state 4     267055   148327        0   55.54%    0.00%
> >
> > total       327064   149312     5206   47.24%  <<< Misses
> >
> > Again, differing numbers of idle entries between tests.
> > This time the load distribution between CPUs is more
> > obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
> > In the "Good" case the work is distributed over more CPUs.
> > I assume without proof, that the scheduler is deciding not to migrate
> > the next bit of work to another CPU in the one case versus the other.
>
> The above is incorrect. The CPUs involved between the "Good"
> and "Bad" tests are very similar, mainly 2 CPUs with a little of
> a 3rd and 4th. See the attached graph for more detail / clarity.
>
> All of the tests show higher usage of shallower idle states with
> the "Good" versus the "Bad", which was the expectation of the
> original patch, as has been mentioned a few times in the emails.
>
> My input is to revert the reversion.

OK, noted, thanks!

Christian, what do you think?

^ permalink raw reply	[flat|nested] 44+ messages in thread
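[Editorial note: the "Misses %" totals in the summaries above follow directly from the cpuidle sysfs counters: per state, `above` counts entries into a state deeper than the idle period warranted, `below` counts entries into a state too shallow, and the percentages are taken over `usage`. A minimal sketch of the arithmetic (illustrative only; this is not the script Doug used):]

```python
# Reproduce the per-state "Above"/"Below" percentages and the total
# "Misses %" from already-summed cpuidle counters (delta of the
# /sys/devices/system/cpu/cpu*/cpuidle/state*/{usage,above,below}
# readings across all CPUs, after minus before the workload).

def summarize(states):
    """states: {name: (usage, above, below)} -> (per-state %, total miss %)."""
    rows = {}
    tot_usage = tot_above = tot_below = 0
    for name, (usage, above, below) in states.items():
        above_pct = 100.0 * above / usage if usage else 0.0
        below_pct = 100.0 * below / usage if usage else 0.0
        rows[name] = (round(above_pct, 2), round(below_pct, 2))
        tot_usage += usage
        tot_above += above
        tot_below += below
    misses_pct = 100.0 * (tot_above + tot_below) / tot_usage if tot_usage else 0.0
    return rows, round(misses_pct, 2)

# Doug's phoronix-sqlite "Good Kernel" numbers from the table above:
good = {
    "state0": (220, 0, 218),
    "state1": (70212, 5213, 34602),
    "state2": (30273, 5237, 1806),
    "state3": (0, 0, 0),
    "state4": (11824, 2120, 0),
}
rows, misses = summarize(good)
print(rows["state1"], misses)  # (7.42, 49.28) 43.72
```

This reproduces the table's state 1 row (7.42% / 49.28%) and the 43.72% total.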
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
  2026-01-30 19:28             ` Rafael J. Wysocki
@ 2026-02-01 19:20               ` Christian Loehle
  2026-02-02 17:31                 ` Harshvardhan Jha
  0 siblings, 1 reply; 44+ messages in thread
From: Christian Loehle @ 2026-02-01 19:20 UTC (permalink / raw)
  To: Rafael J. Wysocki, Doug Smythies
  Cc: Harshvardhan Jha, Sasha Levin, Greg Kroah-Hartman, linux-pm,
	stable, Daniel Lezcano, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 10013 bytes --]

On 1/30/26 19:28, Rafael J. Wysocki wrote:
> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>
>> On 2026.01.28 15:53 Doug Smythies wrote:
>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>> ... snip ...
>>>
>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed
>>>>>> from base to revert.
>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>> before and after the workload is usually fine to work with:
>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>
>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu
>>>>> states for the test suites before and after tests. The folders with the
>>>>> name of the test contain two folders good-kernel and bad-kernel
>>>>> containing two files having the before and after states.
>>>>> Please note
>>>>> that different machines were used for different test suites due to
>>>>> compatibility reasons. The jbb test was run using containers.

[... snip: requoted test questions and idle-state summaries, quoted in full upthread ...]

>> All of the tests show higher usage of shallower idle states with
>> the "Good" versus the "Bad", which was the expectation of the
>> original patch, as has been mentioned a few times in the emails.
>>
>> My input is to revert the reversion.
>
> OK, noted, thanks!
>
> Christian, what do you think?

I've attached readable diffs of the values provided, the tldr is:

+--------------------+-----------+-----------+
| Workload           | Δ above % | Δ below % |
+--------------------+-----------+-----------+
| fio                |     -0.44 |     +2.57 |
| rds-stress-test    |    -10.11 |     +2.36 |
| jbb                |    -20.35 |     +3.30 |
| phoronix-sqlite    |     -9.66 |     -0.61 |
+--------------------+-----------+-----------+

I think the overall trend however is clear, the commit
85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
improved menu on many systems and workloads, I'd dare to say most.

Even on the reported regression introduced by it, the cpuidle governor
performed better on paper, system metrics regressed because other
CPUs' P-states weren't available due to being in a shallower state.
https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
(+CC Sergey)
It could be argued that this is a limitation of a per-CPU cpuidle
governor and a more holistic approach would be needed for that platform
(i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
skew towards deeper cpuidle states).

I also think that the change made sense, for small residency values
with a bit of random noise mixed in, performing the same statistical
test doesn't seem sensible, the short intervals will look noisier.

So options are:
1.
Revert revert on mainline+stable
2. Revert revert on mainline only
3. Keep revert, miss out on the improvement for many.
4. Revert only when we have a good solution for the platforms like
Sergey's.

I'd lean towards 2 because 4 won't be easy, unless of course a minor
hack like playing with the deep idle state residency values would
be enough to mitigate.

[-- Attachment #2: diff_rds_stress-test.txt --]
[-- Type: text/plain, Size: 3573 bytes --]

(columns: state (latency/residency us), above, below, usage, time,
 avg time us per usage, above % over usage, below % over usage;
 derived columns rounded to 2 decimals)

bad-kernel:
index0 POLL (0/0)                      0      231     231        4420     19.13   0.00  100.00
index1 C1  'MWAIT 0x00' (2/2)        266     1186    5413      264365     48.84   4.91   21.91
index2 C1E 'MWAIT 0x01' (10/20)      719     3789   54365    11979213    220.35   1.32    6.97
index3 C3  'MWAIT 0x10' (40/100)       0        0       0           0         -      -       -
index4 C6  'MWAIT 0x20' (133/400) 148327        0  267055  3728423124  13961.26  55.54    0.00
OVERALL                           149312     5206  327064  3740671122  11437.12  45.65    1.59

good-kernel:
index0 POLL (0/0)                      0     1435    1561       23997     15.37   0.00   91.93
index1 C1  'MWAIT 0x00' (2/2)        899     2410   13855      580895     41.93   6.49   17.39
index2 C1E 'MWAIT 0x01' (10/20)   139254    23679  467998    71044825    151.81  29.76    5.06
index3 C3  'MWAIT 0x10' (40/100)       0        0       0           0         -      -       -
index4 C6  'MWAIT 0x20' (133/400) 107417        0  213132  3670745602  17222.87  50.40    0.00
OVERALL                           247570    27524  696546  3742395319   5372.79  35.54    3.95

[-- Attachment #3: diff_fio.txt --]
[-- Type: text/plain, Size: 3087 bytes --]

bad-kernel:
index0 POLL (0/0)                      0      756     916       13304     14.52   0.00   82.53
index1 C1  'MWAIT 0x00' (2/2)       2028    42791   80230    14121043    176.01   2.53   53.34
index2 C1E 'MWAIT 0x01' (10/20)     6888     6791   59231     9518489    160.70  11.63   11.47
index3 C6  'MWAIT 0x20' (92/276)  564797        0 2455784  3656232645   1488.83  23.00    0.00
OVERALL                           573713    50338 2596161  3679885481   1417.43  22.10    1.94

good-kernel:
index0 POLL (0/0)                      0     3818    3822       84618     22.14   0.00   99.90
index1 C1  'MWAIT 0x00' (2/2)       4406    68956  148640    22527422    151.56   2.96   46.39
index2 C1E 'MWAIT 0x01' (10/20)    45344   105675  593455   121006522    203.90   7.64   17.81
index3 C6  'MWAIT 0x20' (92/276)  807014        0 3209648  3510698554   1093.80  25.14    0.00
OVERALL                           856764   178449 3955565  3654317116    923.84  21.66    4.51

[-- Attachment #4: diff_jbb.txt --]
[-- Type: text/plain, Size: 3681 bytes --]

bad-kernel:
index0 POLL (0/0)                        0    75134     90715        1563844    17.24   0.00   82.82
index1 C1  'MWAIT 0x00' (2/2)       312970  6082180   8878738     1015443901   114.37   3.52   68.50
index2 C1E 'MWAIT 0x01' (10/20)    2576251   603316  12048728     1478419300   122.70  21.38    5.01
index3 C3  'MWAIT 0x10' (40/100)         0        0         0              0        -      -       -
index4 C6  'MWAIT 0x20' (133/400) 44723273        0  85999424   300384703093  3492.87  52.00    0.00
OVERALL                           47612494  6760630 107017605   302880130138  2830.19  44.49    6.32

good-kernel:
index0 POLL (0/0)                        0   296084    297550        5996079    20.15   0.00   99.51
index1 C1  'MWAIT 0x00' (2/2)       341043  4962635   8062854      994358306   123.33   4.23   61.55
index2 C1E 'MWAIT 0x01' (10/20)   12688379  6252051  56708358     9902266327   174.62  22.37   11.02
index3 C3  'MWAIT 0x10' (40/100)         0        0         0              0        -      -       -
index4 C6  'MWAIT 0x20' (133/400) 15868752        0  54624476   276236627330  5057.01  29.05    0.00
OVERALL                           28898174 11510770 119693238   287139248042  2398.96  24.14    9.62

[-- Attachment #5: diff_phoronix-sqlite.txt --]
[-- Type: text/plain, Size: 3571 bytes --]

bad-kernel:
index0 POLL (0/0)                      0      260     262        6634     25.32   0.00   99.24
index1 C1  'MWAIT 0x00' (2/2)       3985    35588   62751     2972131     47.36   6.35   56.71
index2 C1E 'MWAIT 0x01' (10/20)     7896     1433   24941     2185241     87.62  31.66    5.75
index3 C3  'MWAIT 0x10' (40/100)       0        0       0           0         -      -       -
index4 C6  'MWAIT 0x20' (133/400)  11543        0   24489   123920172   5060.24  47.14    0.00
OVERALL                            23424    37281  112443   129084178   1148.00  20.83   33.16

good-kernel:
index0 POLL (0/0)                      0      218     220        5595     25.43   0.00   99.09
index1 C1  'MWAIT 0x00' (2/2)       5213    34602   70212     3288043     46.83   7.42   49.28
index2 C1E 'MWAIT 0x01' (10/20)     5237     1806   30273     4825950    159.41  17.30    5.97
index3 C3  'MWAIT 0x10' (40/100)       0        0       0           0         -      -       -
index4 C6  'MWAIT 0x20' (133/400)   2120        0   11824   124762813  10551.66  17.93    0.00
OVERALL                            12570    36626  112529   132882401   1180.87  11.17   32.55

^ permalink raw reply	[flat|nested] 44+ messages in thread
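[Editorial note: reading Δ as good-kernel minus bad-kernel OVERALL percentages from the attached diffs, the tldr table can be recomputed with a short sketch; the inputs below are the OVERALL `above_pct_over_usage` / `below_pct_over_usage` values from the attachments, pre-rounded to two decimals:]

```python
# Recompute the per-workload "Δ above % / Δ below %" summary.
# good kernel = commit 85975daeaa4d applied, bad kernel = the revert.

overall = {  # workload: ((good above%, good below%), (bad above%, bad below%))
    "fio":             ((21.66, 4.51), (22.10, 1.94)),
    "rds-stress-test": ((35.54, 3.95), (45.65, 1.59)),
    "jbb":             ((24.14, 9.62), (44.49, 6.32)),
    "phoronix-sqlite": ((11.17, 32.55), (20.83, 33.16)),
}

# Delta = good - bad, so a negative "above" delta means fewer
# too-deep idle entries with the patch applied.
delta = {w: (round(g[0] - b[0], 2), round(g[1] - b[1], 2))
         for w, (g, b) in overall.items()}
for w, (da, db) in delta.items():
    print(f"{w:18s} {da:+7.2f} {db:+7.2f}")
```

Running this yields the four Δ rows of the table (e.g. jbb: -20.35 / +3.30).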
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
  2026-02-01 19:20               ` Christian Loehle
@ 2026-02-02 17:31                 ` Harshvardhan Jha
  2026-02-03  9:07                   ` Christian Loehle
  0 siblings, 1 reply; 44+ messages in thread
From: Harshvardhan Jha @ 2026-02-02 17:31 UTC (permalink / raw)
  To: Christian Loehle, Rafael J. Wysocki, Doug Smythies
  Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano,
	Sergey Senozhatsky

On 02/02/26 12:50 AM, Christian Loehle wrote:
> On 1/30/26 19:28, Rafael J. Wysocki wrote:
>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2026.01.28 15:53 Doug Smythies wrote:
>>>> On 2026.01.27 21:07 Doug Smythies wrote:
>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote:
>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote:
>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:

[... snip: requoted test details and idle-state summaries, quoted in full upthread ...]

>>> All of the tests show higher usage of shallower idle states with
>>> the "Good" versus the "Bad", which was the expectation of the
>>> original patch, as has been mentioned a few times in the emails.
>>>
>>> My input is to revert the reversion.
>> OK, noted, thanks!
>>
>> Christian, what do you think?
> I've attached readable diffs of the values provided, the tldr is:
>
> +--------------------+-----------+-----------+
> | Workload           | Δ above % | Δ below % |
> +--------------------+-----------+-----------+
> | fio                |     -0.44 |     +2.57 |
> | rds-stress-test    |    -10.11 |     +2.36 |
> | jbb                |    -20.35 |     +3.30 |
> | phoronix-sqlite    |     -9.66 |     -0.61 |
> +--------------------+-----------+-----------+
>
> I think the overall trend however is clear, the commit
> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
> improved menu on many systems and workloads, I'd dare to say most.
>
> Even on the reported regression introduced by it, the cpuidle governor
> performed better on paper, system metrics regressed because other
> CPUs' P-states weren't available due to being in a shallower state.
> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
> (+CC Sergey)
> It could be argued that this is a limitation of a per-CPU cpuidle
> governor and a more holistic approach would be needed for that platform
> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states,
> skew towards deeper cpuidle states).
> > I also think that the change made sense, for small residency values > with a bit of random noise mixed in, performing the same statistical > test doesn't seem sensible, the short intervals will look noisier. > > So options are: > 1. Revert revert on mainline+stable > 2. Revert revert on mainline only > 3. Keep revert, miss out on the improvement for many. > 4. Revert only when we have a good solution for the platforms like > Sergey's. > > I'd lean towards 2 because 4 won't be easy, unless of course a minor > hack like playing with the deep idle state residency values would > be enough to mitigate. Wouldn't it be better to choose option 1, since reverting the revert shows even more pronounced improvements on older kernels? I've tested this on 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements. Given that the revert only benefits the relatively new Jasper Lake systems, isn't reverting the revert all the more relevant on stable kernels? Older hardware is more likely to run older kernels than newer hardware is, though not always, imo. Harshvardhan > > ^ permalink raw reply [flat|nested] 44+ messages in thread
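[Editor's note: the Δ above/below percentages in Christian's summary table can be reproduced from the per-workload Usage/Above/Below totals quoted in the thread. A minimal sketch (helper names are hypothetical; the input tuples are the jbb totals from Doug's summaries) of that arithmetic:

```python
# Sketch: derive the delta-above% / delta-below% values in the summary
# table from aggregated (usage, above, below) idle-entry totals.
# The tuples below are the jbb totals quoted earlier in the thread.

def miss_pcts(usage, above, below):
    """Above/below misses as a percentage of total idle entries."""
    return 100.0 * above / usage, 100.0 * below / usage

good = (119693238, 28898174, 11510770)  # good kernel (original patch present)
bad = (107017605, 47612494, 6760630)    # bad kernel (with the revert)

good_above, good_below = miss_pcts(*good)
bad_above, bad_below = miss_pcts(*bad)
print(round(good_above - bad_above, 2))  # -20.35, jbb's delta above %
print(round(good_below - bad_below, 2))  # 3.3, jbb's delta below %
```

The jbb and phoronix-sqlite rows of the table check out against the quoted totals this way.]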
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-02 17:31 ` Harshvardhan Jha @ 2026-02-03 9:07 ` Christian Loehle 2026-02-03 9:16 ` Harshvardhan Jha 0 siblings, 1 reply; 44+ messages in thread From: Christian Loehle @ 2026-02-03 9:07 UTC (permalink / raw) To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky On 2/2/26 17:31, Harshvardhan Jha wrote: > > On 02/02/26 12:50 AM, Christian Loehle wrote: >> On 1/30/26 19:28, Rafael J. Wysocki wrote: >>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote: >>>> On 2026.01.28 15:53 Doug Smythies wrote: >>>>> On 2026.01.27 21:07 Doug Smythies wrote: >>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote: >>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote: >>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>> ... snip ... >>>>> >>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed >>>>>>>> from base to revert. >>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of >>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* >>>>>>>> before and after the workload is usually fine to work with: >>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ >>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu >>>>>>> states for the test suites before and after tests. The folders with the >>>>>>> name of the test contain two folders good-kernel and bad-kernel >>>>>>> containing two files having the before and after states. 
Please note >>>>>>> that different machines were used for different test suites due to >>>>>>> compatibility reasons. The jbb test was run using containers. >>>>> Please provide the results of the test runs that were done for >>>>> the supplied before and after idle data. >>>>> In particular, what is the "fio" test and it results. Its idle data is not very revealing. >>>>> Is it a test I can run on my test computer? >>>> I see that I have fio installed on my test computer. >>>> >>>>>> It is a considerable amount of work to manually extract and summarize the data. >>>>>> I have only done it for the phoronix-sqlite data. >>>>> I have done the rest now, see below. >>>>> I have also attached the results, in case the formatting gets screwed up. >>>>> >>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled. >>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now. >>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up): >>>>>> The total idle entries (usage) and time seem low to me, which is why the ???. >>>>>> >>>>>> phoronix-sqlite >>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???) >>>>>> Usage Above Below Above Below >>>>>> state 0 220 0 218 0.00% 99.09% >>>>>> state 1 70212 5213 34602 7.42% 49.28% >>>>>> state 2 30273 5237 1806 17.30% 5.97% >>>>>> state 3 0 0 0 0.00% 0.00% >>>>>> state 4 11824 2120 0 17.93% 0.00% >>>>>> >>>>>> total 112529 12570 36626 43.72% <<< Misses % >>>>>> >>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???) 
>>>>>> Usage Above Below Above Below >>>>>> state 0 262 0 260 0.00% 99.24% >>>>>> state 1 62751 3985 35588 6.35% 56.71% >>>>>> state 2 24941 7896 1433 31.66% 5.75% >>>>>> state 3 0 0 0 0.00% 0.00% >>>>>> state 4 24489 11543 0 47.14% 0.00% >>>>>> >>>>>> total 112443 23424 37281 53.99% <<< Misses % >>>>>> >>>>>> Observe 2X use of idle state 4 for the "Bad Kernel" >>>>>> >>>>>> I have a template now, and can summarize the other 40 CPU data >>>>>> faster, but I would have to rework the template for the 56 CPU data, >>>>>> and is it a 64 CPU data set at 4 idle states per CPU? >>>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled. >>>>> POLL, C1, C1E, C3 (disabled), C6 >>>>> >>>>> Good Kernel: Time between samples > 2 hours (estimated) >>>>> Usage Above Below Above Below >>>>> state 0 297550 0 296084 0.00% 99.51% >>>>> state 1 8062854 341043 4962635 4.23% 61.55% >>>>> state 2 56708358 12688379 6252051 22.37% 11.02% >>>>> state 3 0 0 0 0.00% 0.00% >>>>> state 4 54624476 15868752 0 29.05% 0.00% >>>>> >>>>> total 119693238 28898174 11510770 33.76% <<< Misses >>>>> >>>>> Bad Kernel: Time between samples > 2 hours (estimated) >>>>> Usage Above Below Above Below >>>>> state 0 90715 0 75134 0.00% 82.82% >>>>> state 1 8878738 312970 6082180 3.52% 68.50% >>>>> state 2 12048728 2576251 603316 21.38% 5.01% >>>>> state 3 0 0 0 0.00% 0.00% >>>>> state 4 85999424 44723273 0 52.00% 0.00% >>>>> >>>>> total 107017605 47612494 6760630 50.81% <<< Misses >>>>> >>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel" >>>>> >>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6. 
>>>>> >>>>> fio >>>>> Good Kernel: Time between samples ~ 1 minute (estimated) >>>>> Usage Above Below Above Below >>>>> state 0 3822 0 3818 0.00% 99.90% >>>>> state 1 148640 4406 68956 2.96% 46.39% >>>>> state 2 593455 45344 105675 7.64% 17.81% >>>>> state 3 3209648 807014 0 25.14% 0.00% >>>>> >>>>> total 3955565 856764 178449 26.17% <<< Misses >>>>> >>>>> Bad Kernel: Time between samples ~ 1 minute (estimated) >>>>> Usage Above Below Above Below >>>>> state 0 916 0 756 0.00% 82.53% >>>>> state 1 80230 2028 42791 2.53% 53.34% >>>>> state 2 59231 6888 6791 11.63% 11.47% >>>>> state 3 2455784 564797 0 23.00% 0.00% >>>>> >>>>> total 2596161 573713 50338 24.04% <<< Misses >>>>> >>>>> It is not clear why the number of idle entries differs so much >>>>> between the tests, but there is a bit of a different distribution >>>>> of the workload among the CPUs. >>>>> >>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled. >>>>> POLL, C1, C1E, C3 (disabled), C6 >>>>> >>>>> rds-stress-test >>>>> Good Kernel: Time between samples ~70 Seconds (estimated) >>>>> Usage Above Below Above Below >>>>> state 0 1561 0 1435 0.00% 91.93% >>>>> state 1 13855 899 2410 6.49% 17.39% >>>>> state 2 467998 139254 23679 29.76% 5.06% >>>>> state 3 0 0 0 0.00% 0.00% >>>>> state 4 213132 107417 0 50.40% 0.00% >>>>> >>>>> total 696546 247570 27524 39.49% <<< Misses >>>>> >>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated) >>>>> Usage Above Below Above Below >>>>> state 0 231 0 231 0.00% 100.00% >>>>> state 1 5413 266 1186 4.91% 21.91% >>>>> state 2 54365 719 3789 1.32% 6.97% >>>>> state 3 0 0 0 0.00% 0.00% >>>>> state 4 267055 148327 0 55.54% 0.00% >>>>> >>>>> total 327064 149312 5206 47.24% <<< Misses >>>>> >>>>> Again, differing numbers of idle entries between tests. >>>>> This time the load distribution between CPUs is more >>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's. >>>>> In the "Good" case the work is distributed over more CPUs. 
>>>>> I assume without proof, that the scheduler is deciding not to migrate >>>>> the next bit of work to another CPU in the one case verses the other. >>>> The above is incorrect. The CPUs involved between the "Good" >>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of >>>> a 3rd and 4th. See the attached graph for more detail / clarity. >>>> >>>> All of the tests show higher usage of shallower idle states with >>>> the "Good" verses the "Bad", which was the expectation of the >>>> original patch, as has been mentioned a few times in the emails. >>>> >>>> My input is to revert the reversion. >>> OK, noted, thanks! >>> >>> Christian, what do you think? >> I've attached readable diffs of the values provided the tldr is: >> >> +--------------------+-----------+-----------+ >> | Workload | Δ above % | Δ below % | >> +--------------------+-----------+-----------+ >> | fio | -10.11 | +2.36 | >> | rds-stress-test | -0.44 | +2.57 | >> | jbb | -20.35 | +3.30 | >> | phoronix-sqlite | -9.66 | -0.61 | >> +--------------------+-----------+-----------+ >> >> I think the overall trend however is clear, the commit >> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") >> improved menu on many systems and workloads, I'd dare to say most. >> >> Even on the reported regression introduced by it, the cpuidle governor >> performed better on paper, system metrics regressed because other >> CPUs' P-states weren't available due to being in a shallower state. >> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!KSEGRBOHs7E_E4fRenT3y3MovrhDewsTY-E4lu1JCX0Py-r4GiEJefoLfcHrummpmvmeO_vp1beh-OO_MYxG9xLU0BuBunAS$ >> (+CC Sergey) >> It could be argued that this is a limitation of a per-CPU cpuidle >> governor and a more holistic approach would be needed for that platform >> (i.e. 
power/thermal-budget-sharing-CPUs want to use higher P-states, >> skew towards deeper cpuidle states). >> >> I also think that the change made sense, for small residency values >> with a bit of random noise mixed in, performing the same statistical >> test doesn't seem sensible, the short intervals will look noisier. >> >> So options are: >> 1. Revert revert on mainline+stable >> 2. Revert revert on mainline only >> 3. Keep revert, miss out on the improvement for many. >> 4. Revert only when we have a good solution for the platforms like >> Sergey's. >> >> I'd lean towards 2 because 4 won't be easy, unless of course a minor >> hack like playing with the deep idle state residency values would >> be enough to mitigate. > > Wouldn't it be better to choose option 1 as reverting the revert has > even more pronounced improvements on older kernels? I've tested this on > 6.12, 5.15 and 5.4 stable based kernels and found massive improvements. > Since the revert has optimizations present only in Jasper Lake Systems > which is new, isn't reverting the revert more relevant on stable > kernels? It's more likely that older hardware runs older kernels than > newer hardware although not always necessary imo. > FWIW Jasper Lake seems to be supported from 5.6 on, see b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family") ^ permalink raw reply [flat|nested] 44+ messages in thread
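[Editor's note: the per-workload summaries in this thread were assembled by hand from before/after dumps of /sys/devices/system/cpu/cpu*/cpuidle/state*/. A sketch of automating that aggregation (helper names are hypothetical; it assumes the snapshot was captured as `path:value` lines, e.g. with `grep .` over the usage/above/below files):

```python
import re
from collections import defaultdict

# Aggregate per-state usage/above/below counters across all CPUs from
# "path:value" snapshot lines, then report the total miss percentage
# in the same shape as the summaries quoted in this thread.
LINE = re.compile(r"cpu(\d+)/cpuidle/state(\d+)/(usage|above|below):(\d+)")

def aggregate(lines):
    totals = defaultdict(lambda: {"usage": 0, "above": 0, "below": 0})
    for line in lines:
        m = LINE.search(line)
        if m:
            _cpu, state, field, value = m.groups()
            totals[int(state)][field] += int(value)
    return dict(totals)

def summarize(totals):
    u = sum(t["usage"] for t in totals.values())
    a = sum(t["above"] for t in totals.values())
    b = sum(t["below"] for t in totals.values())
    return u, a, b, 100.0 * (a + b) / u  # usage, above, below, misses %

# Tiny synthetic snapshot, two CPUs, one state:
sample = [
    "/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:100",
    "/sys/devices/system/cpu/cpu0/cpuidle/state1/above:10",
    "/sys/devices/system/cpu/cpu0/cpuidle/state1/below:40",
    "/sys/devices/system/cpu/cpu1/cpuidle/state1/usage:100",
    "/sys/devices/system/cpu/cpu1/cpuidle/state1/above:20",
    "/sys/devices/system/cpu/cpu1/cpuidle/state1/below:30",
]
print(summarize(aggregate(sample)))  # (200, 30, 70, 50.0)
```

Run once on the before snapshot and once on the after snapshot, then subtract the totals to get the per-run numbers.]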
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-03 9:07 ` Christian Loehle @ 2026-02-03 9:16 ` Harshvardhan Jha 2026-02-03 9:31 ` Christian Loehle 0 siblings, 1 reply; 44+ messages in thread From: Harshvardhan Jha @ 2026-02-03 9:16 UTC (permalink / raw) To: Christian Loehle, Rafael J. Wysocki, Doug Smythies Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky On 03/02/26 2:37 PM, Christian Loehle wrote: > On 2/2/26 17:31, Harshvardhan Jha wrote: >> On 02/02/26 12:50 AM, Christian Loehle wrote: >>> On 1/30/26 19:28, Rafael J. Wysocki wrote: >>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote: >>>>> On 2026.01.28 15:53 Doug Smythies wrote: >>>>>> On 2026.01.27 21:07 Doug Smythies wrote: >>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote: >>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote: >>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>>> ... snip ... >>>>>> >>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed >>>>>>>>> from base to revert. >>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of >>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* >>>>>>>>> before and after the workload is usually fine to work with: >>>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ >>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu >>>>>>>> states for the test suites before and after tests. 
The folders with the >>>>>>>> name of the test contain two folders good-kernel and bad-kernel >>>>>>>> containing two files having the before and after states. Please note >>>>>>>> that different machines were used for different test suites due to >>>>>>>> compatibility reasons. The jbb test was run using containers. >>>>>> Please provide the results of the test runs that were done for >>>>>> the supplied before and after idle data. >>>>>> In particular, what is the "fio" test and it results. Its idle data is not very revealing. >>>>>> Is it a test I can run on my test computer? >>>>> I see that I have fio installed on my test computer. >>>>> >>>>>>> It is a considerable amount of work to manually extract and summarize the data. >>>>>>> I have only done it for the phoronix-sqlite data. >>>>>> I have done the rest now, see below. >>>>>> I have also attached the results, in case the formatting gets screwed up. >>>>>> >>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled. >>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now. >>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up): >>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???. >>>>>>> >>>>>>> phoronix-sqlite >>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 220 0 218 0.00% 99.09% >>>>>>> state 1 70212 5213 34602 7.42% 49.28% >>>>>>> state 2 30273 5237 1806 17.30% 5.97% >>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>> state 4 11824 2120 0 17.93% 0.00% >>>>>>> >>>>>>> total 112529 12570 36626 43.72% <<< Misses % >>>>>>> >>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???) 
>>>>>>> Usage Above Below Above Below >>>>>>> state 0 262 0 260 0.00% 99.24% >>>>>>> state 1 62751 3985 35588 6.35% 56.71% >>>>>>> state 2 24941 7896 1433 31.66% 5.75% >>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>> state 4 24489 11543 0 47.14% 0.00% >>>>>>> >>>>>>> total 112443 23424 37281 53.99% <<< Misses % >>>>>>> >>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel" >>>>>>> >>>>>>> I have a template now, and can summarize the other 40 CPU data >>>>>>> faster, but I would have to rework the template for the 56 CPU data, >>>>>>> and is it a 64 CPU data set at 4 idle states per CPU? >>>>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled. >>>>>> POLL, C1, C1E, C3 (disabled), C6 >>>>>> >>>>>> Good Kernel: Time between samples > 2 hours (estimated) >>>>>> Usage Above Below Above Below >>>>>> state 0 297550 0 296084 0.00% 99.51% >>>>>> state 1 8062854 341043 4962635 4.23% 61.55% >>>>>> state 2 56708358 12688379 6252051 22.37% 11.02% >>>>>> state 3 0 0 0 0.00% 0.00% >>>>>> state 4 54624476 15868752 0 29.05% 0.00% >>>>>> >>>>>> total 119693238 28898174 11510770 33.76% <<< Misses >>>>>> >>>>>> Bad Kernel: Time between samples > 2 hours (estimated) >>>>>> Usage Above Below Above Below >>>>>> state 0 90715 0 75134 0.00% 82.82% >>>>>> state 1 8878738 312970 6082180 3.52% 68.50% >>>>>> state 2 12048728 2576251 603316 21.38% 5.01% >>>>>> state 3 0 0 0 0.00% 0.00% >>>>>> state 4 85999424 44723273 0 52.00% 0.00% >>>>>> >>>>>> total 107017605 47612494 6760630 50.81% <<< Misses >>>>>> >>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel" >>>>>> >>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6. 
>>>>>> >>>>>> fio >>>>>> Good Kernel: Time between samples ~ 1 minute (estimated) >>>>>> Usage Above Below Above Below >>>>>> state 0 3822 0 3818 0.00% 99.90% >>>>>> state 1 148640 4406 68956 2.96% 46.39% >>>>>> state 2 593455 45344 105675 7.64% 17.81% >>>>>> state 3 3209648 807014 0 25.14% 0.00% >>>>>> >>>>>> total 3955565 856764 178449 26.17% <<< Misses >>>>>> >>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated) >>>>>> Usage Above Below Above Below >>>>>> state 0 916 0 756 0.00% 82.53% >>>>>> state 1 80230 2028 42791 2.53% 53.34% >>>>>> state 2 59231 6888 6791 11.63% 11.47% >>>>>> state 3 2455784 564797 0 23.00% 0.00% >>>>>> >>>>>> total 2596161 573713 50338 24.04% <<< Misses >>>>>> >>>>>> It is not clear why the number of idle entries differs so much >>>>>> between the tests, but there is a bit of a different distribution >>>>>> of the workload among the CPUs. >>>>>> >>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled. >>>>>> POLL, C1, C1E, C3 (disabled), C6 >>>>>> >>>>>> rds-stress-test >>>>>> Good Kernel: Time between samples ~70 Seconds (estimated) >>>>>> Usage Above Below Above Below >>>>>> state 0 1561 0 1435 0.00% 91.93% >>>>>> state 1 13855 899 2410 6.49% 17.39% >>>>>> state 2 467998 139254 23679 29.76% 5.06% >>>>>> state 3 0 0 0 0.00% 0.00% >>>>>> state 4 213132 107417 0 50.40% 0.00% >>>>>> >>>>>> total 696546 247570 27524 39.49% <<< Misses >>>>>> >>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated) >>>>>> Usage Above Below Above Below >>>>>> state 0 231 0 231 0.00% 100.00% >>>>>> state 1 5413 266 1186 4.91% 21.91% >>>>>> state 2 54365 719 3789 1.32% 6.97% >>>>>> state 3 0 0 0 0.00% 0.00% >>>>>> state 4 267055 148327 0 55.54% 0.00% >>>>>> >>>>>> total 327064 149312 5206 47.24% <<< Misses >>>>>> >>>>>> Again, differing numbers of idle entries between tests. >>>>>> This time the load distribution between CPUs is more >>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's. 
>>>>>> In the "Good" case the work is distributed over more CPUs. >>>>>> I assume without proof, that the scheduler is deciding not to migrate >>>>>> the next bit of work to another CPU in the one case verses the other. >>>>> The above is incorrect. The CPUs involved between the "Good" >>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of >>>>> a 3rd and 4th. See the attached graph for more detail / clarity. >>>>> >>>>> All of the tests show higher usage of shallower idle states with >>>>> the "Good" verses the "Bad", which was the expectation of the >>>>> original patch, as has been mentioned a few times in the emails. >>>>> >>>>> My input is to revert the reversion. >>>> OK, noted, thanks! >>>> >>>> Christian, what do you think? >>> I've attached readable diffs of the values provided the tldr is: >>> >>> +--------------------+-----------+-----------+ >>> | Workload | Δ above % | Δ below % | >>> +--------------------+-----------+-----------+ >>> | fio | -10.11 | +2.36 | >>> | rds-stress-test | -0.44 | +2.57 | >>> | jbb | -20.35 | +3.30 | >>> | phoronix-sqlite | -9.66 | -0.61 | >>> +--------------------+-----------+-----------+ >>> >>> I think the overall trend however is clear, the commit >>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") >>> improved menu on many systems and workloads, I'd dare to say most. >>> >>> Even on the reported regression introduced by it, the cpuidle governor >>> performed better on paper, system metrics regressed because other >>> CPUs' P-states weren't available due to being in a shallower state. 
>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!KSEGRBOHs7E_E4fRenT3y3MovrhDewsTY-E4lu1JCX0Py-r4GiEJefoLfcHrummpmvmeO_vp1beh-OO_MYxG9xLU0BuBunAS$ >>> (+CC Sergey) >>> It could be argued that this is a limitation of a per-CPU cpuidle >>> governor and a more holistic approach would be needed for that platform >>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states, >>> skew towards deeper cpuidle states). >>> >>> I also think that the change made sense, for small residency values >>> with a bit of random noise mixed in, performing the same statistical >>> test doesn't seem sensible, the short intervals will look noisier. >>> >>> So options are: >>> 1. Revert revert on mainline+stable >>> 2. Revert revert on mainline only >>> 3. Keep revert, miss out on the improvement for many. >>> 4. Revert only when we have a good solution for the platforms like >>> Sergey's. >>> >>> I'd lean towards 2 because 4 won't be easy, unless of course a minor >>> hack like playing with the deep idle state residency values would >>> be enough to mitigate. >> Wouldn't it be better to choose option 1 as reverting the revert has >> even more pronounced improvements on older kernels? I've tested this on >> 6.12, 5.15 and 5.4 stable based kernels and found massive improvements. >> Since the revert has optimizations present only in Jasper Lake Systems >> which is new, isn't reverting the revert more relevant on stable >> kernels? It's more likely that older hardware runs older kernels than >> newer hardware although not always necessary imo. >> > FWIW Jasper Lake seems to be supported from 5.6 on, see > b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family") Oh I see, but shouldn't avoiding regressions on established platforms be a priority over further optimizing for specific newer platforms like Jasper Lake? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-03 9:16 ` Harshvardhan Jha @ 2026-02-03 9:31 ` Christian Loehle 2026-02-03 10:22 ` Harshvardhan Jha 2026-02-03 16:45 ` Rafael J. Wysocki 0 siblings, 2 replies; 44+ messages in thread From: Christian Loehle @ 2026-02-03 9:31 UTC (permalink / raw) To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky On 2/3/26 09:16, Harshvardhan Jha wrote: > > On 03/02/26 2:37 PM, Christian Loehle wrote: >> On 2/2/26 17:31, Harshvardhan Jha wrote: >>> On 02/02/26 12:50 AM, Christian Loehle wrote: >>>> On 1/30/26 19:28, Rafael J. Wysocki wrote: >>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote: >>>>>> On 2026.01.28 15:53 Doug Smythies wrote: >>>>>>> On 2026.01.27 21:07 Doug Smythies wrote: >>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote: >>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote: >>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>>>> ... snip ... >>>>>>> >>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed >>>>>>>>>> from base to revert. 
>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of >>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* >>>>>>>>>> before and after the workload is usually fine to work with: >>>>>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/__;!!ACWV5N9M2RV99hQ!PEhkFcO7emFLMaNxWEoE2Gtnw3zSkpghP17iuEvZM3W6KUpmkbgKw_tr91FwGfpzm4oA5f7c5sz8PkYvKiEVwI_iLIPpMt53$ >>>>>>>>> Apologies for the late reply, I'm attaching a tar ball which has the cpu >>>>>>>>> states for the test suites before and after tests. The folders with the >>>>>>>>> name of the test contain two folders good-kernel and bad-kernel >>>>>>>>> containing two files having the before and after states. Please note >>>>>>>>> that different machines were used for different test suites due to >>>>>>>>> compatibility reasons. The jbb test was run using containers. >>>>>>> Please provide the results of the test runs that were done for >>>>>>> the supplied before and after idle data. >>>>>>> In particular, what is the "fio" test and it results. Its idle data is not very revealing. >>>>>>> Is it a test I can run on my test computer? >>>>>> I see that I have fio installed on my test computer. >>>>>> >>>>>>>> It is a considerable amount of work to manually extract and summarize the data. >>>>>>>> I have only done it for the phoronix-sqlite data. >>>>>>> I have done the rest now, see below. >>>>>>> I have also attached the results, in case the formatting gets screwed up. >>>>>>> >>>>>>>> There seems to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled. >>>>>>>> I remember seeing a Linux-pm email about why but couldn't find it just now. >>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up): >>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???. 
>>>>>>>> >>>>>>>> phoronix-sqlite >>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???) >>>>>>>> Usage Above Below Above Below >>>>>>>> state 0 220 0 218 0.00% 99.09% >>>>>>>> state 1 70212 5213 34602 7.42% 49.28% >>>>>>>> state 2 30273 5237 1806 17.30% 5.97% >>>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>>> state 4 11824 2120 0 17.93% 0.00% >>>>>>>> >>>>>>>> total 112529 12570 36626 43.72% <<< Misses % >>>>>>>> >>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???) >>>>>>>> Usage Above Below Above Below >>>>>>>> state 0 262 0 260 0.00% 99.24% >>>>>>>> state 1 62751 3985 35588 6.35% 56.71% >>>>>>>> state 2 24941 7896 1433 31.66% 5.75% >>>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>>> state 4 24489 11543 0 47.14% 0.00% >>>>>>>> >>>>>>>> total 112443 23424 37281 53.99% <<< Misses % >>>>>>>> >>>>>>>> Observe 2X use of idle state 4 for the "Bad Kernel" >>>>>>>> >>>>>>>> I have a template now, and can summarize the other 40 CPU data >>>>>>>> faster, but I would have to rework the template for the 56 CPU data, >>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU? >>>>>>> jbb: 40 CPU's; 5 idle states, with idle state 3 defaulting to disabled. 
>>>>>>> POLL, C1, C1E, C3 (disabled), C6 >>>>>>> >>>>>>> Good Kernel: Time between samples > 2 hours (estimated) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 297550 0 296084 0.00% 99.51% >>>>>>> state 1 8062854 341043 4962635 4.23% 61.55% >>>>>>> state 2 56708358 12688379 6252051 22.37% 11.02% >>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>> state 4 54624476 15868752 0 29.05% 0.00% >>>>>>> >>>>>>> total 119693238 28898174 11510770 33.76% <<< Misses >>>>>>> >>>>>>> Bad Kernel: Time between samples > 2 hours (estimated) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 90715 0 75134 0.00% 82.82% >>>>>>> state 1 8878738 312970 6082180 3.52% 68.50% >>>>>>> state 2 12048728 2576251 603316 21.38% 5.01% >>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>> state 4 85999424 44723273 0 52.00% 0.00% >>>>>>> >>>>>>> total 107017605 47612494 6760630 50.81% <<< Misses >>>>>>> >>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel" >>>>>>> >>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6. >>>>>>> >>>>>>> fio >>>>>>> Good Kernel: Time between samples ~ 1 minute (estimated) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 3822 0 3818 0.00% 99.90% >>>>>>> state 1 148640 4406 68956 2.96% 46.39% >>>>>>> state 2 593455 45344 105675 7.64% 17.81% >>>>>>> state 3 3209648 807014 0 25.14% 0.00% >>>>>>> >>>>>>> total 3955565 856764 178449 26.17% <<< Misses >>>>>>> >>>>>>> Bad Kernel: Time between samples ~ 1 minute (estimated) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 916 0 756 0.00% 82.53% >>>>>>> state 1 80230 2028 42791 2.53% 53.34% >>>>>>> state 2 59231 6888 6791 11.63% 11.47% >>>>>>> state 3 2455784 564797 0 23.00% 0.00% >>>>>>> >>>>>>> total 2596161 573713 50338 24.04% <<< Misses >>>>>>> >>>>>>> It is not clear why the number of idle entries differs so much >>>>>>> between the tests, but there is a bit of a different distribution >>>>>>> of the workload among the CPUs. 
>>>>>>> >>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled. >>>>>>> POLL, C1, C1E, C3 (disabled), C6 >>>>>>> >>>>>>> rds-stress-test >>>>>>> Good Kernel: Time between samples ~70 Seconds (estimated) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 1561 0 1435 0.00% 91.93% >>>>>>> state 1 13855 899 2410 6.49% 17.39% >>>>>>> state 2 467998 139254 23679 29.76% 5.06% >>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>> state 4 213132 107417 0 50.40% 0.00% >>>>>>> >>>>>>> total 696546 247570 27524 39.49% <<< Misses >>>>>>> >>>>>>> Bad Kernel: Time between samples ~ 70 Seconds (estimated) >>>>>>> Usage Above Below Above Below >>>>>>> state 0 231 0 231 0.00% 100.00% >>>>>>> state 1 5413 266 1186 4.91% 21.91% >>>>>>> state 2 54365 719 3789 1.32% 6.97% >>>>>>> state 3 0 0 0 0.00% 0.00% >>>>>>> state 4 267055 148327 0 55.54% 0.00% >>>>>>> >>>>>>> total 327064 149312 5206 47.24% <<< Misses >>>>>>> >>>>>>> Again, differing numbers of idle entries between tests. >>>>>>> This time the load distribution between CPUs is more >>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPU's. >>>>>>> In the "Good" case the work is distributed over more CPUs. >>>>>>> I assume without proof, that the scheduler is deciding not to migrate >>>>>>> the next bit of work to another CPU in the one case verses the other. >>>>>> The above is incorrect. The CPUs involved between the "Good" >>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of >>>>>> a 3rd and 4th. See the attached graph for more detail / clarity. >>>>>> >>>>>> All of the tests show higher usage of shallower idle states with >>>>>> the "Good" verses the "Bad", which was the expectation of the >>>>>> original patch, as has been mentioned a few times in the emails. >>>>>> >>>>>> My input is to revert the reversion. >>>>> OK, noted, thanks! >>>>> >>>>> Christian, what do you think? 
>>>> I've attached readable diffs of the values provided the tldr is: >>>> >>>> +--------------------+-----------+-----------+ >>>> | Workload | Δ above % | Δ below % | >>>> +--------------------+-----------+-----------+ >>>> | fio | -10.11 | +2.36 | >>>> | rds-stress-test | -0.44 | +2.57 | >>>> | jbb | -20.35 | +3.30 | >>>> | phoronix-sqlite | -9.66 | -0.61 | >>>> +--------------------+-----------+-----------+ >>>> >>>> I think the overall trend however is clear, the commit >>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") >>>> improved menu on many systems and workloads, I'd dare to say most. >>>> >>>> Even on the reported regression introduced by it, the cpuidle governor >>>> performed better on paper, system metrics regressed because other >>>> CPUs' P-states weren't available due to being in a shallower state. >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/__;!!ACWV5N9M2RV99hQ!KSEGRBOHs7E_E4fRenT3y3MovrhDewsTY-E4lu1JCX0Py-r4GiEJefoLfcHrummpmvmeO_vp1beh-OO_MYxG9xLU0BuBunAS$ >>>> (+CC Sergey) >>>> It could be argued that this is a limitation of a per-CPU cpuidle >>>> governor and a more holistic approach would be needed for that platform >>>> (i.e. power/thermal-budget-sharing-CPUs want to use higher P-states, >>>> skew towards deeper cpuidle states). >>>> >>>> I also think that the change made sense, for small residency values >>>> with a bit of random noise mixed in, performing the same statistical >>>> test doesn't seem sensible, the short intervals will look noisier. >>>> >>>> So options are: >>>> 1. Revert revert on mainline+stable >>>> 2. Revert revert on mainline only >>>> 3. Keep revert, miss out on the improvement for many. >>>> 4. Revert only when we have a good solution for the platforms like >>>> Sergey's. 
>>>>
>>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>>> hack like playing with the deep idle state residency values would
>>>> be enough to mitigate.
>>> Wouldn't it be better to choose option 1, as reverting the revert has
>>> even more pronounced improvements on older kernels? I've tested this on
>>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>>> Since the revert only helps specific newer platforms like Jasper Lake,
>>> isn't reverting the revert more relevant on stable
>>> kernels? It's more likely that older hardware runs older kernels than
>>> newer hardware, although not always, imo.
>>>
>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>
> Oh I see, but shouldn't avoiding regressions on established platforms be
> a priority over further optimizing for specific newer platforms like
> Jasper Lake?
>
Well, avoiding regressions on established platforms is what led to
10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
being applied and backported.
The expectation for stable is that we avoid regressions and potentially
miss out on improvements. If you want the latest, greatest performance you
should probably run a latest, greatest kernel.
The original
85975daeaa4d cpuidle: menu: Avoid discarding useful information
was seen as a fix and overall improvement; that's why it was backported,
but Sergey's regression report contradicted that.
What is "established" and "newer" for a stable kernel is quite handwavy
IMO, but even here Sergey's regression report is a clear data point...
Your report is only about restoring 5.15 (and others) performance to 5.15
upstream-ish levels, which is within the expectations of running a stable
kernel. No doubt it's frustrating either way!

^ permalink raw reply [flat|nested] 44+ messages in thread
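The "<<< Misses" figures in the per-state tables quoted above are simply the summed above/below counters divided by the summed usage. A small illustrative sketch (not a script from this thread) reproduces Doug's phoronix-sqlite "Good Kernel" totals:

```python
def miss_stats(states):
    """Summarize cpuidle per-state counters the way the tables above do.

    states: one (usage, above, below) tuple per idle state, where
    'above' counts entries into a too-deep state and 'below' entries
    into a too-shallow one. Returns the totals plus the combined
    miss percentage, i.e. the '<<< Misses' column.
    """
    usage = sum(s[0] for s in states)
    above = sum(s[1] for s in states)
    below = sum(s[2] for s in states)
    return usage, above, below, round(100 * (above + below) / usage, 2)

# phoronix-sqlite "Good Kernel" numbers from the thread (states 0-4):
good = [(220, 0, 218), (70212, 5213, 34602), (30273, 5237, 1806),
        (0, 0, 0), (11824, 2120, 0)]
print(miss_stats(good))  # (112529, 12570, 36626, 43.72)
```

The same function applied to the "Bad Kernel" numbers gives the 53.99% figure, which is how the good/bad comparison in the tables was derived.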
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-03 9:31 ` Christian Loehle @ 2026-02-03 10:22 ` Harshvardhan Jha 2026-02-03 10:30 ` Christian Loehle 2026-02-03 16:45 ` Rafael J. Wysocki 1 sibling, 1 reply; 44+ messages in thread From: Harshvardhan Jha @ 2026-02-03 10:22 UTC (permalink / raw) To: Christian Loehle, Rafael J. Wysocki, Doug Smythies Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky On 03/02/26 3:01 PM, Christian Loehle wrote: > On 2/3/26 09:16, Harshvardhan Jha wrote: >> On 03/02/26 2:37 PM, Christian Loehle wrote: >>> On 2/2/26 17:31, Harshvardhan Jha wrote: >>>> On 02/02/26 12:50 AM, Christian Loehle wrote: >>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote: >>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote: >>>>>>> On 2026.01.28 15:53 Doug Smythies wrote: >>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote: >>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote: >>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote: >>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>>>>> ... snip ... >>>>>>>> >>>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed >>>>>>>>>>> from base to revert. 
>>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>>>> Apologies for the late reply, I'm attaching a tarball which has the CPU
>>>>>>>>>> states for the test suites before and after the tests. The folders with the
>>>>>>>>>> name of the test contain two folders, good-kernel and bad-kernel,
>>>>>>>>>> each containing two files having the before and after states. Please note
>>>>>>>>>> that different machines were used for different test suites due to
>>>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>>>> Please provide the results of the test runs that were done for
>>>>>>>> the supplied before and after idle data.
>>>>>>>> In particular, what is the "fio" test and its results? Its idle data is not very revealing.
>>>>>>>> Is it a test I can run on my test computer?
>>>>>>> I see that I have fio installed on my test computer.
>>>>>>>
>>>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>>>> I have done the rest now, see below.
>>>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>>>
>>>>>>>>> There seem to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>>> I remember seeing a linux-pm email about why but couldn't find it just now.
>>>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>>>
>>>>>>>>> phoronix-sqlite
>>>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>>> state 0      220       0     218    0.00%   99.09%
>>>>>>>>> state 1    70212    5213   34602    7.42%   49.28%
>>>>>>>>> state 2    30273    5237    1806   17.30%    5.97%
>>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>>> state 4    11824    2120       0   17.93%    0.00%
>>>>>>>>>
>>>>>>>>> total     112529   12570   36626   43.72%  <<< Misses %
>>>>>>>>>
>>>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>>> state 0      262       0     260    0.00%   99.24%
>>>>>>>>> state 1    62751    3985   35588    6.35%   56.71%
>>>>>>>>> state 2    24941    7896    1433   31.66%    5.75%
>>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>>> state 4    24489   11543       0   47.14%    0.00%
>>>>>>>>>
>>>>>>>>> total     112443   23424   37281   53.99%  <<< Misses %
>>>>>>>>>
>>>>>>>>> Observe the 2X use of idle state 4 for the "Bad Kernel".
>>>>>>>>>
>>>>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>>
>>>>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>>>>>               Usage      Above      Below    Above    Below
>>>>>>>> state 0      297550          0     296084    0.00%   99.51%
>>>>>>>> state 1     8062854     341043    4962635    4.23%   61.55%
>>>>>>>> state 2    56708358   12688379    6252051   22.37%   11.02%
>>>>>>>> state 3           0          0          0    0.00%    0.00%
>>>>>>>> state 4    54624476   15868752          0   29.05%    0.00%
>>>>>>>>
>>>>>>>> total     119693238   28898174   11510770   33.76%  <<< Misses
>>>>>>>>
>>>>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>>>>>               Usage      Above      Below    Above    Below
>>>>>>>> state 0       90715          0      75134    0.00%   82.82%
>>>>>>>> state 1     8878738     312970    6082180    3.52%   68.50%
>>>>>>>> state 2    12048728    2576251     603316   21.38%    5.01%
>>>>>>>> state 3           0          0          0    0.00%    0.00%
>>>>>>>> state 4    85999424   44723273          0   52.00%    0.00%
>>>>>>>>
>>>>>>>> total     107017605   47612494    6760630   50.81%  <<< Misses
>>>>>>>>
>>>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>>>>
>>>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>>>>
>>>>>>>> fio
>>>>>>>> Good Kernel: Time between samples ~1 minute (estimated)
>>>>>>>>              Usage    Above    Below    Above    Below
>>>>>>>> state 0       3822        0     3818    0.00%   99.90%
>>>>>>>> state 1     148640     4406    68956    2.96%   46.39%
>>>>>>>> state 2     593455    45344   105675    7.64%   17.81%
>>>>>>>> state 3    3209648   807014        0   25.14%    0.00%
>>>>>>>>
>>>>>>>> total      3955565   856764   178449   26.17%  <<< Misses
>>>>>>>>
>>>>>>>> Bad Kernel: Time between samples ~1 minute (estimated)
>>>>>>>>              Usage    Above    Below    Above    Below
>>>>>>>> state 0        916        0      756    0.00%   82.53%
>>>>>>>> state 1      80230     2028    42791    2.53%   53.34%
>>>>>>>> state 2      59231     6888     6791   11.63%   11.47%
>>>>>>>> state 3    2455784   564797        0   23.00%    0.00%
>>>>>>>>
>>>>>>>> total      2596161   573713    50338   24.04%  <<< Misses
>>>>>>>>
>>>>>>>> It is not clear why the number of idle entries differs so much
>>>>>>>> between the tests, but there is a bit of a different distribution
>>>>>>>> of the workload among the CPUs.
>>>>>>>>
>>>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>>
>>>>>>>> rds-stress-test
>>>>>>>> Good Kernel: Time between samples ~70 seconds (estimated)
>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>> state 0     1561       0    1435    0.00%   91.93%
>>>>>>>> state 1    13855     899    2410    6.49%   17.39%
>>>>>>>> state 2   467998  139254   23679   29.76%    5.06%
>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>> state 4   213132  107417       0   50.40%    0.00%
>>>>>>>>
>>>>>>>> total     696546  247570   27524   39.49%  <<< Misses
>>>>>>>>
>>>>>>>> Bad Kernel: Time between samples ~70 seconds (estimated)
>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>> state 0      231       0     231    0.00%  100.00%
>>>>>>>> state 1     5413     266    1186    4.91%   21.91%
>>>>>>>> state 2    54365     719    3789    1.32%    6.97%
>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>> state 4   267055  148327       0   55.54%    0.00%
>>>>>>>>
>>>>>>>> total     327064  149312    5206   47.24%  <<< Misses
>>>>>>>>
>>>>>>>> Again, differing numbers of idle entries between tests.
>>>>>>>> This time the load distribution between CPUs is more
>>>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>>>>> the next bit of work to another CPU in the one case versus the other.
>>>>>>> The above is incorrect. The CPUs involved between the "Good"
>>>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>>>>
>>>>>>> All of the tests show higher usage of shallower idle states with
>>>>>>> the "Good" versus the "Bad", which was the expectation of the
>>>>>>> original patch, as has been mentioned a few times in the emails.
>>>>>>>
>>>>>>> My input is to revert the reversion.
>>>>>> OK, noted, thanks!
>>>>>>
>>>>>> Christian, what do you think?
>>>>> I've attached readable diffs of the values provided; the tl;dr is:
>>>>>
>>>>> +--------------------+-----------+-----------+
>>>>> | Workload           | Δ above % | Δ below % |
>>>>> +--------------------+-----------+-----------+
>>>>> | fio                |    -10.11 |     +2.36 |
>>>>> | rds-stress-test    |     -0.44 |     +2.57 |
>>>>> | jbb                |    -20.35 |     +3.30 |
>>>>> | phoronix-sqlite    |     -9.66 |     -0.61 |
>>>>> +--------------------+-----------+-----------+
>>>>>
>>>>> I think the overall trend, however, is clear: the commit
>>>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>>>>> improved menu on many systems and workloads, I'd dare to say most.
>>>>>
>>>>> Even on the reported regression introduced by it, the cpuidle governor
>>>>> performed better on paper; system metrics regressed because other
>>>>> CPUs' P-states weren't available due to being in a shallower state.
>>>>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>>> (+CC Sergey)
>>>>> It could be argued that this is a limitation of a per-CPU cpuidle
>>>>> governor and a more holistic approach would be needed for that platform
>>>>> (i.e. power/thermal-budget-sharing CPUs want to use higher P-states,
>>>>> skew towards deeper cpuidle states).
>>>>>
>>>>> I also think that the change made sense: for small residency values
>>>>> with a bit of random noise mixed in, performing the same statistical
>>>>> test doesn't seem sensible, as the short intervals will look noisier.
>>>>>
>>>>> So the options are:
>>>>> 1. Revert the revert on mainline+stable.
>>>>> 2. Revert the revert on mainline only.
>>>>> 3. Keep the revert, miss out on the improvement for many.
>>>>> 4. Revert only when we have a good solution for platforms like
>>>>> Sergey's.
>>>>>
>>>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>>>> hack like playing with the deep idle state residency values would
>>>>> be enough to mitigate.
>>>> Wouldn't it be better to choose option 1, as reverting the revert has
>>>> even more pronounced improvements on older kernels? I've tested this on
>>>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>>>> Since the revert only helps specific newer platforms like Jasper Lake,
>>>> isn't reverting the revert more relevant on stable
>>>> kernels? It's more likely that older hardware runs older kernels than
>>>> newer hardware, although not always, imo.
>>>>
>>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>> Oh I see, but shouldn't avoiding regressions on established platforms be
>> a priority over further optimizing for specific newer platforms like
>> Jasper Lake?
>>
> Well, avoiding regressions on established platforms is what led to
> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
> being applied and backported.
> The expectation for stable is that we avoid regressions and potentially
> miss out on improvements. If you want the latest, greatest performance you
> should probably run a latest, greatest kernel.
> The original
> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
> was seen as a fix and overall improvement; that's why it was backported,
> but Sergey's regression report contradicted that.
> What is "established" and "newer" for a stable kernel is quite handwavy
> IMO, but even here Sergey's regression report is a clear data point...
> Your report is only about restoring 5.15 (and others) performance to 5.15
> upstream-ish levels, which is within the expectations of running a stable
> kernel. No doubt it's frustrating either way!
Ah, but we see a regression on 5.15, 5.4 and 6.12 stable-based kernels
with the revert, and reapplying it recovers performance across many
benchmarks. Hence the previous suggestion.

^ permalink raw reply [flat|nested] 44+ messages in thread
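For anyone wanting to reproduce the kind of before/after comparison requested earlier in this message: the counters live under the standard Linux cpuidle sysfs layout, and two snapshots can be diffed mechanically. A sketch (the helper names are illustrative, not a script from this thread):

```python
from pathlib import Path

BASE = Path("/sys/devices/system/cpu")
FIELDS = ("usage", "above", "below")

def snapshot(base=BASE):
    """One {(cpu, state): {field: count}} snapshot (Linux only)."""
    snap = {}
    for st in base.glob("cpu[0-9]*/cpuidle/state[0-9]*"):
        key = (st.parent.parent.name, st.name)    # e.g. ("cpu0", "state2")
        snap[key] = {f: int((st / f).read_text()) for f in FIELDS}
    return snap

def diff(before, after):
    """Per-(cpu, state) counter deltas over the workload run."""
    return {k: {f: after[k][f] - before[k][f] for f in FIELDS}
            for k in before if k in after}
```

Take `snapshot()` before the workload, run it, `snapshot()` again, and feed both to `diff()`; summing the per-state deltas across CPUs gives the per-state usage/above/below tables discussed in this thread.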
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-03 10:22 ` Harshvardhan Jha @ 2026-02-03 10:30 ` Christian Loehle 0 siblings, 0 replies; 44+ messages in thread From: Christian Loehle @ 2026-02-03 10:30 UTC (permalink / raw) To: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies Cc: Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky On 2/3/26 10:22, Harshvardhan Jha wrote: > > On 03/02/26 3:01 PM, Christian Loehle wrote: >> On 2/3/26 09:16, Harshvardhan Jha wrote: >>> On 03/02/26 2:37 PM, Christian Loehle wrote: >>>> On 2/2/26 17:31, Harshvardhan Jha wrote: >>>>> On 02/02/26 12:50 AM, Christian Loehle wrote: >>>>>> On 1/30/26 19:28, Rafael J. Wysocki wrote: >>>>>>> On Thu, Jan 29, 2026 at 11:27 PM Doug Smythies <dsmythies@telus.net> wrote: >>>>>>>> On 2026.01.28 15:53 Doug Smythies wrote: >>>>>>>>> On 2026.01.27 21:07 Doug Smythies wrote: >>>>>>>>>> On 2026.01.27 07:45 Harshvardhan Jha wrote: >>>>>>>>>>> On 08/12/25 6:17 PM, Christian Loehle wrote: >>>>>>>>>>>> On 12/8/25 11:33, Harshvardhan Jha wrote: >>>>>>>>>>>>> On 04/12/25 4:00 AM, Doug Smythies wrote: >>>>>>>>>>>>>> On 2025.12.03 08:45 Christian Loehle wrote: >>>>>>>>>>>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote: >>>>>>>>> ... snip ... >>>>>>>>> >>>>>>>>>>>> It would be nice to get the idle states here, ideally how the states' usage changed >>>>>>>>>>>> from base to revert. 
>>>>>>>>>>>> The mentioned thread did this and should show how it can be done, but a dump of
>>>>>>>>>>>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>>>>>>>>>>>> before and after the workload is usually fine to work with:
>>>>>>>>>>>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
>>>>>>>>>>> Apologies for the late reply, I'm attaching a tarball which has the CPU
>>>>>>>>>>> states for the test suites before and after the tests. The folders with the
>>>>>>>>>>> name of the test contain two folders, good-kernel and bad-kernel,
>>>>>>>>>>> each containing two files having the before and after states. Please note
>>>>>>>>>>> that different machines were used for different test suites due to
>>>>>>>>>>> compatibility reasons. The jbb test was run using containers.
>>>>>>>>> Please provide the results of the test runs that were done for
>>>>>>>>> the supplied before and after idle data.
>>>>>>>>> In particular, what is the "fio" test and its results? Its idle data is not very revealing.
>>>>>>>>> Is it a test I can run on my test computer?
>>>>>>>> I see that I have fio installed on my test computer.
>>>>>>>>
>>>>>>>>>> It is a considerable amount of work to manually extract and summarize the data.
>>>>>>>>>> I have only done it for the phoronix-sqlite data.
>>>>>>>>> I have done the rest now, see below.
>>>>>>>>> I have also attached the results, in case the formatting gets screwed up.
>>>>>>>>>
>>>>>>>>>> There seem to be 40 CPUs, 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>>>> I remember seeing a linux-pm email about why but couldn't find it just now.
>>>>>>>>>> Summary (also attached as a PNG file, in case the formatting gets messed up):
>>>>>>>>>> The total idle entries (usage) and time seem low to me, which is why the ???.
>>>>>>>>>>
>>>>>>>>>> phoronix-sqlite
>>>>>>>>>> Good Kernel: Time between samples 4 seconds (estimated and ???)
>>>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>>>> state 0      220       0     218    0.00%   99.09%
>>>>>>>>>> state 1    70212    5213   34602    7.42%   49.28%
>>>>>>>>>> state 2    30273    5237    1806   17.30%    5.97%
>>>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>>>> state 4    11824    2120       0   17.93%    0.00%
>>>>>>>>>>
>>>>>>>>>> total     112529   12570   36626   43.72%  <<< Misses %
>>>>>>>>>>
>>>>>>>>>> Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
>>>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>>>> state 0      262       0     260    0.00%   99.24%
>>>>>>>>>> state 1    62751    3985   35588    6.35%   56.71%
>>>>>>>>>> state 2    24941    7896    1433   31.66%    5.75%
>>>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>>>> state 4    24489   11543       0   47.14%    0.00%
>>>>>>>>>>
>>>>>>>>>> total     112443   23424   37281   53.99%  <<< Misses %
>>>>>>>>>>
>>>>>>>>>> Observe the 2X use of idle state 4 for the "Bad Kernel".
>>>>>>>>>>
>>>>>>>>>> I have a template now, and can summarize the other 40 CPU data
>>>>>>>>>> faster, but I would have to rework the template for the 56 CPU data,
>>>>>>>>>> and is it a 64 CPU data set at 4 idle states per CPU?
>>>>>>>>> jbb: 40 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>>>
>>>>>>>>> Good Kernel: Time between samples > 2 hours (estimated)
>>>>>>>>>               Usage      Above      Below    Above    Below
>>>>>>>>> state 0      297550          0     296084    0.00%   99.51%
>>>>>>>>> state 1     8062854     341043    4962635    4.23%   61.55%
>>>>>>>>> state 2    56708358   12688379    6252051   22.37%   11.02%
>>>>>>>>> state 3           0          0          0    0.00%    0.00%
>>>>>>>>> state 4    54624476   15868752          0   29.05%    0.00%
>>>>>>>>>
>>>>>>>>> total     119693238   28898174   11510770   33.76%  <<< Misses
>>>>>>>>>
>>>>>>>>> Bad Kernel: Time between samples > 2 hours (estimated)
>>>>>>>>>               Usage      Above      Below    Above    Below
>>>>>>>>> state 0       90715          0      75134    0.00%   82.82%
>>>>>>>>> state 1     8878738     312970    6082180    3.52%   68.50%
>>>>>>>>> state 2    12048728    2576251     603316   21.38%    5.01%
>>>>>>>>> state 3           0          0          0    0.00%    0.00%
>>>>>>>>> state 4    85999424   44723273          0   52.00%    0.00%
>>>>>>>>>
>>>>>>>>> total     107017605   47612494    6760630   50.81%  <<< Misses
>>>>>>>>>
>>>>>>>>> As with the previous test, observe 1.6X use of idle state 4 for the "Bad Kernel"
>>>>>>>>>
>>>>>>>>> fio: 64 CPUs; 4 idle states; POLL, C1, C1E, C6.
>>>>>>>>>
>>>>>>>>> fio
>>>>>>>>> Good Kernel: Time between samples ~1 minute (estimated)
>>>>>>>>>              Usage    Above    Below    Above    Below
>>>>>>>>> state 0       3822        0     3818    0.00%   99.90%
>>>>>>>>> state 1     148640     4406    68956    2.96%   46.39%
>>>>>>>>> state 2     593455    45344   105675    7.64%   17.81%
>>>>>>>>> state 3    3209648   807014        0   25.14%    0.00%
>>>>>>>>>
>>>>>>>>> total      3955565   856764   178449   26.17%  <<< Misses
>>>>>>>>>
>>>>>>>>> Bad Kernel: Time between samples ~1 minute (estimated)
>>>>>>>>>              Usage    Above    Below    Above    Below
>>>>>>>>> state 0        916        0      756    0.00%   82.53%
>>>>>>>>> state 1      80230     2028    42791    2.53%   53.34%
>>>>>>>>> state 2      59231     6888     6791   11.63%   11.47%
>>>>>>>>> state 3    2455784   564797        0   23.00%    0.00%
>>>>>>>>>
>>>>>>>>> total      2596161   573713    50338   24.04%  <<< Misses
>>>>>>>>>
>>>>>>>>> It is not clear why the number of idle entries differs so much
>>>>>>>>> between the tests, but there is a bit of a different distribution
>>>>>>>>> of the workload among the CPUs.
>>>>>>>>>
>>>>>>>>> rds-stress: 56 CPUs; 5 idle states, with idle state 3 defaulting to disabled.
>>>>>>>>> POLL, C1, C1E, C3 (disabled), C6
>>>>>>>>>
>>>>>>>>> rds-stress-test
>>>>>>>>> Good Kernel: Time between samples ~70 seconds (estimated)
>>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>>> state 0     1561       0    1435    0.00%   91.93%
>>>>>>>>> state 1    13855     899    2410    6.49%   17.39%
>>>>>>>>> state 2   467998  139254   23679   29.76%    5.06%
>>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>>> state 4   213132  107417       0   50.40%    0.00%
>>>>>>>>>
>>>>>>>>> total     696546  247570   27524   39.49%  <<< Misses
>>>>>>>>>
>>>>>>>>> Bad Kernel: Time between samples ~70 seconds (estimated)
>>>>>>>>>            Usage   Above   Below    Above    Below
>>>>>>>>> state 0      231       0     231    0.00%  100.00%
>>>>>>>>> state 1     5413     266    1186    4.91%   21.91%
>>>>>>>>> state 2    54365     719    3789    1.32%    6.97%
>>>>>>>>> state 3        0       0       0    0.00%    0.00%
>>>>>>>>> state 4   267055  148327       0   55.54%    0.00%
>>>>>>>>>
>>>>>>>>> total     327064  149312    5206   47.24%  <<< Misses
>>>>>>>>>
>>>>>>>>> Again, differing numbers of idle entries between tests.
>>>>>>>>> This time the load distribution between CPUs is more
>>>>>>>>> obvious. In the "Bad" case most work is done on 2 or 3 CPUs.
>>>>>>>>> In the "Good" case the work is distributed over more CPUs.
>>>>>>>>> I assume, without proof, that the scheduler is deciding not to migrate
>>>>>>>>> the next bit of work to another CPU in the one case versus the other.
>>>>>>>> The above is incorrect. The CPUs involved between the "Good"
>>>>>>>> and "Bad" tests are very similar, mainly 2 CPUs with a little of
>>>>>>>> a 3rd and 4th. See the attached graph for more detail / clarity.
>>>>>>>>
>>>>>>>> All of the tests show higher usage of shallower idle states with
>>>>>>>> the "Good" versus the "Bad", which was the expectation of the
>>>>>>>> original patch, as has been mentioned a few times in the emails.
>>>>>>>>
>>>>>>>> My input is to revert the reversion.
>>>>>>> OK, noted, thanks!
>>>>>>>
>>>>>>> Christian, what do you think?
>>>>>> I've attached readable diffs of the values provided; the tl;dr is:
>>>>>>
>>>>>> +--------------------+-----------+-----------+
>>>>>> | Workload           | Δ above % | Δ below % |
>>>>>> +--------------------+-----------+-----------+
>>>>>> | fio                |    -10.11 |     +2.36 |
>>>>>> | rds-stress-test    |     -0.44 |     +2.57 |
>>>>>> | jbb                |    -20.35 |     +3.30 |
>>>>>> | phoronix-sqlite    |     -9.66 |     -0.61 |
>>>>>> +--------------------+-----------+-----------+
>>>>>>
>>>>>> I think the overall trend, however, is clear: the commit
>>>>>> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information")
>>>>>> improved menu on many systems and workloads, I'd dare to say most.
>>>>>>
>>>>>> Even on the reported regression introduced by it, the cpuidle governor
>>>>>> performed better on paper; system metrics regressed because other
>>>>>> CPUs' P-states weren't available due to being in a shallower state.
>>>>>> https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>>>> (+CC Sergey)
>>>>>> It could be argued that this is a limitation of a per-CPU cpuidle
>>>>>> governor and a more holistic approach would be needed for that platform
>>>>>> (i.e. power/thermal-budget-sharing CPUs want to use higher P-states,
>>>>>> skew towards deeper cpuidle states).
>>>>>>
>>>>>> I also think that the change made sense: for small residency values
>>>>>> with a bit of random noise mixed in, performing the same statistical
>>>>>> test doesn't seem sensible, as the short intervals will look noisier.
>>>>>>
>>>>>> So the options are:
>>>>>> 1. Revert the revert on mainline+stable.
>>>>>> 2. Revert the revert on mainline only.
>>>>>> 3. Keep the revert, miss out on the improvement for many.
>>>>>> 4. Revert only when we have a good solution for platforms like
>>>>>> Sergey's.
>>>>>>
>>>>>> I'd lean towards 2 because 4 won't be easy, unless of course a minor
>>>>>> hack like playing with the deep idle state residency values would
>>>>>> be enough to mitigate.
>>>>> Wouldn't it be better to choose option 1, as reverting the revert has
>>>>> even more pronounced improvements on older kernels? I've tested this on
>>>>> 6.12, 5.15 and 5.4 stable-based kernels and found massive improvements.
>>>>> Since the revert only helps specific newer platforms like Jasper Lake,
>>>>> isn't reverting the revert more relevant on stable
>>>>> kernels? It's more likely that older hardware runs older kernels than
>>>>> newer hardware, although not always, imo.
>>>>>
>>>> FWIW Jasper Lake seems to be supported from 5.6 on, see
>>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family")
>>> Oh I see, but shouldn't avoiding regressions on established platforms be
>>> a priority over further optimizing for specific newer platforms like
>>> Jasper Lake?
>>>
>> Well, avoiding regressions on established platforms is what led to
>> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information"
>> being applied and backported.
>> The expectation for stable is that we avoid regressions and potentially
>> miss out on improvements. If you want the latest, greatest performance you
>> should probably run a latest, greatest kernel.
>> The original
>> 85975daeaa4d cpuidle: menu: Avoid discarding useful information
>> was seen as a fix and overall improvement; that's why it was backported,
>> but Sergey's regression report contradicted that.
>> What is "established" and "newer" for a stable kernel is quite handwavy
>> IMO, but even here Sergey's regression report is a clear data point...
>> Your report is only about restoring 5.15 (and others) performance to 5.15
>> upstream-ish levels, which is within the expectations of running a stable
>> kernel. No doubt it's frustrating either way!
> Ah, but we see a regression on 5.15, 5.4 and 6.12 stable-based kernels
> with the revert, and reapplying it recovers performance across many
> benchmarks. Hence the previous suggestion.

Right, but in the case of 5.15 (similar for the others) we have:

v5.15.196
5666bcc3c00f Revert "cpuidle: menu: Avoid discarding useful information"
v5.15.185
14790abc8779 cpuidle: menu: Avoid discarding useful information

So the revert in v5.15.196 only takes performance back to pre-v5.15.185
levels; you now don't get an improvement that was never part of the
original v5.15.

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-03 9:31 ` Christian Loehle 2026-02-03 10:22 ` Harshvardhan Jha @ 2026-02-03 16:45 ` Rafael J. Wysocki 2026-02-05 0:45 ` Doug Smythies 1 sibling, 1 reply; 44+ messages in thread From: Rafael J. Wysocki @ 2026-02-03 16:45 UTC (permalink / raw) To: Christian Loehle Cc: Harshvardhan Jha, Rafael J. Wysocki, Doug Smythies, Sasha Levin, Greg Kroah-Hartman, linux-pm, stable, Daniel Lezcano, Sergey Senozhatsky On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle <christian.loehle@arm.com> wrote: > > On 2/3/26 09:16, Harshvardhan Jha wrote: > > > > On 03/02/26 2:37 PM, Christian Loehle wrote: > >> On 2/2/26 17:31, Harshvardhan Jha wrote: [cut] > >> FWIW Jasper Lake seems to be supported from 5.6 on, see > >> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family") > > > > Oh I see, but shouldn't avoiding regressions on established platforms be > > a priority over further optimizing for specific newer platforms like > > Jasper Lake? > > > > Well avoiding regressions on established platforms is what lead to > 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information" > being applied and backported. > The expectation for stable is that we avoid regressions and potentially > miss out on improvements. If you want the latest greatest performance you > should probably run a latest greatest kernel. > The original > 85975daeaa4d cpuidle: menu: Avoid discarding useful information > was seen as a fix and overall improvement, Note, however, that commit 85975daeaa4d carries no Fixes: tag and no Cc: stable. It was picked up into stable kernels for another reason. > that's why it was backported, but Sergey's regression report contradicted that. Exactly. > What is "established" and "newer" for a stable kernel is quite handwavy > IMO but even here Sergey's regression report is a clear data point... Which wasn't known at the time commit 85975daeaa4d went in. 
> Your report is only about restoring 5.15 (and others) performance to 5.15
> upstream-ish levels, which is within the expectations of running a stable
> kernel. No doubt it's frustrating either way!

That is a consequence of the time it takes for mainline changes to
propagate to distributions (Chrome OS in this particular case) at
which point they get tested on a wider range of systems. Until that
happens, it is not really guaranteed that the given change will stay
in.

In this particular case, restoring commit 85975daeaa4d would cause the
same problems on the systems adversely affected by it to become
visible again, and I don't think it would be fair to say "Too bad" to
the users of those systems. IMV, it cannot be restored without a way
to at least limit the adverse effect on performance.

I have an idea to test, but getting something workable out of it may
be a challenge, even if it turns out to be a good one.

^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-03 16:45 ` Rafael J. Wysocki @ 2026-02-05 0:45 ` Doug Smythies 2026-02-05 2:37 ` Sergey Senozhatsky 2026-02-05 5:02 ` Doug Smythies 0 siblings, 2 replies; 44+ messages in thread From: Doug Smythies @ 2026-02-05 0:45 UTC (permalink / raw) To: 'Rafael J. Wysocki', 'Christian Loehle', 'Harshvardhan Jha', 'Sergey Senozhatsky' Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano', Doug Smythies On 2026.02.03 08:46 Rafael J. Wysocki wrote: -----Original Message----- From: Rafael J. Wysocki <rafael@kernel.org> Sent: February 3, 2026 8:46 AM To: Christian Loehle <christian.loehle@arm.com> Cc: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>; Rafael J. Wysocki <rafael@kernel.org>; Doug Smythies <dsmythies@telus.net>; Sasha Levin <sashal@kernel.org>; Greg Kroah-Hartman <gregkh@linuxfoundation.org>; linux-pm@vger.kernel.org; stable@vger.kernel.org; Daniel Lezcano <daniel.lezcano@linaro.org>; Sergey Senozhatsky <senozhatsky@chromium.org> Subject: Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS > On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle wrote: >> On 2/3/26 09:16, Harshvardhan Jha wrote: >>> On 03/02/26 2:37 PM, Christian Loehle wrote: >>>> On 2/2/26 17:31, Harshvardhan Jha wrote: > > [cut] > >>>> FWIW Jasper Lake seems to be supported from 5.6 on, see >>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family") >>> >>> Oh I see, but shouldn't avoiding regressions on established platforms be >>> a priority over further optimizing for specific newer platforms like >>> Jasper Lake? >>> >> Well avoiding regressions on established platforms is what lead to >> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information" >> being applied and backported. 
>> The expectation for stable is that we avoid regressions and potentially >> miss out on improvements. If you want the latest greatest performance you >> should probably run a latest greatest kernel. >> The original >> 85975daeaa4d cpuidle: menu: Avoid discarding useful information >> was seen as a fix and overall improvement, > > Note, however, that commit 85975daeaa4d carries no Fixes: tag and no > Cc: stable. It was picked up into stable kernels for another reason. > >> that's why it was backported, but Sergey's regression report contradicted that. > > Exactly. > >> What is "established" and "newer" for a stable kernel is quite handwavy >> IMO but even here Sergey's regression report is a clear data point... > > Which wasn't known at the time commit 85975daeaa4d went in. > >> Your report is only restoring 5.15 (and others) performance to 5.15 >> upstream-ish levels which is within the expectations of running a stable >> kernel. No doubt it's frustrating either way! > > That is a consequence of the time it takes for mainline changes to > propagate to distributions (Chrome OS in this particular case) at > which point they get tested on a wider range of systems. Until that > happens, it is not really guaranteed that the given change will stay > in. > > In this particular case, restoring commit 85975daeaa4d would cause the > same problems on the systems adversely affected by it to become > visible again and I don't think it would be fair to say "Too bad" to > the users of those systems. IMV, it cannot be restored without a way > to at least limit the adverse effect on performance. I have been going over the old emails and the turbostat data again and again and again. I still do not understand how to breakdown Sergey's results into its component contributions. I am certain there is power limit throttling during the test, but have no idea to much or how little it contributes to the differing results. 
I think more work is needed to fully understand Sergey's test results from October. I struggle with the dramatic test results difference of base=84.5 and revert=59.5 as being due only to the idle code changes. That is why I keep asking for a test to be done with the CPU clock frequency limited such that power limit throttling cannot occur. I don't know what limit to use, but suggest 2.2 GHz to start with. Capture turbostat data with the tests. And record the test results. @Sergey: are you willing to do the test? > > I have an idea to test, but getting something workable out of it may > be a challenge, even if it turns out to be a good one. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 0:45 ` Doug Smythies @ 2026-02-05 2:37 ` Sergey Senozhatsky 2026-02-05 5:18 ` Doug Smythies 2026-02-05 7:15 ` Christian Loehle 2026-02-05 5:02 ` Doug Smythies 1 sibling, 2 replies; 44+ messages in thread From: Sergey Senozhatsky @ 2026-02-05 2:37 UTC (permalink / raw) To: Doug Smythies Cc: 'Rafael J. Wysocki', 'Christian Loehle', 'Harshvardhan Jha', 'Sergey Senozhatsky', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano' On (26/02/04 16:45), Doug Smythies wrote: > >> What is "established" and "newer" for a stable kernel is quite handwavy > >> IMO but even here Sergey's regression report is a clear data point... > > > > Which wasn't known at the time commit 85975daeaa4d went in. > > > >> Your report is only restoring 5.15 (and others) performance to 5.15 > >> upstream-ish levels which is within the expectations of running a stable > >> kernel. No doubt it's frustrating either way! > > > > That is a consequence of the time it takes for mainline changes to > > propagate to distributions (Chrome OS in this particular case) at > > which point they get tested on a wider range of systems. Until that > > happens, it is not really guaranteed that the given change will stay > > in. > > > > In this particular case, restoring commit 85975daeaa4d would cause the > > same problems on the systems adversely affected by it to become > > visible again and I don't think it would be fair to say "Too bad" to > > the users of those systems. IMV, it cannot be restored without a way > > to at least limit the adverse effect on performance. > > I have been going over the old emails and the turbostat data again and again > and again. > > I still do not understand how to breakdown Sergey's results into its > component contributions. 
I am certain there is power limit throttling > during the test, but have no idea to much or how little it contributes to the > differing results. > > I think more work is needed to fully understand Sergey's test results from October. > I struggle with the dramatic test results difference of base=84.5 and revert=59.5 > as being due to only the idle code changes. > > That is why I keep asking for a test to be done with the CPU clock frequency limited > such that power limit throttling can not occur. I don't know what limit to use, but suggest > 2.2 GHZ to start with. Capture turbostat data with the tests. And record the test results. > @Sergey: are you willing to do the test? I can run tests, not immediately, though, but within some reasonable time frame. (I'll need some help with instructions/etc.) ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 2:37 ` Sergey Senozhatsky @ 2026-02-05 5:18 ` Doug Smythies 2026-02-10 9:17 ` Sergey Senozhatsky 2026-02-05 7:15 ` Christian Loehle 1 sibling, 1 reply; 44+ messages in thread From: Doug Smythies @ 2026-02-05 5:18 UTC (permalink / raw) To: 'Sergey Senozhatsky' Cc: 'Rafael J. Wysocki', 'Christian Loehle', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano', Doug Smythies On 2026.02.04 18:37 Sergey Senozhatsky wrote: > On (26/02/04 16:45), Doug Smythies wrote: >>>> What is "established" and "newer" for a stable kernel is quite handwavy >>>> IMO but even here Sergey's regression report is a clear data point... >>> >>> Which wasn't known at the time commit 85975daeaa4d went in. >>> >>>> Your report is only restoring 5.15 (and others) performance to 5.15 >>>> upstream-ish levels which is within the expectations of running a stable >>>> kernel. No doubt it's frustrating either way! >>> >>> That is a consequence of the time it takes for mainline changes to >>> propagate to distributions (Chrome OS in this particular case) at >>> which point they get tested on a wider range of systems. Until that >>> happens, it is not really guaranteed that the given change will stay >>> in. >>> >>> In this particular case, restoring commit 85975daeaa4d would cause the >>> same problems on the systems adversely affected by it to become >>> visible again and I don't think it would be fair to say "Too bad" to >>> the users of those systems. IMV, it cannot be restored without a way >>> to at least limit the adverse effect on performance. >> >> I have been going over the old emails and the turbostat data again and again >> and again. >> >> I still do not understand how to breakdown Sergey's results into its >> component contributions. 
I am certain there is power limit throttling >> during the test, but have no idea to much or how little it contributes to the >> differing results. >> >> I think more work is needed to fully understand Sergey's test results from October. >> I struggle with the dramatic test results difference of base=84.5 and revert=59.5 >> as being due to only the idle code changes. >> >> That is why I keep asking for a test to be done with the CPU clock frequency limited >> such that power limit throttling can not occur. I don't know what limit to use, but suggest >> 2.2 GHZ to start with. Capture turbostat data with the tests. And record the test results. > >> @Sergey: are you willing to do the test? > > I can run tests, not immediately, though, but within some reasonable > time frame. Thanks. > (I'll need some help with instructions/etc.) From your turbostat data from October you are using the intel_pstate CPU frequency scaling driver and the powersave CPU frequency scaling governor and HWP enabled. Also your maximum CPU frequency is 3,300 MHz. To limit the maximum CPU frequency to around 2,200 MHz do: echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct Then run the tests acquiring turbostat logs the same way you did in October. To restore the maximum CPU frequency afterwards do: echo 100 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct ^ permalink raw reply [flat|nested] 44+ messages in thread
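[A note on where the 67 above comes from: max_perf_pct is a percentage of the maximum performance, so the value follows from the two frequencies Doug cites. A quick sanity check of the arithmetic — the 3,300 MHz maximum is the one Doug read out of the October turbostat data:]

```python
# Sanity check of the suggested max_perf_pct value: the sysfs knob takes a
# percentage of maximum performance, so capping a 3,300 MHz part at roughly
# 2,200 MHz needs about 67.
max_mhz = 3300     # maximum CPU frequency from the October turbostat data
target_mhz = 2200  # cap suggested to rule out power limit throttling

pct = round(100 * target_mhz / max_mhz)
print(pct)  # 67

# ...and the frequency that percentage actually allows:
print(max_mhz * pct / 100)  # 2211.0 MHz, i.e. "around 2,200 MHz"
```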
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 5:18 ` Doug Smythies @ 2026-02-10 9:17 ` Sergey Senozhatsky 2026-02-11 4:27 ` Doug Smythies 0 siblings, 1 reply; 44+ messages in thread From: Sergey Senozhatsky @ 2026-02-10 9:17 UTC (permalink / raw) To: Doug Smythies Cc: 'Sergey Senozhatsky', 'Rafael J. Wysocki', 'Christian Loehle', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano' [-- Attachment #1: Type: text/plain, Size: 774 bytes --] On (26/02/04 21:18), Doug Smythies wrote: > > > (I'll need some help with instructions/etc.) > > From your turbostat data from October you are using the intel_pstate > CPU frequency scaling driver and the powersave CPU frequency scaling governor > and HWP enabled. Also your maximum CPU frequency is 3,300 MHz. > > To limit the maximum CPU frequency to around 2,200 MHz do: > > echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct So I only set max_perf_pct echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct and re-ran the test: revert: 72.5 base: 82.5 // as before "revert" means revert of "cpuidle: menu: Avoid discarding useful // information", while base means that the patch is applied. Please find turbostat logs attached. [-- Attachment #2: turbostat-base.gz --] [-- Type: application/gzip, Size: 40773 bytes --] [-- Attachment #3: turbostat-revert.gz --] [-- Type: application/gzip, Size: 39194 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 9:17 ` Sergey Senozhatsky @ 2026-02-11 4:27 ` Doug Smythies 0 siblings, 0 replies; 44+ messages in thread From: Doug Smythies @ 2026-02-11 4:27 UTC (permalink / raw) To: 'Sergey Senozhatsky' Cc: 'Rafael J. Wysocki', 'Christian Loehle', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano', Doug Smythies On 2026.02.10 01:17 Sergey Senozhatsky wrote: > On (26/02/04 21:18), Doug Smythies wrote: >> >>> (I'll need some help with instructions/etc.) >> >> From your turbostat data from October you are using the intel_pstate >> CPU frequency scaling driver and the powersave CPU frequency scaling governor >> and HWP enabled. Also your maximum CPU frequency is 3,300 MHz. >> >> To limit the maximum CPU frequency to around 2,200 MHz do: >> >> echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct > > So I only set max_perf_pct > > echo 67 |sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct > > and re-ran the test: > > revert: 72.5 > base: 82.5 > > // as before "revert" means revert of "cpuidle: menu: Avoid discarding useful > // information", while base means that the patch is applied. > > Please find turbostat logs attached. Thank you for doing the test. So now the turbostat data shows that the processor never gets anywhere near any throttling threshold, eliminating it from consideration. ... Doug ^ permalink raw reply [flat|nested] 44+ messages in thread
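[The throttling check Doug describes can be sketched as a small scan over the turbostat summary columns. The column names (Bzy_MHz, PkgTmp, PkgWatt) are standard turbostat output, but the sample rows and the limit values below are invented for illustration — they are not taken from Sergey's logs:]

```python
# Scan a turbostat-style log for samples at or above a power or thermal
# limit. The sample data and the 15 W / 90 C thresholds are made-up
# illustration values only.
SAMPLE_LOG = """\
Bzy_MHz PkgTmp PkgWatt
2194 61 9.8
2201 63 10.2
2198 64 10.0
"""

def near_limits(log_text, watt_limit=15.0, temp_limit=90):
    lines = log_text.strip().splitlines()
    cols = lines[0].split()
    w_idx, t_idx = cols.index("PkgWatt"), cols.index("PkgTmp")
    hits = []
    for line in lines[1:]:
        fields = line.split()
        if float(fields[w_idx]) >= watt_limit or int(fields[t_idx]) >= temp_limit:
            hits.append(line)
    return hits

print(near_limits(SAMPLE_LOG))  # [] -- no sample near a limit
```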
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 2:37 ` Sergey Senozhatsky 2026-02-05 5:18 ` Doug Smythies @ 2026-02-05 7:15 ` Christian Loehle 2026-02-10 8:02 ` Sergey Senozhatsky 1 sibling, 1 reply; 44+ messages in thread From: Christian Loehle @ 2026-02-05 7:15 UTC (permalink / raw) To: Sergey Senozhatsky, Doug Smythies Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano' On 2/5/26 02:37, Sergey Senozhatsky wrote: > On (26/02/04 16:45), Doug Smythies wrote: >>>> What is "established" and "newer" for a stable kernel is quite handwavy >>>> IMO but even here Sergey's regression report is a clear data point... >>> >>> Which wasn't known at the time commit 85975daeaa4d went in. >>> >>>> Your report is only restoring 5.15 (and others) performance to 5.15 >>>> upstream-ish levels which is within the expectations of running a stable >>>> kernel. No doubt it's frustrating either way! >>> >>> That is a consequence of the time it takes for mainline changes to >>> propagate to distributions (Chrome OS in this particular case) at >>> which point they get tested on a wider range of systems. Until that >>> happens, it is not really guaranteed that the given change will stay >>> in. >>> >>> In this particular case, restoring commit 85975daeaa4d would cause the >>> same problems on the systems adversely affected by it to become >>> visible again and I don't think it would be fair to say "Too bad" to >>> the users of those systems. IMV, it cannot be restored without a way >>> to at least limit the adverse effect on performance. >> >> I have been going over the old emails and the turbostat data again and again >> and again. >> >> I still do not understand how to breakdown Sergey's results into its >> component contributions. 
I am certain there is power limit throttling >> during the test, but have no idea to much or how little it contributes to the >> differing results. >> >> I think more work is needed to fully understand Sergey's test results from October. >> I struggle with the dramatic test results difference of base=84.5 and revert=59.5 >> as being due to only the idle code changes. >> >> That is why I keep asking for a test to be done with the CPU clock frequency limited >> such that power limit throttling can not occur. I don't know what limit to use, but suggest >> 2.2 GHZ to start with. Capture turbostat data with the tests. And record the test results. > > >> @Sergey: are you willing to do the test? > > I can run tests, not immediately, though, but within some reasonable > time frame. > > (I'll need some help with instructions/etc.) > @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean a 29.6% decrease in system performance in a traditional throughput sense. The "benchmark" might be measuring dropped frames, user input latency or what have you. Nonetheless @Sergey do feel free to expand. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 7:15 ` Christian Loehle @ 2026-02-10 8:02 ` Sergey Senozhatsky 2026-02-10 8:57 ` Christian Loehle 0 siblings, 1 reply; 44+ messages in thread From: Sergey Senozhatsky @ 2026-02-10 8:02 UTC (permalink / raw) To: Christian Loehle Cc: Sergey Senozhatsky, Doug Smythies, 'Rafael J. Wysocki', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano' On (26/02/05 07:15), Christian Loehle wrote: [..] > @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean > 29.6% decrease in system performance in a traditional throughput sense. > The "benchmark" might me measuring dropped frames, user input latency or what have you. > Nonetheless @Sergey do feel free to expand. I'm not on the performance team and I don't define those metrics, so I can't really comment. But frame drops during Google Docs scrolling or typing, for instance, are a user-visible regression that people tend to notice. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 8:02 ` Sergey Senozhatsky @ 2026-02-10 8:57 ` Christian Loehle 2026-02-11 4:27 ` Doug Smythies 0 siblings, 1 reply; 44+ messages in thread From: Christian Loehle @ 2026-02-10 8:57 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Doug Smythies, 'Rafael J. Wysocki', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano' On 2/10/26 08:02, Sergey Senozhatsky wrote: > On (26/02/05 07:15), Christian Loehle wrote: > [..] >> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean >> 29.6% decrease in system performance in a traditional throughput sense. >> The "benchmark" might me measuring dropped frames, user input latency or what have you. >> Nonetheless @Sergey do feel free to expand. > > I'm not on the performance team and I don't define those metrics, so > I can't really comment. But frame drops during Google Docs scrolling, > for instance, or typing is a user visible regression, that people tend > to notice. Yeah I guess that was my point already, i.e. it isn't implausible that e.g. a frequency reduction from 2.2GHz to 2.0GHz (-10%) might result in double the number of dropped frames (= score reduction of 50%). This is all just an example, but don't be thrown off by the 29.6% reduction in score and expect to go looking for -29.6% cpu frequency (like you would expect for many purely cpu-bound benchmarks). ^ permalink raw reply [flat|nested] 44+ messages in thread
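[The nonlinearity Christian describes is easy to reproduce with a toy deadline model — every number below is invented purely for illustration. Frames whose cost sits just under the budget at one frequency miss it at a slightly lower one, so a small clock reduction can roughly double the drop count:]

```python
# Toy model: a frame is dropped when its work misses the ~16.7 ms budget of
# a 60 Hz display. Costs are in millions of cycles, so at f GHz a frame
# takes cost / f milliseconds. All numbers are invented, not measured.

def dropped(freq_ghz, costs_mcycles, deadline_ms=16.7):
    return sum(1 for c in costs_mcycles if c / freq_ghz > deadline_ms)

# a spread of frame costs sitting around the 2.2 GHz budget (~36.7 Mcycles)
costs = [30 + 0.5 * i for i in range(20)]   # 30.0 .. 39.5 Mcycles

print(dropped(2.2, costs))  # 6 dropped frames at 2.2 GHz
print(dropped(2.0, costs))  # 13 dropped frames at 2.0 GHz (~9% lower clock)
```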
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 8:57 ` Christian Loehle @ 2026-02-11 4:27 ` Doug Smythies 0 siblings, 0 replies; 44+ messages in thread From: Doug Smythies @ 2026-02-11 4:27 UTC (permalink / raw) To: 'Christian Loehle', 'Sergey Senozhatsky' Cc: 'Rafael J. Wysocki', 'Harshvardhan Jha', 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano', Doug Smythies On 2026.02.10 00:57 Christian Loehle wrote: > On 2/10/26 08:02, Sergey Senozhatsky wrote: >> On (26/02/05 07:15), Christian Loehle wrote: >> [..] >>> @Doug given this is on Chromebooks base=84.5 and revert=59.5 doesn't necessarily mean >>> 29.6% decrease in system performance in a traditional throughput sense. >>> The "benchmark" might me measuring dropped frames, user input latency or what have you. >>> Nonetheless @Sergey do feel free to expand. >> >> I'm not on the performance team and I don't define those metrics, so >> I can't really comment. But frame drops during Google Docs scrolling, >> for instance, or typing is a user visible regression, that people tend >> to notice. > > Yeah I guess that was my point already, i.e. it isn't implausible that > e.g. a frequency reduction from 2.2GHz to 2.0GHz (-10%) might result in > double the number of dropped frames (= score reduction of 50%). > Everything just an example but don't be thrown off by the 29.6% reduction in > score and expect to go looking for -29.6% cpu frequency (like you would expect > for many purely cpubound benchmarks). Thanks for the inputs. Agreed. In my defense, I was just attempting to extract whatever I could from the very limited data we had. ... Doug ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 0:45 ` Doug Smythies 2026-02-05 2:37 ` Sergey Senozhatsky @ 2026-02-05 5:02 ` Doug Smythies 2026-02-10 9:33 ` Xueqin Luo 1 sibling, 1 reply; 44+ messages in thread From: Doug Smythies @ 2026-02-05 5:02 UTC (permalink / raw) To: 'Rafael J. Wysocki', 'Christian Loehle', 'Harshvardhan Jha', 'Sergey Senozhatsky' Cc: 'Sasha Levin', 'Greg Kroah-Hartman', linux-pm, stable, 'Daniel Lezcano', Doug Smythies [-- Attachment #1: Type: text/plain, Size: 3851 bytes --] On 2026.02.04 16:45 Doug Smythies wrote: > On 2026.02.03 08:46 Rafael J. Wysocki wrote: >> On Tue, Feb 3, 2026 at 10:31 AM Christian Loehle wrote: >>> On 2/3/26 09:16, Harshvardhan Jha wrote: >>>> On 03/02/26 2:37 PM, Christian Loehle wrote: >>>>> On 2/2/26 17:31, Harshvardhan Jha wrote: >> >> [cut] >> >>>>> FWIW Jasper Lake seems to be supported from 5.6 on, see >>>>> b2d32af0bff4 ("x86/cpu: Add Jasper Lake to Intel family") >>>> >>>> Oh I see, but shouldn't avoiding regressions on established platforms be >>>> a priority over further optimizing for specific newer platforms like >>>> Jasper Lake? >>>> >>> Well avoiding regressions on established platforms is what lead to >>> 10fad4012234 Revert "cpuidle: menu: Avoid discarding useful information" >>> being applied and backported. >>> The expectation for stable is that we avoid regressions and potentially >>> miss out on improvements. If you want the latest greatest performance you >>> should probably run a latest greatest kernel. >>> The original >>> 85975daeaa4d cpuidle: menu: Avoid discarding useful information >>> was seen as a fix and overall improvement, >> >> Note, however, that commit 85975daeaa4d carries no Fixes: tag and no >> Cc: stable. It was picked up into stable kernels for another reason. >> >>> that's why it was backported, but Sergey's regression report contradicted that. >> >> Exactly. 
>> >>> What is "established" and "newer" for a stable kernel is quite handwavy >>> IMO but even here Sergey's regression report is a clear data point... >> >> Which wasn't known at the time commit 85975daeaa4d went in. >> >>> Your report is only restoring 5.15 (and others) performance to 5.15 >>> upstream-ish levels which is within the expectations of running a stable >>> kernel. No doubt it's frustrating either way! >> >> That is a consequence of the time it takes for mainline changes to >> propagate to distributions (Chrome OS in this particular case) at >> which point they get tested on a wider range of systems. Until that >> happens, it is not really guaranteed that the given change will stay >> in. >> >> In this particular case, restoring commit 85975daeaa4d would cause the >> same problems on the systems adversely affected by it to become >> visible again and I don't think it would be fair to say "Too bad" to >> the users of those systems. IMV, it cannot be restored without a way >> to at least limit the adverse effect on performance. > > I have been going over the old emails and the turbostat data again and again > and again. > > I still do not understand how to breakdown Sergey's results into its > component contributions. I am certain there is power limit throttling > during the test, but have no idea to much or how little it contributes to the > differing results. > > I think more work is needed to fully understand Sergey's test results from October. > I struggle with the dramatic test results difference of base=84.5 and revert=59.5 > as being due to only the idle code changes. > > That is why I keep asking for a test to be done with the CPU clock frequency limited > such that power limit throttling can not occur. I don't know what limit to use, but suggest > 2.2 GHZ to start with. Capture turbostat data with the tests. And record the test results. > @Sergey: are you willing to do the test? 
Further to my earlier email, there is an interesting area in Sergey's turbostat data from October. It is not near any throttling threshold, yet the CPU frequencies are considerably higher for the "revert" case versus "base", with everything seemingly similar. I do not understand this area. An annotated graph is attached. >> I have an idea to test, but getting something workable out of it may >> be a challenge, even if it turns out to be a good one. [-- Attachment #2: interesting-area.png --] [-- Type: image/png, Size: 237347 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-05 5:02 ` Doug Smythies @ 2026-02-10 9:33 ` Xueqin Luo 2026-02-10 10:04 ` Sergey Senozhatsky 2026-02-10 14:20 ` Rafael J. Wysocki 0 siblings, 2 replies; 44+ messages in thread From: Xueqin Luo @ 2026-02-10 9:33 UTC (permalink / raw) To: dsmythies Cc: christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable, Xueqin Luo

Hi Doug, Rafael, and all,

I would like to share an additional data point from a different platform that also shows a power regression associated with commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information").

The test platform is a ZHAOXIN KaiXian KX-7000 processor. The test scenario is system idle power measurement.

Below are the cpuidle statistics for CPU1. Other CPU cores show the same trend.

With commit 85975daeaa4d applied (base):

State   Ratio    Avg(ms)  Count    Over-pred        Under-pred
-----------------------------------------------------------------
POLL    0.0%     0.10     68       0.0% (0)         100.0% (68)
C1      0.05%    0.82     187      0.0% (0)         61.5% (115)
C2      0.0%     0.50     23       30.4% (7)        69.6% (16)
C3      0.01%    0.59     35       37.1% (13)       62.9% (22)
C4      0.24%    0.65     1092     46.9% (512)      50.0% (546)
C5      81.88%   1.45     169745   52.7% (89450)    0.0% (0)

After reverting the commit (revert):

State   Ratio    Avg(ms)  Count    Over-pred        Under-pred
-----------------------------------------------------------------
POLL    0.0%     0.09     24       0.0% (0)         100.0% (24)
C1      0.03%    0.44     222      0.0% (0)         57.2% (127)
C2      0.01%    0.44     50       20.0% (10)       74.0% (37)
C3      0.01%    0.49     43       25.6% (11)       60.5% (26)
C4      0.15%    0.52     875      45.1% (395)      41.4% (362)
C5      97.1%    5.30     55099    13.9% (7645)     0.0% (0)

With commit 85975daeaa4d present:

* C5 entry count is very high
* C5 average residency is short (~1.45 ms)
* Over-prediction ratio for C5 is around 50%

After reverting the commit:

* C5 residency ratio exceeds 90%
* C5 average residency increases to ~5 ms
* Entry count drops significantly
* Over-prediction ratio is greatly reduced

This indicates that the commit makes idle residency more fragmented, leading to more frequent C-state transitions.

In addition to the cpuidle statistics, measured system idle power is about 2W higher when this commit is applied.

Thanks,
Xueqin Luo

-- 
2.43.0

^ permalink raw reply [flat|nested] 44+ messages in thread
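[Xueqin's C5 rows can be condensed into the two ratios that carry the argument — the values are copied verbatim from the tables above, and the parsing simply assumes the column order shown in the mail:]

```python
# Compare the C5 rows from the two tables above (values copied verbatim).
def parse_row(line):
    fields = line.split()
    return {"ratio_pct": float(fields[1].rstrip("%")),
            "avg_ms": float(fields[2]),
            "count": int(fields[3])}

base_c5   = parse_row("C5 81.88% 1.45 169745 52.7% (89450) 0.0% (0)")
revert_c5 = parse_row("C5 97.1% 5.30 55099 13.9% (7645) 0.0% (0)")

# With the commit applied, C5 is entered about 3x as often, with about
# 3.7x shorter average residency -- the fragmentation described above.
entries = base_c5["count"] / revert_c5["count"]
residency = revert_c5["avg_ms"] / base_c5["avg_ms"]
print(round(entries, 2), round(residency, 2))  # 3.08 3.66
```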
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 9:33 ` Xueqin Luo @ 2026-02-10 10:04 ` Sergey Senozhatsky 2026-02-10 14:24 ` Rafael J. Wysocki 0 siblings, 1 reply; 44+ messages in thread From: Sergey Senozhatsky @ 2026-02-10 10:04 UTC (permalink / raw) To: Xueqin Luo Cc: dsmythies, christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable On (26/02/10 17:33), Xueqin Luo wrote: > > In addition to the cpuidle statistics, measured system idle power is > about 2W higher when this commit is applied. > We also noticed shortened battery life on some of the affected laptops. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 10:04 ` Sergey Senozhatsky @ 2026-02-10 14:24 ` Rafael J. Wysocki 2026-02-11 1:34 ` Sergey Senozhatsky 2026-02-11 8:58 ` Xueqin Luo 0 siblings, 2 replies; 44+ messages in thread From: Rafael J. Wysocki @ 2026-02-10 14:24 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Xueqin Luo, dsmythies, christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, rafael, sashal, stable On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > On (26/02/10 17:33), Xueqin Luo wrote: > > > > In addition to the cpuidle statistics, measured system idle power is > > about 2W higher when this commit is applied. > > > > We also noticed shorted battery life on some of the affected laptops. Was the difference significant? Overall, this clearly is a "help some - hurt some" situation and I am not at all convinced that restoring the commit in question is a good idea (with all due respect to everyone who thinks otherwise or got better results when it was there). Honestly, I'd rather stop tweaking the menu governor at this point and kindly ask people who want to sacrifice some energy for more performance to try the teo governor instead. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 14:24 ` Rafael J. Wysocki @ 2026-02-11 1:34 ` Sergey Senozhatsky 2026-02-11 4:17 ` Doug Smythies 2026-02-11 8:58 ` Xueqin Luo 1 sibling, 1 reply; 44+ messages in thread From: Sergey Senozhatsky @ 2026-02-11 1:34 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Sergey Senozhatsky, Xueqin Luo, dsmythies, christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, sashal, stable On (26/02/10 15:24), Rafael J. Wysocki wrote: > On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky > <senozhatsky@chromium.org> wrote: > > > > On (26/02/10 17:33), Xueqin Luo wrote: > > > > > > In addition to the cpuidle statistics, measured system idle power is > > > about 2W higher when this commit is applied. > > > > > > > We also noticed shorted battery life on some of the affected laptops. > > Was the difference significant? I think I saw up to "5.16% regression in perf.minutes_battery_life" ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-11 1:34 ` Sergey Senozhatsky @ 2026-02-11 4:17 ` Doug Smythies 0 siblings, 0 replies; 44+ messages in thread From: Doug Smythies @ 2026-02-11 4:17 UTC (permalink / raw) To: 'Sergey Senozhatsky', 'Rafael J. Wysocki' Cc: 'Xueqin Luo', christian.loehle, daniel.lezcano, gregkh, harshvardhan.j.jha, linux-pm, sashal, stable, Doug Smythies On 2026.02.10 17:34 Sergey Senozhatsky wrote: > On (26/02/10 15:24), Rafael J. Wysocki wrote: >> On Tue, Feb 10, 2026 at 11:04 AM Sergey Senozhatsky wrote: >>> On (26/02/10 17:33), Xueqin Luo wrote: >>>> >>>> In addition to the cpuidle statistics, measured system idle power is >>>> about 2W higher when this commit is applied. >>>> >>> >>> We also noticed shorted battery life on some of the affected laptops. >> >> Was the difference significant? > > I think I saw up to "5.16% regression in perf.minutes_battery_life" Note: I get a fair bit of noise on my idle tests lately. For what it's worth, on my test computer I got the following for kernel 6.19-rc8 with and without the reapply.

Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
CPU frequency scaling driver: intel_pstate.
CPU frequency scaling governor: powersave.
HWP: enabled.
idle governor: menu.

Git branch "doug":
3856f38e5bb9 (HEAD -> doug) Reapply "cpuidle: menu: Avoid discarding useful information"
18f7fcd5e69a (tag: v6.19-rc8) Linux 6.19-rc8

Processor Package Power:
average reapply (140 minutes): 1.783028571
average rc8 (512 minutes): 1.656212891
7.65% more power

^ permalink raw reply [flat|nested] 44+ messages in thread
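[As a cross-check of Doug's figure, using his two averages verbatim — the only assumption here is that both numbers are average package power in watts:]

```python
# Cross-check of the "7.65% more power" figure from the averages above.
reapply_w = 1.783028571  # average package power with the reapply (140 min run)
rc8_w     = 1.656212891  # average package power on plain 6.19-rc8 (512 min run)

extra_pct = (reapply_w / rc8_w - 1.0) * 100
print(f"{extra_pct:.2f}% more power")  # just under 7.7%, in line with the quoted 7.65%
```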
* Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS 2026-02-10 14:24 ` Rafael J. Wysocki 2026-02-11 1:34 ` Sergey Senozhatsky @ 2026-02-11 8:58 ` Xueqin Luo 1 sibling, 0 replies; 44+ messages in thread From: Xueqin Luo @ 2026-02-11 8:58 UTC (permalink / raw) To: rafael Cc: christian.loehle, daniel.lezcano, dsmythies, gregkh, harshvardhan.j.jha, linux-pm, luoxueqin, sashal, senozhatsky, stable On this platform (ZHAOXIN KaiXian KX-7000), we evaluated the impact of commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") under a screen-on idle scenario. During testing, the cpufreq driver was acpi-cpufreq and the scaling governor was set to ondemand. With this commit applied, measured system idle power increases by approximately 2W compared to the revert case. In addition, battery life testing on the same system shows a reduction of roughly 80 minutes when this commit is present. These results were consistently reproduced across multiple test runs under identical conditions. -- 2.43.0 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
  2026-02-10  9:33 ` Xueqin Luo
  2026-02-10 10:04 ` Sergey Senozhatsky
@ 2026-02-10 14:20 ` Rafael J. Wysocki
  1 sibling, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2026-02-10 14:20 UTC (permalink / raw)
  To: Xueqin Luo
  Cc: dsmythies, christian.loehle, daniel.lezcano, gregkh,
	harshvardhan.j.jha, linux-pm, rafael, sashal, senozhatsky, stable

On Tue, Feb 10, 2026 at 10:33 AM Xueqin Luo <luoxueqin@kylinos.cn> wrote:
>
> Hi Doug, Rafael, and all,
>
> I would like to share an additional data point from a different
> platform that also shows a power regression associated with commit
> 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information").
>
> The test platform is a ZHAOXIN KaiXian KX-7000 processor. The test
> scenario is system idle power measurement.
>
> Below are the cpuidle statistics for CPU1. Other CPU cores show the
> same trend.
>
> With commit 85975daeaa4d applied (base):
>
> State   Ratio     Avg(ms)   Count    Over-pred        Under-pred
> -----------------------------------------------------------------
> POLL    0.0%      0.10      68       0.0% (0)         100.0% (68)
> C1      0.05%     0.82      187      0.0% (0)         61.5% (115)
> C2      0.0%      0.50      23       30.4% (7)        69.6% (16)
> C3      0.01%     0.59      35       37.1% (13)       62.9% (22)
> C4      0.24%     0.65      1092     46.9% (512)      50.0% (546)
> C5      81.88%    1.45      169745   52.7% (89450)    0.0% (0)
>
> After reverting the commit (revert):
>
> State   Ratio     Avg(ms)   Count    Over-pred        Under-pred
> -----------------------------------------------------------------
> POLL    0.0%      0.09      24       0.0% (0)         100.0% (24)
> C1      0.03%     0.44      222      0.0% (0)         57.2% (127)
> C2      0.01%     0.44      50       20.0% (10)       74.0% (37)
> C3      0.01%     0.49      43       25.6% (11)       60.5% (26)
> C4      0.15%     0.52      875      45.1% (395)      41.4% (362)
> C5      97.1%     5.30      55099    13.9% (7645)     0.0% (0)
>
> With commit 85975daeaa4d present:
>
> * C5 entry count is very high
> * C5 average residency is short (~1.45 ms)
> * Over-prediction ratio for C5 is around 50%
>
> After reverting the commit:
>
> * C5 residency ratio exceeds 90%
> * C5 average residency increases to ~5 ms
> * Entry count drops significantly
> * Over-prediction ratio is greatly reduced
>
> This indicates that the commit makes idle residency more fragmented,
> leading to more frequent C-state transitions.

Thanks for the data!

> In addition to the cpuidle statistics, measured system idle power is
> about 2W higher when this commit is applied.

Well, 2W of a difference is not good.

I'm wondering what the idle power with the commit in question reverted is.

^ permalink raw reply	[flat|nested] 44+ messages in thread
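The over-prediction percentages in the tables above are simply per-state counter ratios. On recent kernels the equivalent raw counters are exposed per CPU under /sys/devices/system/cpu/cpuN/cpuidle/stateM/ as `usage`, `above` (wakeups where the chosen state was deeper than the observed idle duration warranted) and `below`. A minimal sketch reproducing the C5 figures from the numbers quoted in the mail (not live sysfs reads):

```python
# Over-prediction ratio = above / usage for a given idle state: the
# fraction of entries where the governor predicted a longer sleep than
# actually occurred. The counters below are the C5 values from the
# tables above.
c5 = {
    "base (85975daeaa4d applied)": {"usage": 169745, "above": 89450},
    "revert":                      {"usage": 55099,  "above": 7645},
}

for kernel, s in c5.items():
    over_pred = 100.0 * s["above"] / s["usage"]
    print(f"{kernel}: C5 over-prediction {over_pred:.1f}%")
```

This reproduces the 52.7% (base) and 13.9% (revert) figures in the tables, which is the gap Rafael is reacting to.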
end of thread, other threads:[~2026-02-11  8:59 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
2025-12-03 16:18 Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS Harshvardhan Jha
2025-12-03 16:44 ` Christian Loehle
2025-12-03 22:30 ` Doug Smythies
2025-12-08 11:33 ` Harshvardhan Jha
2025-12-08 12:47 ` Christian Loehle
2026-01-13  7:06 ` Harshvardhan Jha
2026-01-13 14:13 ` Rafael J. Wysocki
2026-01-13 14:18 ` Rafael J. Wysocki
2026-01-14  4:28 ` Sergey Senozhatsky
2026-01-14  4:49 ` Sergey Senozhatsky
2026-01-14  5:15 ` Tomasz Figa
2026-01-14 20:07 ` Rafael J. Wysocki
2026-01-29 10:23 ` Harshvardhan Jha
2026-01-29 22:47 ` Doug Smythies
2026-01-27 15:45 ` Harshvardhan Jha
2026-01-28  5:06 ` Doug Smythies
2026-01-28 23:53 ` Doug Smythies
2026-01-29 22:27 ` Doug Smythies
2026-01-30 19:28 ` Rafael J. Wysocki
2026-02-01 19:20 ` Christian Loehle
2026-02-02 17:31 ` Harshvardhan Jha
2026-02-03  9:07 ` Christian Loehle
2026-02-03  9:16 ` Harshvardhan Jha
2026-02-03  9:31 ` Christian Loehle
2026-02-03 10:22 ` Harshvardhan Jha
2026-02-03 10:30 ` Christian Loehle
2026-02-03 16:45 ` Rafael J. Wysocki
2026-02-05  0:45 ` Doug Smythies
2026-02-05  2:37 ` Sergey Senozhatsky
2026-02-05  5:18 ` Doug Smythies
2026-02-10  9:17 ` Sergey Senozhatsky
2026-02-11  4:27 ` Doug Smythies
2026-02-05  7:15 ` Christian Loehle
2026-02-10  8:02 ` Sergey Senozhatsky
2026-02-10  8:57 ` Christian Loehle
2026-02-11  4:27 ` Doug Smythies
2026-02-05  5:02 ` Doug Smythies
2026-02-10  9:33 ` Xueqin Luo
2026-02-10 10:04 ` Sergey Senozhatsky
2026-02-10 14:24 ` Rafael J. Wysocki
2026-02-11  1:34 ` Sergey Senozhatsky
2026-02-11  4:17 ` Doug Smythies
2026-02-11  8:58 ` Xueqin Luo
2026-02-10 14:20 ` Rafael J. Wysocki