From: "Doug Smythies" <dsmythies@telus.net>
To: "'Harshvardhan Jha'" <harshvardhan.j.jha@oracle.com>,
"'Christian Loehle'" <christian.loehle@arm.com>
Cc: "'Sasha Levin'" <sashal@kernel.org>,
"'Greg Kroah-Hartman'" <gregkh@linuxfoundation.org>,
<linux-pm@vger.kernel.org>, <stable@vger.kernel.org>,
"'Rafael J. Wysocki'" <rafael@kernel.org>,
"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
"Doug Smythies" <dsmythies@telus.net>
Subject: RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
Date: Tue, 27 Jan 2026 21:06:40 -0800
Message-ID: <003e01dc9013$e3bc5060$ab34f120$@telus.net>
In-Reply-To: <849ee0ff-e15b-4b69-84de-6503e3b3168d@oracle.com>
On 2026.01.27 07:45 Harshvardhan Jha wrote:
>On 08/12/25 6:17 PM, Christian Loehle wrote:
>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>
>>>>>> While running performance benchmarks for the 5.15.196 LTS tag, it was
>>>>>> observed that several regressions across different benchmarks are being
>>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>>> information" for 5.15
>>>>>>
>>>>>> Regressions on 5.15.196 include:
>>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>>
>>>>>> The culprit commits' messages mention that these reverts were done due
>>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>>>> to know the maintainers' opinion on how we should proceed in order to
>>>>>> fix this. If we reapply it'll bring back the previous regressions on
>>>>>> Jasper Lake systems and if we don't revert it then it's stuck with
>>>>>> current regressions. If this problem has been reported before and a fix
>>>>>> is in the works then please let me know I shall follow developments to
>>>>>> that mail thread.
>>>>> The discussion regarding this can be found here:
>>>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>>> we explored an alternative to the full revert here:
>>>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>>> full revert you're seeing now.
>>>>>
>>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>>> in particular seems to favor deep idle states even when they don't seem
>>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>>> we're kind of stuck with that.
>>>>>
>>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>>
>>>>> That's the status, unless I missed anything?
>>>> By reading everything in the links Christian provided, you can see
>>>> that we had difficulties repeating test results on other platforms.
>>>>
>>>> Of the tests listed herein, the only one that was easy to repeat on my
>>>> test server was the "Phoronix pts/sqlite" one. I got (summary: no difference):
>>>>
>>>> Kernel 6.18                                                 Reverted
>>>> pts/sqlite-2.3.0        menu rc4          menu rc1          menu rc1          menu rc3
>>>>                         performance       performance       performance       performance
>>>> test  what              ave               ave               ave               ave
>>>> 1     T/C 1             2.147  -0.2%      2.143   0.0%      2.16   -0.8%      2.156  -0.6%
>>>> 2     T/C 2             3.468   0.1%      3.473   0.0%      3.486  -0.4%      3.478  -0.1%
>>>> 3     T/C 4             4.336   0.3%      4.35    0.0%      4.355  -0.1%      4.354  -0.1%
>>>> 4     T/C 8             5.438  -0.1%      5.434   0.0%      5.456  -0.4%      5.45   -0.3%
>>>> 5     T/C 12            6.314  -0.2%      6.299   0.0%      6.307  -0.1%      6.29    0.1%
>>>>
>>>> Where:
>>>> T/C means: Threads / Copies
>>>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
>>>> Data points are in Seconds.
>>>> Ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>>> The reversion was included in kernel 6.18-rc3.
>>>> Kernel 6.18-rc4 had another code change to menu.c
>>>>
>>>> In case the formatting gets messed up, the table is also attached.
>>>>
>>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>>> HWP: Enabled.
>>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>>> after reapplying the revert on X6-2 systems.
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 46 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 56
>>> On-line CPU(s) list: 0-55
>>> Vendor ID: GenuineIntel
>>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>> CPU family: 6
>>> Model: 79
>>> Thread(s) per core: 2
>>> Core(s) per socket: 14
>>> Socket(s): 2
>>> Stepping: 1
>>> CPU(s) scaling MHz: 98%
>>> CPU max MHz: 2600.0000
>>> CPU min MHz: 1200.0000
>>> BogoMIPS: 5188.26
... snip ...
>> It would be nice to get the idle states here, ideally how the states' usage changed
>> from base to revert.
>> The mentioned thread did this and should show how it can be done, but a dump of
>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>> before and after the workload is usually fine to work with:
>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
> Apologies for the late reply, I'm attaching a tar ball which has the cpu
> states for the test suites before and after tests. The folders with the
> name of the test contain two folders good-kernel and bad-kernel
> containing two files having the before and after states. Please note
> that different machines were used for different test suites due to
> compatibility reasons. The jbb test was run using containers.
It is a considerable amount of work to manually extract and summarize the data.
I have only done it for the phoronix-sqlite data.
There seem to be 40 CPUs and 5 idle states, with idle state 3 disabled by default.
I remember seeing a linux-pm email explaining why, but couldn't find it just now.

Summary (also attached as a PNG file, in case the formatting gets messed up):
The total idle entries (usage) and time seem low to me, hence the "???" marks below.
phoronix-sqlite

Good Kernel: Time between samples 4 seconds (estimated and ???)
          Usage    Above    Below   Above %   Below %
state 0     220        0      218     0.00%    99.09%
state 1   70212     5213    34602     7.42%    49.28%
state 2   30273     5237     1806    17.30%     5.97%
state 3       0        0        0     0.00%     0.00%
state 4   11824     2120        0    17.93%     0.00%
total    112529    12570    36626    43.72%  <<< Misses %

Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
          Usage    Above    Below   Above %   Below %
state 0     262        0      260     0.00%    99.24%
state 1   62751     3985    35588     6.35%    56.71%
state 2   24941     7896     1433    31.66%     5.75%
state 3       0        0        0     0.00%     0.00%
state 4   24489    11543        0    47.14%     0.00%
total    112443    23424    37281    53.99%  <<< Misses %

Observe the roughly 2X usage of idle state 4 for the "Bad Kernel" (24489 versus 11824 entries).
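
The percentages in the summary follow directly from the usage, above, and below counter deltas. A minimal Python sketch of that arithmetic, fed with the good-kernel numbers from the table; the function name `summarize` and the tuple layout are illustrative assumptions, not the template actually used:

```python
def summarize(states):
    """states: {name: (usage, above, below)} counter deltas summed over all CPUs.

    Returns ({name: (above_pct, below_pct)}, overall_miss_pct).
    """
    rows = {}
    for name, (usage, above, below) in states.items():
        # Per-state miss rates; a state that was never entered reports 0%.
        above_pct = 100.0 * above / usage if usage else 0.0
        below_pct = 100.0 * below / usage if usage else 0.0
        rows[name] = (above_pct, below_pct)
    total_usage = sum(u for u, _, _ in states.values())
    total_miss = sum(a + b for _, a, b in states.values())
    # Overall misses: every "above" or "below" event counts as a wrong pick.
    miss_pct = 100.0 * total_miss / total_usage if total_usage else 0.0
    return rows, miss_pct


# Good-kernel deltas copied from the phoronix-sqlite summary.
good = {
    "state 0": (220, 0, 218),
    "state 1": (70212, 5213, 34602),
    "state 2": (30273, 5237, 1806),
    "state 3": (0, 0, 0),
    "state 4": (11824, 2120, 0),
}

rows, miss = summarize(good)
print(f"misses: {miss:.2f}%")  # prints "misses: 43.72%"
```

Feeding in the bad-kernel rows the same way reproduces the 53.99% figure.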
I have a template now and can summarize the other 40-CPU data sets
faster, but I would have to rework the template for the 56-CPU data.
Also, is one of them a 64-CPU data set with 4 idle states per CPU?
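
Summing the counters per idle state across however many CPUs and states a machine exposes would avoid reworking a template for each topology. A minimal Python sketch, assuming the standard cpuidle sysfs layout (`cpuN/cpuidle/stateM/` with `usage`, `above`, `below`, and `time` files); `snapshot` and `delta` are hypothetical helper names, and this reads a sysfs-style directory tree rather than the files in the tarball, whose exact format is not shown here:

```python
import glob
import os
import re
from collections import defaultdict

FIELDS = ("usage", "above", "below", "time")


def snapshot(root="/sys/devices/system/cpu"):
    """Return {state_index: {field: total over all CPUs}} for the cpuidle counters."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    # cpu[0-9]* skips non-CPU entries such as the top-level cpuidle/ directory.
    pattern = os.path.join(root, "cpu[0-9]*", "cpuidle", "state[0-9]*")
    for state_dir in glob.glob(pattern):
        index = int(re.search(r"state(\d+)$", state_dir).group(1))
        for field in FIELDS:
            try:
                with open(os.path.join(state_dir, field)) as f:
                    totals[index][field] += int(f.read().strip())
            except (OSError, ValueError):
                # Assumption: a missing or unreadable counter is counted as 0.
                pass
    return dict(totals)


def delta(before, after):
    """Per-state counter differences between two snapshots."""
    return {
        idx: {f: after[idx][f] - before.get(idx, {}).get(f, 0) for f in FIELDS}
        for idx in after
    }
```

Taking one `snapshot()` before the workload and one after, then passing both to `delta()`, gives per-state totals that do not depend on the CPU count or the number of idle states.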
... Doug
[-- Attachment #2: sqlite-summary.png --]
[-- Type: image/png, Size: 37171 bytes --]