From: "Doug Smythies" <dsmythies@telus.net>
To: "'Harshvardhan Jha'" <harshvardhan.j.jha@oracle.com>,
"'Christian Loehle'" <christian.loehle@arm.com>
Cc: "'Sasha Levin'" <sashal@kernel.org>,
"'Greg Kroah-Hartman'" <gregkh@linuxfoundation.org>,
<linux-pm@vger.kernel.org>, <stable@vger.kernel.org>,
"'Rafael J. Wysocki'" <rafael@kernel.org>,
"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
"Doug Smythies" <dsmythies@telus.net>
Subject: RE: Performance regressions introduced via Revert "cpuidle: menu: Avoid discarding useful information" on 5.15 LTS
Date: Tue, 27 Jan 2026 21:06:40 -0800
Message-ID: <003e01dc9013$e3bc5060$ab34f120$@telus.net>
In-Reply-To: <849ee0ff-e15b-4b69-84de-6503e3b3168d@oracle.com>
On 2026.01.27 07:45 Harshvardhan Jha wrote:
>On 08/12/25 6:17 PM, Christian Loehle wrote:
>> On 12/8/25 11:33, Harshvardhan Jha wrote:
>>> On 04/12/25 4:00 AM, Doug Smythies wrote:
>>>> On 2025.12.03 08:45 Christian Loehle wrote:
>>>>> On 12/3/25 16:18, Harshvardhan Jha wrote:
>>>>>>
>>>>>> While running performance benchmarks for the 5.15.196 LTS tag, it was
>>>>>> observed that several regressions across different benchmarks were
>>>>>> introduced when compared to the previous 5.15.193 kernel tag. Running an
>>>>>> automated bisect on both of them narrowed down the culprit commit to:
>>>>>> - 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
>>>>>> information" for 5.15
>>>>>>
>>>>>> Regressions on 5.15.196 include:
>>>>>> -9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
>>>>>> -6.3% : Phoronix system/sqlite on OnPrem X6-2
>>>>>> -18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
>>>>>> thread & 1M buffer size on OnPrem X6-2
>>>>>> -4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
>>>>>> 1 thread & 1M buffer size on OnPrem X6-2
>>>>>> Up to -30% : Some Netpipe metrics on OnPrem X5-2
>>>>>>
>>>>>> The culprit commits' messages mention that these reverts were done due
>>>>>> to performance regressions introduced in Intel Jasper Lake systems but
>>>>>> this revert is causing issues in other systems unfortunately. I wanted
>>>>>> to know the maintainers' opinion on how we should proceed in order to
>>>>>> fix this. If we reapply it, it'll bring back the previous regressions on
>>>>>> Jasper Lake systems, and if we don't, we're stuck with the
>>>>>> current regressions. If this problem has been reported before and a fix
>>>>>> is in the works, then please let me know and I shall follow developments in
>>>>>> that mail thread.
>>>>> The discussion regarding this can be found here:
>>>>> https://lore.kernel.org/lkml/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/
>>>>> we explored an alternative to the full revert here:
>>>>> https://lore.kernel.org/lkml/4687373.LvFx2qVVIh@rafael.j.wysocki/
>>>>> unfortunately that didn't lead anywhere useful, so Rafael went with the
>>>>> full revert you're seeing now.
>>>>>
>>>>> Ultimately it seems to me that this "aggressiveness" on deep idle tradeoffs
>>>>> will highly depend on your platform, but also your workload, Jasper Lake
>>>>> in particular seems to favor deep idle states even when they don't seem
>>>>> to be a 'good' choice from a purely cpuidle (governor) perspective, so
>>>>> we're kind of stuck with that.
>>>>>
>>>>> For teo we've discussed a tunable knob in the past, which comes naturally with
>>>>> the logic, for menu there's nothing obvious that would be comparable.
>>>>> But for teo such a knob didn't generate any further interest (so far).
>>>>>
>>>>> That's the status, unless I missed anything?
>>>> By reading everything in the links Christian provided, you can see
>>>> that we had difficulties repeating test results on other platforms.
>>>>
>>>> Of the tests listed herein, the only one that was easy to repeat on my
>>>> test server was the "Phoronix pts/sqlite" one. I got (summary: no difference):
>>>>
>>>> Kernel 6.18                                         Reverted
>>>> pts/sqlite-2.3.0    menu rc4        menu rc1        menu rc1        menu rc3
>>>>                     performance     performance     performance     performance
>>>> test  what          ave             ave             ave             ave
>>>> 1     T/C 1         2.147  -0.2%    2.143   0.0%    2.16   -0.8%    2.156  -0.6%
>>>> 2     T/C 2         3.468   0.1%    3.473   0.0%    3.486  -0.4%    3.478  -0.1%
>>>> 3     T/C 4         4.336   0.3%    4.35    0.0%    4.355  -0.1%    4.354  -0.1%
>>>> 4     T/C 8         5.438  -0.1%    5.434   0.0%    5.456  -0.4%    5.45   -0.3%
>>>> 5     T/C 12        6.314  -0.2%    6.299   0.0%    6.307  -0.1%    6.29    0.1%
>>>>
>>>> Where:
>>>> T/C means: Threads / Copies
>>>> performance means: intel_pstate CPU frequency scaling driver and the performance CPU frequency scaling governor.
>>>> Data points are in seconds.
>>>> ave means the average test result. The number of runs per test was increased from the default of 3 to 10.
>>>> The reversion was manually applied to kernel 6.18-rc1 for that test.
>>>> The reversion was included in kernel 6.18-rc3.
>>>> Kernel 6.18-rc4 had another code change to menu.c
>>>>
>>>> In case the formatting gets messed up, the table is also attached.
>>>>
>>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
>>>> HWP: Enabled.
>>> I was able to recover performance on 5.15 and 5.4 LTS based kernels
>>> after reapplying the revert on X6-2 systems.
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 46 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 56
>>> On-line CPU(s) list: 0-55
>>> Vendor ID: GenuineIntel
>>> Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>> CPU family: 6
>>> Model: 79
>>> Thread(s) per core: 2
>>> Core(s) per socket: 14
>>> Socket(s): 2
>>> Stepping: 1
>>> CPU(s) scaling MHz: 98%
>>> CPU max MHz: 2600.0000
>>> CPU min MHz: 1200.0000
>>> BogoMIPS: 5188.26
... snip ...
>> It would be nice to get the idle states here, ideally how the states' usage changed
>> from base to revert.
>> The mentioned thread did this and should show how it can be done, but a dump of
>> cat /sys/devices/system/cpu/cpu*/cpuidle/state*/*
>> before and after the workload is usually fine to work with:
>> https://lore.kernel.org/linux-pm/8da42386-282e-4f97-af93-4715ae206361@arm.com/
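
For anyone who would rather not diff raw `cat` dumps by hand, the before/after capture described above could be scripted roughly as follows. This is only a minimal sketch: the helper names `snapshot` and `delta` are made up for illustration, it assumes the per-state `usage`, `above`, `below` and `time` files are present (kernels that lack some of them are simply skipped for those fields), and it sums over all CPUs rather than keeping per-CPU detail.

```python
import glob
import os
import re
from collections import defaultdict

# Counter files in each cpuidle state directory that matter for this comparison.
FIELDS = ("usage", "above", "below", "time")


def snapshot(root="/sys/devices/system/cpu"):
    """Sum the cpuidle counters over all CPUs, keyed by idle state index."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    pattern = os.path.join(root, "cpu[0-9]*", "cpuidle", "state[0-9]*")
    for state_dir in glob.glob(pattern):
        index = int(re.search(r"state(\d+)$", state_dir).group(1))
        for field in FIELDS:
            try:
                with open(os.path.join(state_dir, field)) as f:
                    totals[index][field] += int(f.read().strip())
            except (OSError, ValueError):
                # File missing or unreadable on this kernel: skip it.
                continue
    return dict(totals)


def delta(before, after):
    """Per-state counter difference between two snapshots."""
    return {
        state: {
            field: after[state][field] - before.get(state, {}).get(field, 0)
            for field in FIELDS
        }
        for state in after
    }
```

Typical use would be `before = snapshot()`, run the workload, then `delta(before, snapshot())`; pointing `root` at an unpacked copy of a saved tree should work the same way.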
> Apologies for the late reply, I'm attaching a tar ball which has the cpu
> states for the test suites before and after tests. The folders with the
> name of the test contain two folders good-kernel and bad-kernel
> containing two files having the before and after states. Please note
> that different machines were used for different test suites due to
> compatibility reasons. The jbb test was run using containers.
It is a considerable amount of work to manually extract and summarize the data.
I have only done it for the phoronix-sqlite data.
There seem to be 40 CPUs and 5 idle states, with idle state 3 defaulting to disabled.
I remember seeing a linux-pm email about why, but couldn't find it just now.
The total idle entries (usage) and time seem low to me, which is why the "???" marks below.
Summary (also attached as a PNG file, in case the formatting gets messed up):
phoronix-sqlite

Good Kernel: Time between samples 4 seconds (estimated and ???)
          Usage    Above    Below   Above %   Below %
state 0     220        0      218     0.00%    99.09%
state 1   70212     5213    34602     7.42%    49.28%
state 2   30273     5237     1806    17.30%     5.97%
state 3       0        0        0     0.00%     0.00%
state 4   11824     2120        0    17.93%     0.00%
total    112529    12570    36626    43.72%  <<< Misses %

Bad Kernel: Time between samples 3.8 seconds (estimated and ???)
          Usage    Above    Below   Above %   Below %
state 0     262        0      260     0.00%    99.24%
state 1   62751     3985    35588     6.35%    56.71%
state 2   24941     7896     1433    31.66%     5.75%
state 3       0        0        0     0.00%     0.00%
state 4   24489    11543        0    47.14%     0.00%
total    112443    23424    37281    53.99%  <<< Misses %

Observe the 2X use of idle state 4 for the "Bad Kernel".
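
For reference, the percentages in the two tables are just above/usage and below/usage per state, with the "Misses %" total being (above + below) / usage over all states. A minimal sketch of that arithmetic, using the numbers from the tables (the function name `summarize` and the dictionary layout are only illustrative, not part of any existing tool):

```python
def summarize(rows):
    """rows maps state -> (usage, above, below); returns miss rates in percent."""
    out = {}
    for state, (usage, above, below) in rows.items():
        # A state that was never entered has no meaningful miss rate; report 0.
        above_pct = 100.0 * above / usage if usage else 0.0
        below_pct = 100.0 * below / usage if usage else 0.0
        out[state] = (round(above_pct, 2), round(below_pct, 2))
    total_usage = sum(usage for usage, _, _ in rows.values())
    total_miss = sum(above + below for _, above, below in rows.values())
    out["total"] = round(100.0 * total_miss / total_usage, 2)
    return out


# Counter deltas summed over all CPUs, copied from the summary tables.
good = {0: (220, 0, 218), 1: (70212, 5213, 34602), 2: (30273, 5237, 1806),
        3: (0, 0, 0), 4: (11824, 2120, 0)}
bad = {0: (262, 0, 260), 1: (62751, 3985, 35588), 2: (24941, 7896, 1433),
       3: (0, 0, 0), 4: (24489, 11543, 0)}

for name, rows in (("good", good), ("bad", bad)):
    print(name, summarize(rows))

# Ratio of state 4 entries, bad kernel versus good kernel.
print(round(bad[4][0] / good[4][0], 2))
```

The state 4 ratio works out to a little over 2, which is where the "2X" comes from.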
I have a template now, and can summarize the other 40 CPU data
faster, but I would have to rework the template for the 56 CPU data.
Also, is there a 64 CPU data set with 4 idle states per CPU?
... Doug
[-- Attachment #2: sqlite-summary.png --]
[-- Type: image/png, Size: 37171 bytes --]