From: Christian Loehle <christian.loehle@arm.com>
To: Aboorva Devarajan <aboorvad@linux.ibm.com>,
rafael@kernel.org, daniel.lezcano@linaro.org
Cc: gautam@linux.ibm.com, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/1] cpuidle: menu: Add residency threshold for non-polling state selection
Date: Tue, 9 Sep 2025 14:32:33 +0100
Message-ID: <0b59d09e-a508-4bca-a110-ab2b12c2284a@arm.com>
In-Reply-To: <20250908075443.208570-1-aboorvad@linux.ibm.com>

On 9/8/25 08:54, Aboorva Devarajan wrote:
> On virtualized PowerPC (pseries) systems, where only one polling state
> (Snooze) and one deep state (CEDE) are available, selecting CEDE when
> the predicted idle duration is shorter than the target residency of the
> CEDE state can hurt performance. In such cases, the entry/exit overhead of
> CEDE outweighs the power savings, leading to unnecessary state transitions
> and higher latency.
>
> The menu governor currently contains a special-case rule that prioritizes
> the first non-polling state over polling, even when its target residency
> is much longer than the predicted idle duration. On PowerPC/pseries,
> where the gap between the polling state (Snooze) and the first non-polling
> state (CEDE) is large, this behavior causes performance regressions.
>
> This patch refines the special case by adding an extra requirement:
> the first non-polling state may only be chosen if its
> target_residency_ns is below the defined RESIDENCY_THRESHOLD_NS. If this
> condition is not met, the non-polling state is not selected, and the
> polling state is retained instead.
>
> This change is limited to the single special-case condition for the first
> non-polling state. The general state selection logic in the menu governor
> remains unchanged.
>
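
For reference, the refined special case reads to me as a three-way
conjunction. A minimal standalone sketch, not the kernel code: the helper
name is made up for illustration, and the 15 us value for
RESIDENCY_THRESHOLD_NS is an assumption about the governor's current
definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value (15 us); check the governor headers for the real one. */
#define RESIDENCY_THRESHOLD_NS (15ULL * 1000ULL)

/*
 * Hypothetical model of the branch taken when a state's target residency
 * exceeds the predicted idle duration. Returns true if the first
 * non-polling state should replace the current polling candidate.
 */
static bool special_case_selects(bool candidate_is_polling,
				 uint64_t target_residency_ns,
				 uint64_t next_timer_ns)
{
	return candidate_is_polling &&
	       target_residency_ns <= next_timer_ns &&
	       target_residency_ns < RESIDENCY_THRESHOLD_NS;
}
```

With a hypothetical deep state whose target residency is well above the
threshold, the helper returns false and the polling candidate is kept,
which is the behaviour the commit message describes.
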
> Performance improvement observed with pgbench on PowerPC (pseries)
> system:
> +---------------------------+------------+------------+------------+
> | Metric                    | Baseline   | Patched    | Change (%) |
> +---------------------------+------------+------------+------------+
> | Transactions/sec (TPS)    | 495,210    | 536,982    | +8.45%     |
> | Avg latency (ms)          | 0.163      | 0.150      | -7.98%     |
> +---------------------------+------------+------------+------------+
>
> CPUIdle state usage:
> +--------------+--------------+-------------+
> | Metric       | Baseline     | Patched     |
> +--------------+--------------+-------------+
> | Total usage  | 12,735,820   | 13,918,442  |
> | Above usage  | 11,401,520   | 1,598,210   |
> | Below usage  | 20,145       | 702,395     |
> +--------------+--------------+-------------+
>
> Above/Total and Below/Total usage percentages, which indicate
> mispredictions:
> +------------------------+-----------+---------+
> | Metric                 | Baseline  | Patched |
> +------------------------+-----------+---------+
> | Above % (Above/Total)  | 89.56%    | 11.49%  |
> | Below % (Below/Total)  | 0.16%     | 5.05%   |
> | Total cpuidle miss (%) | 89.72%    | 16.54%  |
> +------------------------+-----------+---------+
>
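
The miss percentages above follow directly from the usage counters. A
small sketch of the arithmetic, assuming the counters come from the
per-state cpuidle sysfs files (usage, above, below); the function name is
illustrative only.

```c
#include <stdint.h>

/* Share of idle-state entries that were mispredicted, in percent. */
static double miss_pct(uint64_t above, uint64_t below, uint64_t total)
{
	if (total == 0)
		return 0.0;	/* no idle entries recorded yet */
	return 100.0 * (double)(above + below) / (double)total;
}
```

Passing 0 for one of the two counters gives the Above-only or Below-only
share.
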
> The results show that restricting non-polling state selection to
> cases where its residency is within the threshold reduces mispredictions,
> lowers unnecessary state transitions, and improves overall throughput.
>
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>
> v2: https://lore.kernel.org/all/20250317060357.29451-1-aboorvad@linux.ibm.com/
>
> Changes in v2 -> v3:
> - Modified the patch following Rafael's feedback, incorporated a residency threshold check
>   (s->target_residency_ns < RESIDENCY_THRESHOLD_NS) as suggested.
> - Updated commit message accordingly.
> ---
> drivers/cpuidle/governors/menu.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index b2e3d0b0a116..d25b04539109 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -316,11 +316,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>
>  		if (s->target_residency_ns > predicted_ns) {
>  			/*
> -			 * Use a physical idle state, not busy polling, unless
> -			 * a timer is going to trigger soon enough.
> +			 * Use a physical idle state instead of busy polling
> +			 * if the next timer doesn't expire soon and its
> +			 * target residency is below the residency threshold.
>  			 */
>  			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -			    s->target_residency_ns <= data->next_timer_ns) {
> +			    s->target_residency_ns <= data->next_timer_ns &&
> +			    s->target_residency_ns < RESIDENCY_THRESHOLD_NS) {
>  				predicted_ns = s->target_residency_ns;
>  				idx = i;
>  				break;

To me that seems the least intrusive way to fix the issue for your platform.
Rafael, can you live with this?

FWIW
Reviewed-by: Christian Loehle <christian.loehle@arm.com>