From: Aboorva Devarajan <aboorvad@linux.ibm.com>
To: rafael@kernel.org, christian.loehle@arm.com, daniel.lezcano@linaro.org
Cc: aboorvad@linux.ibm.com, gautam@linux.ibm.com,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/1] cpuidle: menu: Add residency threshold for non-polling state selection
Date: Mon,  8 Sep 2025 13:24:43 +0530
Message-ID: <20250908075443.208570-1-aboorvad@linux.ibm.com>

On virtualized PowerPC (pseries) systems, where only one polling state
(Snooze) and one deep state (CEDE) are available, selecting CEDE when
the predicted idle duration is shorter than the target residency of the
CEDE state can hurt performance. In such cases, the entry/exit overhead
of CEDE outweighs the power savings, leading to unnecessary state
transitions and higher latency.

The menu governor currently contains a special-case rule that prioritizes
the first non-polling state over polling, even when its target residency
is much longer than the predicted idle duration. On PowerPC/pseries,
where the gap between the polling state (Snooze) and the first non-polling
state (CEDE) is large, this behavior causes performance regressions.

This patch refines the special case by adding an extra requirement:
the first non-polling state may only be chosen if its
target_residency_ns is below the defined RESIDENCY_THRESHOLD_NS. If this
condition is not met, the non-polling state is not selected, and the
polling state is retained instead.
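
The refined condition can be sketched as a standalone predicate. This is
an illustration only, not kernel code: prefer_non_polling() is a
hypothetical helper, and the 15 us value for RESIDENCY_THRESHOLD_NS is
assumed to match the definition in menu.c and should be checked against
the tree being patched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the constant in menu.c (15 us); verify in your tree. */
#define NSEC_PER_USEC 1000ULL
#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC)

/*
 * Hypothetical helper, not a kernel function. Returns true when the first
 * non-polling state should replace the currently selected polling state
 * even though its target residency exceeds the predicted idle duration.
 */
static bool prefer_non_polling(bool current_is_polling,
                               uint64_t target_residency_ns,
                               uint64_t next_timer_ns)
{
    return current_is_polling &&
           target_residency_ns <= next_timer_ns &&
           target_residency_ns < RESIDENCY_THRESHOLD_NS;
}
```

With a next timer 1 ms away, a made-up deep state with a 120 us target
residency fails the threshold check, so polling is kept; a made-up 2 us
state passes all three checks and is still preferred over polling.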

This change is limited to the single special-case condition for the first
non-polling state. The general state selection logic in the menu governor
remains unchanged.

Performance improvement observed with pgbench on a PowerPC (pseries)
system:
+---------------------------+------------+------------+------------+
| Metric                    | Baseline   | Patched    | Change (%) |
+---------------------------+------------+------------+------------+
| Transactions/sec (TPS)    | 495,210    | 536,982    | +8.45%     |
| Avg latency (ms)          | 0.163      | 0.150      | -7.98%     |
+---------------------------+------------+------------+------------+
CPUIdle state usage:
+--------------+--------------+-------------+
| Metric       | Baseline     | Patched     |
+--------------+--------------+-------------+
| Total usage  | 12,735,820   | 13,918,442  |
| Above usage  | 11,401,520   | 1,598,210   |
| Below usage  | 20,145       | 702,395     |
+--------------+--------------+-------------+

Above/Total and Below/Total usage percentages, which indicate
mispredictions:
+------------------------+-----------+---------+
| Metric                 | Baseline  | Patched |
+------------------------+-----------+---------+
| Above % (Above/Total)  | 89.56%    | 11.49%  |
| Below % (Below/Total)  | 0.16%     | 5.05%   |
| Total cpuidle miss (%) | 89.72%    | 16.54%  |
+------------------------+-----------+---------+
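
For reference, the total miss rate is (above + below) / total. A minimal
sketch of that arithmetic follows; miss_pct() is a hypothetical helper,
and the inputs are meant to be the usage counters from the table above.
Recomputing from those counters should land within about 0.05
percentage points of the reported totals, not necessarily digit for
digit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: share of idle-state entries that were counted as
 * mispredicted (state too deep = above, too shallow = below), in percent.
 */
static double miss_pct(uint64_t above, uint64_t below, uint64_t total)
{
    if (total == 0)
        return 0.0; /* no idle entries recorded, nothing to report */
    return 100.0 * (double)(above + below) / (double)total;
}
```

For example, miss_pct(11401520, 20145, 12735820) gives the baseline
figure and miss_pct(1598210, 702395, 13918442) the patched one.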

The results show that restricting non-polling state selection to
cases where its residency is within the threshold reduces mispredictions,
lowers unnecessary state transitions, and improves overall throughput.

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---

v2: https://lore.kernel.org/all/20250317060357.29451-1-aboorvad@linux.ibm.com/

Changes in v2 -> v3:
  - Modified the patch following Rafael's feedback: incorporated a residency
    threshold check (s->target_residency_ns < RESIDENCY_THRESHOLD_NS) as
    suggested.
  - Updated commit message accordingly.
---
 drivers/cpuidle/governors/menu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b2e3d0b0a116..d25b04539109 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -316,11 +316,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 
 		if (s->target_residency_ns > predicted_ns) {
 			/*
-			 * Use a physical idle state, not busy polling, unless
-			 * a timer is going to trigger soon enough.
+			 * Use a physical idle state instead of busy polling
+			 * if the next timer doesn't expire soon and its
+			 * target residency is below the residency threshold.
 			 */
 			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
-			    s->target_residency_ns <= data->next_timer_ns) {
+			    s->target_residency_ns <= data->next_timer_ns &&
+			    s->target_residency_ns < RESIDENCY_THRESHOLD_NS) {
 				predicted_ns = s->target_residency_ns;
 				idx = i;
 				break;
-- 
2.50.1

