Linux Power Management development
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Christian Loehle <christian.loehle@arm.com>,
	Reka Norman <rekanorman@chromium.org>
Cc: daniel.lezcano@linaro.org, linux-pm@vger.kernel.org
Subject: [PATCH v1] cpuidle: governors: teo: Rework the handling of tick wakeups
Date: Thu, 13 Nov 2025 17:56:27 +0100	[thread overview]
Message-ID: <6228387.lOV4Wx5bFT@rafael.j.wysocki> (raw)
In-Reply-To: <ca45366d-4c85-4802-8a35-886a6f69d10d@arm.com>

On Thursday, November 13, 2025 4:43:18 PM CET Christian Loehle wrote:
> On 11/12/25 18:33, Christian Loehle wrote:
> > On 11/12/25 14:16, Rafael J. Wysocki wrote:
> >> On Wed, Nov 12, 2025 at 3:03 PM Christian Loehle
> >> <christian.loehle@arm.com> wrote:
> >>>
> >>> On 11/12/25 13:32, Rafael J. Wysocki wrote:
> >>>> On Tue, Nov 11, 2025 at 6:20 PM Christian Loehle
> >>>> <christian.loehle@arm.com> wrote:
> >>>>>
> >>>>> On 11/11/25 11:48, Rafael J. Wysocki wrote:
> >>>>>> On Tue, Nov 11, 2025 at 11:48 AM Christian Loehle
> >>>>>> <christian.loehle@arm.com> wrote:
> >>>>>>>
> >>>>>>> On 11/11/25 10:00, Christian Loehle wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>>>>> I see two issues:
> >>>>>>>> 1) Because of DECAY_SHIFT 3 values < 8 cannot decay (I guess this wouldn't really be an issue without 2))
> >>>>>>
> >>>>>> This shouldn't be a problem.
> >>>>>
> >>>>> Agreed, it should be a non-issue. Nonetheless if this wasn't the case $subject would've likely
> >>>>> never been an issue.
> >>>>
> >>>> Well, I think that the leftovers can be cleared when they become less than 8.
> >>>>
> >>>>>>
> >>>>>>>> 2) if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) isn't an appropriate check, it will
> >>>>>>>> exclude the state if it its idx_hit_sum make up the vast majority of cpu_data->total (i.e. it would
> >>>>>>>> have been a really good candidate actually).
> >>>>>>
> >>>>>> Well, it would exclude the state if the sum of hits for the states
> >>>>>> below it is large enough.  This is questionable (because why would
> >>>>>> hits matter here), but I attempted to make the change below and
> >>>>>> somebody reported a regression IIRC.
> >>>>>>
> >>>>>> This check is related to the problem at hand though (see below).
> >>>>>>
> >>>>>>>>
> >>>>>>>> I lightly tested the below, it seems to be at least comparable to mainline teo.
> >>>>>>>> (the documentation/comments would need adapting too, of course)
> >>>>>>>>
> >>>>>>>> -----8<-----
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> >>>>>>>> index bfa55c1eab5b..f8f76e3b8364 100644
> >>>>>>>> --- a/drivers/cpuidle/governors/teo.c
> >>>>>>>> +++ b/drivers/cpuidle/governors/teo.c
> >>>>>>>> @@ -355,7 +355,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >>>>>>>>          * all of the deeper states, a shallower idle state is likely to be a
> >>>>>>>>          * better choice.
> >>>>>>>>          */
> >>>>>>>> -       if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> >>>>>>>> +       if (2 * idx_intercept_sum > idx_hit_sum) {
> >>>>>>>>                 int first_suitable_idx = idx;
> >>>>>>>>
> >>>>>>>>                 /*
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> ... nevermind the patch, idx_hit_sum is of course the sum of 0...idx-1.
> >>>>>>> Maybe something like this, again lightly tested:
> >>>>>>>
> >>>>>>> -----8<-----
> >>>>>>>
> >>>>>>> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> >>>>>>> index 173ddcac540a..6bfb9cedb75e 100644
> >>>>>>> --- a/drivers/cpuidle/governors/teo.c
> >>>>>>> +++ b/drivers/cpuidle/governors/teo.c
> >>>>>>> @@ -383,13 +395,15 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >>>>>>>                  * has been stopped already into account.
> >>>>>>>                  */
> >>>>>>>                 intercept_sum = 0;
> >>>>>>> +               hit_sum = 0;
> >>>>>>>
> >>>>>>>                 for (i = idx - 1; i >= 0; i--) {
> >>>>>>>                         struct teo_bin *bin = &cpu_data->state_bins[i];
> >>>>>>>
> >>>>>>>                         intercept_sum += bin->intercepts;
> >>>>>>> +                       hit_sum += bin->hits;
> >>>>>>>
> >>>>>>> -                       if (2 * intercept_sum > idx_intercept_sum) {
> >>>>>>> +                       if (2 * intercept_sum > cpu_data->total || 2 * hit_sum > cpu_data->total) {
> >>>>>>>                                 /*
> >>>>>>>                                  * Use the current state unless it is too
> >>>>>>>                                  * shallow or disabled, in which case take the
> >>>>>>
> >>>>>> This will only matter after the deepest state has been rejected
> >>>>>> already and on the system in question this means selecting state 0 no
> >>>>>> matter what.
> >>>>>>
> >>>>>
> >>>>> Ah, right!
> >>>>>
> >>>>>
> >>>>>> The pre-6.12 behavior can be explained if tick wakeups are taken into account.
> >>>>>>
> >>>>>> Namely, when state 0 is chosen (because of the check mentioned above),
> >>>>>> the tick is not stopped and the sleep length is KTIME_MAX.  If the
> >>>>>> subsequent wakeup is a tick one, it will be counted as a hit on the
> >>>>>> deepest state (and it will contribute to the total sum in the check
> >>>>>> mentioned above).  Then, at one point, cpu_data->total will be large
> >>>>>> enough and the deepest state will become the candidate one.  If
> >>>>>> tick_nohz_get_sleep_length() returns a large value at that point, the
> >>>>>> tick will be stopped and the deepest state will be entered.  Nirvana
> >>>>>> ensues.
> >>>>>
> >>>>> So fundamentally we will have to count tick-wakeups as a) nothing, which
> >>>>> doesn't allow us to ever break out of the intercept logic that caused us
> >>>>> to leave the tick on b) intercepts, which is bonkers and doesn't allow us
> >>>>> to ever break out and c) hits == sleep_length would've been accurate.
> >>>>> Of course counting a tick wakeup as a hit for sleep_length negates the
> >>>>> intercept logic.
> >>>>
> >>>> Not quite.  The intercept logic is there for wakeups other than tick
> >>>> wakeups and timer wakeups.
> >>>>
> >>>> I actually think that tick wakeups can be counted as hits on the
> >>>> deepest available state - maybe only when tick wakeups dominate the
> >>>> wakeup pattern - but generally this is not unreasonable: When the
> >>>> wakeup pattern is dominated by tick wakeups, this by itself is a good
> >>>> enough reason to stop the tick.
> >>>
> >>> (assuming HZ=1000 below but it doesn't matter)
> >>> That will exclude any 'intercept' logic from having much effect if the
> >>> avg idle duration is >TICK_NSEC/2, which is potentially still quite a bit
> >>> off from state1 residency, like in Reka's case here.
> >>> That's why I thought it would cause unreasonable regressions here.
> >>> I'll give it a go as well though!
> >>
> >> Thanks!
> >>
> >> Note that I'd prefer to add a check if tick wakeups dominate the
> >> wakeup pattern before setting sleep_length_ns to KTIME_MAX though.
> >> I'd first like to know how the Reka's system reacts to the more
> >> drastic variant of this change.
> > 
> > Below are my usual tests, it's definitely visible but the impact is limited
> > on this platform anyway. I think if we gate the KTIME_MAX setting behind
> > the "tick wakeup dominate" it should be acceptable!
> > Let's see what Reka reports.
> > 
> Forgot to post the full results, anyway as expected with mtdblock (a very slow
> / low frequent wakeup scenario) the impact becomes clearly visible.
> Still hopeful that the more conservative approach will be acceptable!

Speaking of which, the patch to test is appended below, but it doesn't apply
directly on top of the mainline.  It is based on some other patches that have
been posted recently, so here's a git branch with all of the requisite
material:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git cpuidle-teo-testing

Reka, please try this one and let us know how it goes.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH v1] cpuidle: governors: teo: Rework the handling of tick wakeups

If the wakeup pattern is clearly dominated by tick wakeups, count those
wakeups as hits on the deepest available idle state to increase the
likelihood of stopping the tick, especially on systems where there are
only 2 usable idle states and the tick can only be stopped when the
deeper state is selected.

This change is expected to reduce power on some systems where state 0 is
selected relatively often even though they are almost idle.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c |   39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -133,17 +133,19 @@ struct teo_bin {
  * @sleep_length_ns: Time till the closest timer event (at the selection time).
  * @state_bins: Idle state data bins for this CPU.
  * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
+ * @total_tick: Wakeups by the scheduler tick.
  * @tick_intercepts: "Intercepts" before TICK_NSEC.
  * @short_idles: Wakeups after short idle periods.
- * @artificial_wakeup: Set if the wakeup has been triggered by a safety net.
+ * @tick_wakeup: Set if the last wakeup was by the scheduler tick.
  */
 struct teo_cpu {
 	s64 sleep_length_ns;
 	struct teo_bin state_bins[CPUIDLE_STATE_MAX];
 	unsigned int total;
+	unsigned int total_tick;
 	unsigned int tick_intercepts;
 	unsigned int short_idles;
-	bool artificial_wakeup;
+	bool tick_wakeup;
 };
 
 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -172,9 +174,10 @@ static void teo_update(struct cpuidle_dr
 
 	teo_decay(&cpu_data->short_idles);
 
-	if (cpu_data->artificial_wakeup) {
+	if (dev->poll_time_limit) {
+		dev->poll_time_limit = false;
 		/*
-		 * If one of the safety nets has triggered, assume that this
+		 * Polling state timeout has triggered, so assume that this
 		 * might have been a long sleep.
 		 */
 		measured_ns = S64_MAX;
@@ -223,6 +226,21 @@ static void teo_update(struct cpuidle_dr
 	cpu_data->total = total + PULSE;
 
 	teo_decay(&cpu_data->tick_intercepts);
+
+	teo_decay(&cpu_data->total_tick);
+	if (cpu_data->tick_wakeup) {
+		cpu_data->total_tick += PULSE;
+		/*
+		 * If tick wakeups dominate the wakeup pattern, count this one
+		 * as a hit on the deepest available idle state to increase the
+		 * likelihood of stopping the tick.
+		 */
+		if (3 * cpu_data->total_tick > 2 * cpu_data->total) {
+			cpu_data->state_bins[drv->state_count-1].hits += PULSE;
+			return;
+		}
+	}
+
 	/*
 	 * If the measured idle duration falls into the same bin as the sleep
 	 * length, this is a "hit", so update the "hits" metric for that bin.
@@ -512,18 +530,9 @@ static void teo_reflect(struct cpuidle_d
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 
+	cpu_data->tick_wakeup = tick_nohz_idle_got_tick();
+
 	dev->last_state_idx = state;
-	if (dev->poll_time_limit ||
-	    (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
-		/*
-		 * The wakeup was not "genuine", but triggered by one of the
-		 * safety nets.
-		 */
-		dev->poll_time_limit = false;
-		cpu_data->artificial_wakeup = true;
-	} else {
-		cpu_data->artificial_wakeup = false;
-	}
 }
 
 /**




  reply	other threads:[~2025-11-13 16:56 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-04  3:36 Regression in TEO cpuidle governor between 6.6 and 6.12 Reka Norman
2025-11-04  9:03 ` Christian Loehle
2025-11-04 23:24   ` Christian Loehle
2025-11-05  6:22     ` Reka Norman
2025-11-05 22:47       ` Christian Loehle
2025-11-06 15:21         ` Rafael J. Wysocki
2025-11-05 20:48     ` Rafael J. Wysocki
2025-11-06 11:13       ` Christian Loehle
2025-11-06 20:33         ` Rafael J. Wysocki
2025-11-07  3:28           ` Reka Norman
2025-11-07 11:35             ` Rafael J. Wysocki
2025-11-09 20:35               ` Christian Loehle
2025-11-10  6:10                 ` Reka Norman
2025-11-10 12:06                   ` Christian Loehle
2025-11-11  4:23                     ` Reka Norman
2025-11-11 10:00                       ` Christian Loehle
2025-11-11 10:48                         ` Christian Loehle
2025-11-11 11:48                           ` Rafael J. Wysocki
2025-11-11 17:20                             ` Christian Loehle
2025-11-12  9:51                               ` Christian Loehle
2025-11-12 13:40                                 ` Rafael J. Wysocki
2025-11-12 14:14                                   ` Christian Loehle
2025-11-12 14:23                                     ` Rafael J. Wysocki
2025-11-12 13:32                               ` Rafael J. Wysocki
2025-11-12 14:03                                 ` Christian Loehle
2025-11-12 14:16                                   ` Rafael J. Wysocki
2025-11-12 18:33                                     ` Christian Loehle
2025-11-13 15:43                                       ` Christian Loehle
2025-11-13 16:56                                         ` Rafael J. Wysocki [this message]
2025-11-14  4:05                                           ` [PATCH v1] cpuidle: governors: teo: Rework the handling of tick wakeups Reka Norman
2025-11-14  8:33                                             ` Christian Loehle
2025-11-14 14:15                                               ` Rafael J. Wysocki
2025-11-14 14:49                                                 ` Christian Loehle
2025-11-14 15:13                                                   ` Rafael J. Wysocki
2025-11-17  5:44                                                 ` Reka Norman
2025-11-17  5:14                                               ` Reka Norman
2025-11-17  8:45                                                 ` Christian Loehle
2025-11-17 16:29                                                   ` Rafael J. Wysocki
2025-11-18  0:19                                                   ` Reka Norman
2025-11-14  9:20                                             ` Christian Loehle
2025-11-17  5:41                                               ` Reka Norman
2025-11-14 14:47                                           ` Christian Loehle
2025-11-14 15:12                                             ` Rafael J. Wysocki
2025-11-14 15:33                                               ` Christian Loehle
2025-11-19 22:43                                         ` Regression in TEO cpuidle governor between 6.6 and 6.12 Doug Smythies
2025-11-13  5:13                                 ` Reka Norman
2025-11-13 10:52                                   ` Rafael J. Wysocki
2025-11-10 20:51                   ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6228387.lOV4Wx5bFT@rafael.j.wysocki \
    --to=rafael@kernel.org \
    --cc=christian.loehle@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rekanorman@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox