* [PATCH v2 0/2] cpuidle: governors: teo: Wakeup events classification change and refinement
@ 2026-01-26 19:41 Rafael J. Wysocki
2026-01-26 19:45 ` [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events Rafael J. Wysocki
2026-01-26 19:51 ` [PATCH v2 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup Rafael J. Wysocki
0 siblings, 2 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2026-01-26 19:41 UTC
To: Linux PM; +Cc: LKML, Christian Loehle, Doug Smythies
Hi All,
This is a follow-up to
https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
including new versions of patches [4-5/5] from that series. The other patches
from it have already been queued up for 6.20.
Patch [1/2] changes the criteria used for classifying wakeup events as hits
or intercepts to (hopefully) make the classification work better for large
state bins.
Patch [2/2] refines the idle state lookup based on intercepts to first
consider the state with the maximum intercepts metric, so that state is
always taken into consideration.
Please see the individual patch changelogs for details.
Thanks!
* [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events
2026-01-26 19:41 [PATCH v2 0/2] cpuidle: governors: teo: Wakeup events classification change and refinement Rafael J. Wysocki
@ 2026-01-26 19:45 ` Rafael J. Wysocki
2026-01-29 9:16 ` Christian Loehle
2026-01-26 19:51 ` [PATCH v2 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup Rafael J. Wysocki
1 sibling, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2026-01-26 19:45 UTC
To: Linux PM; +Cc: LKML, Christian Loehle, Doug Smythies
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If differences between target residency values of adjacent idle states
of a given CPU are relatively large, the corresponding idle state bins
used by the teo governor are large too and the rule by which hits
are distinguished from intercepts is inaccurate.
Namely, by that rule, a wakeup event is classified as a hit if the
sleep length (the time till the closest timer other than the tick)
and the measured idle duration, adjusted for the entered idle state
exit latency, fall into the same idle state bin. However, if that bin
is large enough, the actual difference between the sleep length and
the measured idle duration may be significant. It may in fact be
significantly greater than the analogous difference for an event where
the sleep length and the measured idle duration fall into different
bins.
For this reason, amend the rule in question with a check that will
only allow a wakeup event to be counted as a hit if the difference
between the sleep length and the measured idle duration is less than
LATENCY_THRESHOLD_NS (which means that the difference between the
sleep length and the raw measured idle duration is below the sum of
LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
Otherwise, the event will be counted as an intercept.
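
As a rough illustration of the amended rule, here is a minimal standalone
sketch. It is not the kernel code: the bin boundaries, the helper names
(bin_index(), is_hit_old(), is_hit_new()) and the 7.5 us value used for
LATENCY_THRESHOLD_NS are assumptions made for the example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value (7.5 us in ns); the changelog only names the macro. */
#define LATENCY_THRESHOLD_NS 7500

/* Hypothetical target residencies of four idle states, in ns. */
static const int64_t example_residency_ns[] = { 0, 2000, 100000, 1000000 };
#define EXAMPLE_STATE_COUNT 4

/* Deepest state whose target residency does not exceed duration_ns. */
static int bin_index(int64_t duration_ns)
{
	int idx = 0;

	for (int i = 0; i < EXAMPLE_STATE_COUNT; i++) {
		if (example_residency_ns[i] <= duration_ns)
			idx = i;
	}
	return idx;
}

/* Old rule: falling into the same bin is enough for a "hit". */
static bool is_hit_old(int64_t sleep_length_ns, int64_t measured_ns)
{
	return bin_index(sleep_length_ns) == bin_index(measured_ns);
}

/* Amended rule: same bin and the difference is below the threshold. */
static bool is_hit_new(int64_t sleep_length_ns, int64_t measured_ns)
{
	return bin_index(sleep_length_ns) == bin_index(measured_ns) &&
	       sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS;
}
```

With these made-up bins, a 900 us sleep length and a 150 us measured idle
duration share bin 2, so is_hit_old(900000, 150000) is true while
is_hit_new(900000, 150000) is false. A 101 us sleep length paired with a
99 us idle duration lands in different bins and is an intercept under both
rules, even though the difference is only 2 us.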
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1.1 -> v2: No changes
v1 -> v1.1
* Drop the change in teo_select() along with the corresponding
part of the changelog (after receiving testing feedback from
Christian)
This is a resend of
https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
It applies on top of the first three patches from
https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
---
drivers/cpuidle/governors/teo.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -48,13 +48,11 @@
* in accordance with what happened last time.
*
* The "hits" metric reflects the relative frequency of situations in which the
- * sleep length and the idle duration measured after CPU wakeup fall into the
- * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
- * length). In turn, the "intercepts" metric reflects the relative frequency of
- * non-timer wakeup events for which the measured idle duration falls into a bin
- * that corresponds to an idle state shallower than the one whose bin is fallen
- * into by the sleep length (these events are also referred to as "intercepts"
- * below).
+ * sleep length and the idle duration measured after CPU wakeup are close enough
+ * (that is, the CPU appears to wake up "on time" relative to the sleep length).
+ * In turn, the "intercepts" metric reflects the relative frequency of non-timer
+ * wakeup events for which the measured idle duration is measurably less than
+ * the sleep length (these events are also referred to as "intercepts" below).
*
* The governor also counts "intercepts" with the measured idle duration below
* the tick period length and uses this information when deciding whether or not
@@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
}
/*
- * If the measured idle duration falls into the same bin as the sleep
- * length, this is a "hit", so update the "hits" metric for that bin.
+ * If the measured idle duration falls into the same bin as the
+ * sleep length and the difference between them is less than
+ * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
+ * metric for that bin.
+ *
* Otherwise, update the "intercepts" metric for the bin fallen into by
* the measured idle duration.
*/
- if (idx_timer == idx_duration) {
+ if (idx_timer == idx_duration &&
+ cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
cpu_data->state_bins[idx_timer].hits += PULSE;
} else {
cpu_data->state_bins[idx_duration].intercepts += PULSE;
* [PATCH v2 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup
2026-01-26 19:41 [PATCH v2 0/2] cpuidle: governors: teo: Wakeup events classification change and refinement Rafael J. Wysocki
2026-01-26 19:45 ` [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events Rafael J. Wysocki
@ 2026-01-26 19:51 ` Rafael J. Wysocki
2026-01-29 9:19 ` Christian Loehle
1 sibling, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2026-01-26 19:51 UTC
To: Linux PM; +Cc: LKML, Christian Loehle, Doug Smythies
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are cases in which decisions made by the teo governor are
arguably overly conservative.
For instance, suppose that there are 4 idle states and the values of
the intercepts metric for the first 3 of them are 400, 250, and 251,
respectively. If the total sum computed in teo_update() is 1000, the
governor will select idle state 1 (provided that all idle states are
enabled and the scheduler tick has not been stopped) although arguably
idle state 0 would be a better choice because the likelihood of getting
an idle duration below the target residency of idle state 1 is greater
than the likelihood of getting an idle duration between the target
residency of idle state 1 and the target residency of idle state 2.
To address this, refine the candidate idle state lookup based on
intercepts to start at the state with the maximum intercepts metric,
below the deepest enabled one, to avoid the cases in which the search
may stop before reaching that state.
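
The example above can be walked through with a simplified model of the
shallower-state search. This is only a sketch under stated assumptions: all
idle states are enabled, the tick has not been stopped, the condition for
looking at shallower states is taken as already met, and the early exit
added by the patch is left out. The function name lookup_shallower() and
its use_max switch are hypothetical and exist only to compare the two
behaviors side by side.

```c
#include <stdbool.h>

/* Intercepts metric of the three shallowest states from the example. */
static const unsigned int example_intercepts[] = { 400, 250, 251 };

/*
 * Simplified model of the intercepts-based lookup (not the kernel code).
 * "candidate" is the index of the current candidate state; the search
 * walks the shallower states in descending order.  With use_max set, the
 * search may only stop at or below the state with the maximum intercepts
 * metric, as proposed in the patch.
 */
static int lookup_shallower(const unsigned int *intercepts, int candidate,
			    bool use_max)
{
	unsigned int idx_intercept_sum = 0;
	unsigned int intercept_sum = 0;
	unsigned int intercept_max = 0;
	int intercept_max_idx = -1;
	int idx = candidate;
	int i;

	/* Sum the metric over the shallower states and find the maximum. */
	for (i = 0; i < candidate; i++) {
		idx_intercept_sum += intercepts[i];
		/* ">=" picks the highest index among equal maxima. */
		if (intercepts[i] >= intercept_max) {
			intercept_max = intercepts[i];
			intercept_max_idx = i;
		}
	}

	for (i = candidate - 1; i >= 0; i--) {
		intercept_sum += intercepts[i];
		idx = i;
		if (2 * intercept_sum > idx_intercept_sum &&
		    (!use_max || i <= intercept_max_idx))
			break;
	}
	return idx;
}
```

For the numbers in the example, lookup_shallower(example_intercepts, 3,
false) returns 1: the partial sum 251 + 250 = 501 already exceeds half of
901, so the search stops there. With use_max set to true it returns 0,
because state 1 is deeper than the state with the maximum metric (state 0)
and the search is not allowed to stop before reaching it.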
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1 -> v2:
* Multiple fixes related to the handling of cases in which some states
are disabled.
* Fixes in new comments (there was some confusion in those comments
regarding the direction of idle states table traversal).
* Fixed typos in new comments.
---
drivers/cpuidle/governors/teo.c | 50 ++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 7 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -73,12 +73,17 @@
* than the candidate one (it represents the cases in which the CPU was
* likely woken up by a non-timer wakeup source).
*
+ * Also find the idle state with the maximum intercepts metric (if there are
+ * multiple states with the maximum intercepts metric, choose the one with
+ * the highest index).
+ *
* 2. If the second sum computed in step 1 is greater than a half of the sum of
* both metrics for the candidate state bin and all subsequent bins (if any),
* a shallower idle state is likely to be more suitable, so look for it.
*
* - Traverse the enabled idle states shallower than the candidate one in the
- * descending order.
+ * descending order, starting at the state with the maximum intercepts
+ * metric found in step 1.
*
* - For each of them compute the sum of the "intercepts" metrics over all
* of the idle states between it and the candidate one (including the
@@ -307,8 +312,10 @@ static int teo_select(struct cpuidle_dri
ktime_t delta_tick = TICK_NSEC / 2;
unsigned int idx_intercept_sum = 0;
unsigned int intercept_sum = 0;
+ unsigned int intercept_max = 0;
unsigned int idx_hit_sum = 0;
unsigned int hit_sum = 0;
+ int intercept_max_idx = -1;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
s64 duration_ns;
@@ -339,17 +346,32 @@ static int teo_select(struct cpuidle_dri
if (!dev->states_usage[0].disable)
idx = 0;
- /* Compute the sums of metrics for early wakeup pattern detection. */
+ /*
+ * Compute the sums of metrics for early wakeup pattern detection and
+ * look for the state bin with the maximum intercepts metric below the
+ * deepest enabled one (if there are multiple states with the maximum
+ * intercepts metric, choose the one with the highest index).
+ */
for (i = 1; i < drv->state_count; i++) {
struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
+ unsigned int prev_intercepts = prev_bin->intercepts;
struct cpuidle_state *s = &drv->states[i];
/*
* Update the sums of idle state metrics for all of the states
* shallower than the current one.
*/
- intercept_sum += prev_bin->intercepts;
hit_sum += prev_bin->hits;
+ intercept_sum += prev_intercepts;
+ /*
+ * Check if this is the bin with the maximum number of
+ * intercepts so far and in that case update the index of
+ * the state with the maximum intercepts metric.
+ */
+ if (prev_intercepts >= intercept_max) {
+ intercept_max = prev_intercepts;
+ intercept_max_idx = i - 1;
+ }
if (dev->states_usage[i].disable)
continue;
@@ -413,9 +435,22 @@ static int teo_select(struct cpuidle_dri
}
/*
- * Look for the deepest idle state whose target residency had
- * not exceeded the idle duration in over a half of the relevant
- * cases in the past.
+ * If the minimum state index is greater than or equal to the
+ * index of the state with the maximum intercepts metric and
+ * the corresponding state is enabled, there is no need to look
+ * at the deeper states.
+ */
+ if (min_idx >= intercept_max_idx &&
+ !dev->states_usage[min_idx].disable) {
+ idx = min_idx;
+ goto constraint;
+ }
+
+ /*
+ * Look for the deepest enabled idle state, at most as deep as
+ * the one with the maximum intercepts metric, whose target
+ * residency had not been greater than the idle duration in over
+ * a half of the relevant cases in the past.
*
* Take the possible duration limitation present if the tick
* has been stopped already into account.
@@ -427,7 +462,8 @@ static int teo_select(struct cpuidle_dri
continue;
idx = i;
- if (2 * intercept_sum > idx_intercept_sum)
+ if (2 * intercept_sum > idx_intercept_sum &&
+ i <= intercept_max_idx)
break;
}
}
* Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events
2026-01-26 19:45 ` [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events Rafael J. Wysocki
@ 2026-01-29 9:16 ` Christian Loehle
2026-01-29 17:18 ` Rafael J. Wysocki
0 siblings, 1 reply; 8+ messages in thread
From: Christian Loehle @ 2026-01-29 9:16 UTC
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Doug Smythies
On 1/26/26 19:45, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If differences between target residency values of adjacent idle states
> of a given CPU are relatively large, the corresponding idle state bins
> used by the teo governor are large too and the rule by which hits
> are distinguished from intercepts is inaccurate.
>
> Namely, by that rule, a wakeup event is classified as a hit if the
> sleep length (the time till the closest timer other than the tick)
> and the measured idle duration, adjusted for the entered idle state
> exit latency, fall into the same idle state bin. However, if that bin
> is large enough, the actual difference between the sleep length and
> the measured idle duration may be significant. It may in fact be
> significantly greater than the analogous difference for an event where
> the sleep length and the measured idle duration fall into different
> bins.
>
> For this reason, amend the rule in question with a check that will
> only allow a wakeup event to be counted as a hit if the difference
> between the sleep length and the measured idle duration is less than
> LATENCY_THRESHOLD_NS (which means that the difference between the
> sleep length and the raw measured idle duration is below the sum of
> LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
> Otherwise, the event will be counted as an intercept.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v1.1 -> v2: No changes
>
> v1 -> v1.1
> * Drop the change in teo_select() along with the corresponding
> part of the changelog (after receiving testing feedback from
> Christian)
>
> This is a resend of
>
> https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
>
> It applies on top of the first three patches from
>
> https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
>
> ---
> drivers/cpuidle/governors/teo.c | 32 ++++++++++++++++----------------
> 1 file changed, 16 insertions(+), 16 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -48,13 +48,11 @@
> * in accordance with what happened last time.
> *
> * The "hits" metric reflects the relative frequency of situations in which the
> - * sleep length and the idle duration measured after CPU wakeup fall into the
> - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
> - * length). In turn, the "intercepts" metric reflects the relative frequency of
> - * non-timer wakeup events for which the measured idle duration falls into a bin
> - * that corresponds to an idle state shallower than the one whose bin is fallen
> - * into by the sleep length (these events are also referred to as "intercepts"
> - * below).
> + * sleep length and the idle duration measured after CPU wakeup are close enough
> + * (that is, the CPU appears to wake up "on time" relative to the sleep length).
> + * In turn, the "intercepts" metric reflects the relative frequency of non-timer
> + * wakeup events for which the measured idle duration is measurably less than
> + * the sleep length (these events are also referred to as "intercepts" below).
> *
> * The governor also counts "intercepts" with the measured idle duration below
> * the tick period length and uses this information when deciding whether or not
> @@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
> }
>
> /*
> - * If the measured idle duration falls into the same bin as the sleep
> - * length, this is a "hit", so update the "hits" metric for that bin.
> + * If the measured idle duration falls into the same bin as the
> + * sleep length and the difference between them is less than
> + * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
> + * metric for that bin.
> + *
> * Otherwise, update the "intercepts" metric for the bin fallen into by
> * the measured idle duration.
> */
> - if (idx_timer == idx_duration) {
> + if (idx_timer == idx_duration &&
> + cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
So it needs to be within 7.5us here.
Can we always expect that to be true?
Especially since measured_ns does this "infer average from worst-case exit
latency" handling.
On deeper states this
measured_ns -= lat_ns / 2;
is an order of magnitude higher than our threshold.
So it should probably be something like
exit_latency / 2 + LATENCY_THRESHOLD_NS?
Or just exit_latency and allow the error to both sides?
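
To put numbers on this concern, here is a small sketch. It models only the
subtraction quoted above and ignores any other handling in teo_update();
the helper names and the 200 us exit latency are hypothetical, and
LATENCY_THRESHOLD_NS is taken as the 7.5 us mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* 7.5 us in ns, as mentioned above. */
#define LATENCY_THRESHOLD_NS 7500

/* Simplified adjustment: only "measured_ns -= lat_ns / 2" is modeled. */
static int64_t adjusted_idle_ns(int64_t raw_idle_ns, int64_t exit_latency_ns)
{
	return raw_idle_ns - exit_latency_ns / 2;
}

/* The check as written in the patch. */
static bool within_patch_threshold(int64_t sleep_length_ns,
				   int64_t raw_idle_ns,
				   int64_t exit_latency_ns)
{
	int64_t measured_ns = adjusted_idle_ns(raw_idle_ns, exit_latency_ns);

	return sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS;
}

/* The alternative floated above: exit_latency / 2 + LATENCY_THRESHOLD_NS. */
static bool within_widened_threshold(int64_t sleep_length_ns,
				     int64_t raw_idle_ns,
				     int64_t exit_latency_ns)
{
	int64_t measured_ns = adjusted_idle_ns(raw_idle_ns, exit_latency_ns);

	return sleep_length_ns - measured_ns <
	       exit_latency_ns / 2 + LATENCY_THRESHOLD_NS;
}
```

Assuming a 1 ms sleep length, a raw idle duration of exactly 1 ms and an
exit latency of 200 us, the adjusted duration is 900 us and the difference
is 100 us, so within_patch_threshold(1000000, 1000000, 200000) is false
while within_widened_threshold(1000000, 1000000, 200000) is true.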
> cpu_data->state_bins[idx_timer].hits += PULSE;
> } else {
> cpu_data->state_bins[idx_duration].intercepts += PULSE;
>
>
>
* Re: [PATCH v2 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup
2026-01-26 19:51 ` [PATCH v2 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup Rafael J. Wysocki
@ 2026-01-29 9:19 ` Christian Loehle
2026-01-29 13:21 ` Rafael J. Wysocki
0 siblings, 1 reply; 8+ messages in thread
From: Christian Loehle @ 2026-01-29 9:19 UTC
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Doug Smythies
On 1/26/26 19:51, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> There are cases in which decisions made by the teo governor are
> arguably overly conservative.
>
> For instance, suppose that there are 4 idle states and the values of
> the intercepts metric for the first 3 of them are 400, 250, and 251,
> respectively. If the total sum computed in teo_update() is 1000, the
> governor will select idle state 1 (provided that all idle states are
> enabled and the scheduler tick has not been stopped) although arguably
> idle state 0 would be a better choice because the likelihood of getting
> an idle duration below the target residency of idle state 1 is greater
> than the likelihood of getting an idle duration between the target
> residency of idle state 1 and the target residency of idle state 2.
>
> To address this, refine the candidate idle state lookup based on
> intercepts to start at the state with the maximum intercepts metric,
> below the deepest enabled one, to avoid the cases in which the search
> may stop before reaching that state.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v1 -> v2:
> * Multiple fixes related to the handling of cases in which some states
> are disabled.
> * Fixes in new comments (there was some confusion in those comments
> regarding the direction of idle states table traversal).
> * Fixed typos in new comments.
>
> ---
> drivers/cpuidle/governors/teo.c | 50 ++++++++++++++++++++++++++++++++++------
> 1 file changed, 43 insertions(+), 7 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -73,12 +73,17 @@
> * than the candidate one (it represents the cases in which the CPU was
> * likely woken up by a non-timer wakeup source).
> *
> + * Also find the idle state with the maximum intercepts metric (if there are
> + * multiple states with the maximum intercepts metric, choose the one with
> + * the highest index).
> + *
> * 2. If the second sum computed in step 1 is greater than a half of the sum of
> * both metrics for the candidate state bin and all subsequent bins (if any),
> * a shallower idle state is likely to be more suitable, so look for it.
> *
> * - Traverse the enabled idle states shallower than the candidate one in the
> - * descending order.
> + * descending order, starting at the state with the maximum intercepts
> + * metric found in step 1.
> *
> * - For each of them compute the sum of the "intercepts" metrics over all
> * of the idle states between it and the candidate one (including the
> @@ -307,8 +312,10 @@ static int teo_select(struct cpuidle_dri
> ktime_t delta_tick = TICK_NSEC / 2;
> unsigned int idx_intercept_sum = 0;
> unsigned int intercept_sum = 0;
> + unsigned int intercept_max = 0;
> unsigned int idx_hit_sum = 0;
> unsigned int hit_sum = 0;
> + int intercept_max_idx = -1;
> int constraint_idx = 0;
> int idx0 = 0, idx = -1;
> s64 duration_ns;
> @@ -339,17 +346,32 @@ static int teo_select(struct cpuidle_dri
> if (!dev->states_usage[0].disable)
> idx = 0;
>
> - /* Compute the sums of metrics for early wakeup pattern detection. */
> + /*
> + * Compute the sums of metrics for early wakeup pattern detection and
> + * look for the state bin with the maximum intercepts metric below the
> + * deepest enabled one (if there are multiple states with the maximum
> + * intercepts metric, choose the one with the highest index).
> + */
> for (i = 1; i < drv->state_count; i++) {
> struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
> + unsigned int prev_intercepts = prev_bin->intercepts;
> struct cpuidle_state *s = &drv->states[i];
>
> /*
> * Update the sums of idle state metrics for all of the states
> * shallower than the current one.
> */
> - intercept_sum += prev_bin->intercepts;
> hit_sum += prev_bin->hits;
> + intercept_sum += prev_intercepts;
> + /*
> + * Check if this is the bin with the maximum number of
> + * intercepts so far and in that case update the index of
> + * the state with the maximum intercepts metric.
> + */
> + if (prev_intercepts >= intercept_max) {
> + intercept_max = prev_intercepts;
> + intercept_max_idx = i - 1;
> + }
>
> if (dev->states_usage[i].disable)
> continue;
> @@ -413,9 +435,22 @@ static int teo_select(struct cpuidle_dri
> }
>
> /*
> - * Look for the deepest idle state whose target residency had
> - * not exceeded the idle duration in over a half of the relevant
> - * cases in the past.
> + * If the minimum state index is greater than or equal to the
> + * index of the state with the maximum intercepts metric and
> + * the corresponding state is enabled, there is no need to look
> + * at the deeper states.
> + */
> + if (min_idx >= intercept_max_idx &&
> + !dev->states_usage[min_idx].disable) {
> + idx = min_idx;
> + goto constraint;
> + }
> +
> + /*
> + * Look for the deepest enabled idle state, at most as deep as
> + * the one with the maximum intercepts metric, whose target
> + * residency had not been greater than the idle duration in over
> + * a half of the relevant cases in the past.
> *
> * Take the possible duration limitation present if the tick
> * has been stopped already into account.
> @@ -427,7 +462,8 @@ static int teo_select(struct cpuidle_dri
> continue;
>
> idx = i;
> - if (2 * intercept_sum > idx_intercept_sum)
> + if (2 * intercept_sum > idx_intercept_sum &&
> + i <= intercept_max_idx)
Should this be i >= intercept_max_idx?
> break;
> }
> }
>
>
>
* Re: [PATCH v2 2/2] cpuidle: governors: teo: Refine intercepts-based idle state lookup
2026-01-29 9:19 ` Christian Loehle
@ 2026-01-29 13:21 ` Rafael J. Wysocki
0 siblings, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2026-01-29 13:21 UTC
To: Christian Loehle; +Cc: Rafael J. Wysocki, Linux PM, LKML, Doug Smythies
On Thu, Jan 29, 2026 at 10:19 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/26/26 19:51, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > There are cases in which decisions made by the teo governor are
> > arguably overly conservative.
> >
> > For instance, suppose that there are 4 idle states and the values of
> > the intercepts metric for the first 3 of them are 400, 250, and 251,
> > respectively. If the total sum computed in teo_update() is 1000, the
> > governor will select idle state 1 (provided that all idle states are
> > enabled and the scheduler tick has not been stopped) although arguably
> > idle state 0 would be a better choice because the likelihood of getting
> > an idle duration below the target residency of idle state 1 is greater
> > than the likelihood of getting an idle duration between the target
> > residency of idle state 1 and the target residency of idle state 2.
> >
> > To address this, refine the candidate idle state lookup based on
> > intercepts to start at the state with the maximum intercepts metric,
> > below the deepest enabled one, to avoid the cases in which the search
> > may stop before reaching that state.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v1 -> v2:
> > * Multiple fixes related to the handling of cases in which some states
> > are disabled.
> > * Fixes in new comments (there was some confusion in those comments
> > regarding the direction of idle states table traversal).
> > * Fixed typos in new comments.
> >
> > ---
> > drivers/cpuidle/governors/teo.c | 50 ++++++++++++++++++++++++++++++++++------
> > 1 file changed, 43 insertions(+), 7 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -73,12 +73,17 @@
> > * than the candidate one (it represents the cases in which the CPU was
> > * likely woken up by a non-timer wakeup source).
> > *
> > + * Also find the idle state with the maximum intercepts metric (if there are
> > + * multiple states with the maximum intercepts metric, choose the one with
> > + * the highest index).
> > + *
> > * 2. If the second sum computed in step 1 is greater than a half of the sum of
> > * both metrics for the candidate state bin and all subsequent bins (if any),
> > * a shallower idle state is likely to be more suitable, so look for it.
> > *
> > * - Traverse the enabled idle states shallower than the candidate one in the
> > - * descending order.
> > + * descending order, starting at the state with the maximum intercepts
> > + * metric found in step 1.
> > *
> > * - For each of them compute the sum of the "intercepts" metrics over all
> > * of the idle states between it and the candidate one (including the
> > @@ -307,8 +312,10 @@ static int teo_select(struct cpuidle_dri
> > ktime_t delta_tick = TICK_NSEC / 2;
> > unsigned int idx_intercept_sum = 0;
> > unsigned int intercept_sum = 0;
> > + unsigned int intercept_max = 0;
> > unsigned int idx_hit_sum = 0;
> > unsigned int hit_sum = 0;
> > + int intercept_max_idx = -1;
> > int constraint_idx = 0;
> > int idx0 = 0, idx = -1;
> > s64 duration_ns;
> > @@ -339,17 +346,32 @@ static int teo_select(struct cpuidle_dri
> > if (!dev->states_usage[0].disable)
> > idx = 0;
> >
> > - /* Compute the sums of metrics for early wakeup pattern detection. */
> > + /*
> > + * Compute the sums of metrics for early wakeup pattern detection and
> > + * look for the state bin with the maximum intercepts metric below the
> > + * deepest enabled one (if there are multiple states with the maximum
> > + * intercepts metric, choose the one with the highest index).
> > + */
> > for (i = 1; i < drv->state_count; i++) {
> > struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
> > + unsigned int prev_intercepts = prev_bin->intercepts;
> > struct cpuidle_state *s = &drv->states[i];
> >
> > /*
> > * Update the sums of idle state metrics for all of the states
> > * shallower than the current one.
> > */
> > - intercept_sum += prev_bin->intercepts;
> > hit_sum += prev_bin->hits;
> > + intercept_sum += prev_intercepts;
> > + /*
> > + * Check if this is the bin with the maximum number of
> > + * intercepts so far and in that case update the index of
> > + * the state with the maximum intercepts metric.
> > + */
> > + if (prev_intercepts >= intercept_max) {
> > + intercept_max = prev_intercepts;
> > + intercept_max_idx = i - 1;
> > + }
> >
> > if (dev->states_usage[i].disable)
> > continue;
> > @@ -413,9 +435,22 @@ static int teo_select(struct cpuidle_dri
> > }
> >
> > /*
> > - * Look for the deepest idle state whose target residency had
> > - * not exceeded the idle duration in over a half of the relevant
> > - * cases in the past.
> > + * If the minimum state index is greater than or equal to the
> > + * index of the state with the maximum intercepts metric and
> > + * the corresponding state is enabled, there is no need to look
> > + * at the deeper states.
> > + */
> > + if (min_idx >= intercept_max_idx &&
> > + !dev->states_usage[min_idx].disable) {
> > + idx = min_idx;
> > + goto constraint;
> > + }
> > +
> > + /*
> > + * Look for the deepest enabled idle state, at most as deep as
> > + * the one with the maximum intercepts metric, whose target
> > + * residency had not been greater than the idle duration in over
> > + * a half of the relevant cases in the past.
> > *
> > * Take the possible duration limitation present if the tick
> > * has been stopped already into account.
> > @@ -427,7 +462,8 @@ static int teo_select(struct cpuidle_dri
> > continue;
> >
> > idx = i;
> > - if (2 * intercept_sum > idx_intercept_sum)
> > + if (2 * intercept_sum > idx_intercept_sum &&
> > + i <= intercept_max_idx)
>
> Should this be i >= intercept_max_idx?
No, the point is to get to intercept_max_idx, or below it if it is
disabled (note that i is decremented in each step of the loop, so i
cannot be greater than intercept_max_idx if its initial value isn't).
> > break;
> > }
> > }
> >
> >
> >
>
>
* Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events
2026-01-29 9:16 ` Christian Loehle
@ 2026-01-29 17:18 ` Rafael J. Wysocki
2026-01-29 20:08 ` Rafael J. Wysocki
0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2026-01-29 17:18 UTC
To: Christian Loehle; +Cc: Rafael J. Wysocki, Linux PM, LKML, Doug Smythies
On Thu, Jan 29, 2026 at 10:16 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/26/26 19:45, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > If differences between target residency values of adjacent idle states
> > of a given CPU are relatively large, the corresponding idle state bins
> > used by the teo governor are large too and the rule by which hits
> > are distinguished from intercepts is inaccurate.
> >
> > Namely, by that rule, a wakeup event is classified as a hit if the
> > sleep length (the time till the closest timer other than the tick)
> > and the measured idle duration, adjusted for the entered idle state
> > exit latency, fall into the same idle state bin. However, if that bin
> > is large enough, the actual difference between the sleep length and
> > the measured idle duration may be significant. It may in fact be
> > significantly greater than the analogous difference for an event where
> > the sleep length and the measured idle duration fall into different
> > bins.
> >
> > For this reason, amend the rule in question with a check that will
> > only allow a wakeup event to be counted as a hit if the difference
> > between the sleep length and the measured idle duration is less than
> > LATENCY_THRESHOLD_NS (which means that the difference between the
> > sleep length and the raw measured idle duration is below the sum of
> > LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
> > Otherwise, the event will be counted as an intercept.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v1.1 -> v2: No changes
> >
> > v1 -> v1.1
> > * Drop the change in teo_select() along with the corresponding
> > part of the changelog (after receiving testing feedback from
> > Christian)
> >
> > This is a resend of
> >
> > https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
> >
> > It applies on top of the first three patches from
> >
> > https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
> >
> > ---
> > drivers/cpuidle/governors/teo.c | 32 ++++++++++++++++----------------
> > 1 file changed, 16 insertions(+), 16 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -48,13 +48,11 @@
> > * in accordance with what happened last time.
> > *
> > * The "hits" metric reflects the relative frequency of situations in which the
> > - * sleep length and the idle duration measured after CPU wakeup fall into the
> > - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
> > - * length). In turn, the "intercepts" metric reflects the relative frequency of
> > - * non-timer wakeup events for which the measured idle duration falls into a bin
> > - * that corresponds to an idle state shallower than the one whose bin is fallen
> > - * into by the sleep length (these events are also referred to as "intercepts"
> > - * below).
> > + * sleep length and the idle duration measured after CPU wakeup are close enough
> > + * (that is, the CPU appears to wake up "on time" relative to the sleep length).
> > + * In turn, the "intercepts" metric reflects the relative frequency of non-timer
> > + * wakeup events for which the measured idle duration is measurably less than
> > + * the sleep length (these events are also referred to as "intercepts" below).
> > *
> > * The governor also counts "intercepts" with the measured idle duration below
> > * the tick period length and uses this information when deciding whether or not
> > @@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
> > }
> >
> > /*
> > - * If the measured idle duration falls into the same bin as the sleep
> > - * length, this is a "hit", so update the "hits" metric for that bin.
> > + * If the measured idle duration falls into the same bin as the
> > + * sleep length and the difference between them is less than
> > + * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
> > + * metric for that bin.
> > + *
> > * Otherwise, update the "intercepts" metric for the bin fallen into by
> > * the measured idle duration.
> > */
> > - if (idx_timer == idx_duration) {
> > + if (idx_timer == idx_duration &&
> > + cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
>
> So it needs to be within 7.5us here.
> Can we always expect that to be true?
It's just a margin.
> Especially since measured_ns does this "infer average from worst-case exit
> latency" handling.
> On deeper states this
> measured_ns -= lat_ns / 2;
> is an order of magnitude higher than our threshold.
True.
> So it should probably be something like
> exit_latency / 2 + LATENCY_THRESHOLD_NS?
> Or just exit_latency and allow the error to both sides?
Well, the exit latency is already there in this inequality because
measured_ns == raw_measured_ns - exit_latency / 2 and I didn't want to
take it into account twice.
And in fact I want sleep_length_ns and measured_ns (already adjusted
for the entered state exit latency) to be equal up to a margin and I
just think that the margin can be the same for all of the state bins
because it's basically the granularity of the comparison.
I didn't get it right though and the code should be something like this:

	if (idx_timer == idx_duration) {
		s64 delta_ns = cpu_data->sleep_length_ns - measured_ns;

		if (delta_ns < 0)
			delta_ns = -delta_ns;

		if (delta_ns < LATENCY_THRESHOLD_NS) {
			cpu_data->state_bins[idx_timer].hits += PULSE;
			return;
		}
	}

	/*
	 * Update the "intercepts" metric for the bin fallen into by the
	 * measured idle duration.
	 */
	cpu_data->state_bins[idx_duration].intercepts += PULSE;
	if (measured_ns <= TICK_NSEC)
		cpu_data->tick_intercepts += PULSE;

LATENCY_THRESHOLD_NS is as good as anything else here and for bins
narrower than it (which means C1 and C1e on Intel x86 for instance)
delta_ns will always be less than it, so the behavior there won't
change after the patch.
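
For illustration only, the two-sided margin check described above can be
pulled out into a standalone userspace sketch. The helper name
sketch_is_hit() is hypothetical and does not exist in teo.c, and the
7500 ns value is an assumption taken from the "within 7.5us" remark
earlier in this thread; the governor's per-CPU state and the PULSE
bookkeeping are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: 7.5 us, per the "within 7.5us" remark in the thread. */
#define LATENCY_THRESHOLD_NS 7500

/*
 * Hypothetical helper (not part of teo.c): returns true if the wakeup
 * would be counted as a "hit" under the two-sided margin check, false
 * if it would be counted as an "intercept".
 */
static bool sketch_is_hit(int idx_timer, int idx_duration,
			  int64_t sleep_length_ns, int64_t measured_ns)
{
	int64_t delta_ns;

	/* Different bins: never a hit. */
	if (idx_timer != idx_duration)
		return false;

	/* Absolute difference between sleep length and idle duration. */
	delta_ns = sleep_length_ns - measured_ns;
	if (delta_ns < 0)
		delta_ns = -delta_ns;

	return delta_ns < LATENCY_THRESHOLD_NS;
}
```

With these assumptions, a wakeup 5 us before or after the anticipated
timer in the same bin counts as a hit, while one 50 us early in the same
bin counts as an intercept.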
> > cpu_data->state_bins[idx_timer].hits += PULSE;
> > } else {
> > cpu_data->state_bins[idx_duration].intercepts += PULSE;
> >
> >
Overall, I'll respin the series.
* Re: [PATCH v2 1/2] cpuidle: governors: teo: Adjust the classification of wakeup events
2026-01-29 17:18 ` Rafael J. Wysocki
@ 2026-01-29 20:08 ` Rafael J. Wysocki
0 siblings, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2026-01-29 20:08 UTC (permalink / raw)
To: Christian Loehle; +Cc: Linux PM, LKML, Doug Smythies
On Thu, Jan 29, 2026 at 6:18 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jan 29, 2026 at 10:16 AM Christian Loehle
> <christian.loehle@arm.com> wrote:
> >
> > On 1/26/26 19:45, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > If differences between target residency values of adjacent idle states
> > > of a given CPU are relatively large, the corresponding idle state bins
> > > > used by the teo governor are large too and the rule by which hits
> > > are distinguished from intercepts is inaccurate.
> > >
> > > Namely, by that rule, a wakeup event is classified as a hit if the
> > > sleep length (the time till the closest timer other than the tick)
> > > and the measured idle duration, adjusted for the entered idle state
> > > exit latency, fall into the same idle state bin. However, if that bin
> > > is large enough, the actual difference between the sleep length and
> > > the measured idle duration may be significant. It may in fact be
> > > significantly greater than the analogous difference for an event where
> > > the sleep length and the measured idle duration fall into different
> > > bins.
> > >
> > > For this reason, amend the rule in question with a check that will
> > > only allow a wakeup event to be counted as a hit if the difference
> > > between the sleep length and the measured idle duration is less than
> > > LATENCY_THRESHOLD_NS (which means that the difference between the
> > > sleep length and the raw measured idle duration is below the sum of
> > > LATENCY_THRESHOLD_NS and 1/2 of the entered idle state exit latency).
> > > Otherwise, the event will be counted as an intercept.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > v1.1 -> v2: No changes
> > >
> > > v1 -> v1.1
> > > * Drop the change in teo_select() along with the corresponding
> > > part of the changelog (after receiving testing feedback from
> > > Christian)
> > >
> > > This is a resend of
> > >
> > > https://lore.kernel.org/linux-pm/4707705.LvFx2qVVIh@rafael.j.wysocki/
> > >
> > > It applies on top of the first three patches from
> > >
> > > https://lore.kernel.org/linux-pm/2257365.irdbgypaU6@rafael.j.wysocki/
> > >
> > > ---
> > > drivers/cpuidle/governors/teo.c | 32 ++++++++++++++++----------------
> > > 1 file changed, 16 insertions(+), 16 deletions(-)
> > >
> > > --- a/drivers/cpuidle/governors/teo.c
> > > +++ b/drivers/cpuidle/governors/teo.c
> > > @@ -48,13 +48,11 @@
> > > * in accordance with what happened last time.
> > > *
> > > * The "hits" metric reflects the relative frequency of situations in which the
> > > - * sleep length and the idle duration measured after CPU wakeup fall into the
> > > - * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
> > > - * length). In turn, the "intercepts" metric reflects the relative frequency of
> > > - * non-timer wakeup events for which the measured idle duration falls into a bin
> > > - * that corresponds to an idle state shallower than the one whose bin is fallen
> > > - * into by the sleep length (these events are also referred to as "intercepts"
> > > - * below).
> > > + * sleep length and the idle duration measured after CPU wakeup are close enough
> > > + * (that is, the CPU appears to wake up "on time" relative to the sleep length).
> > > + * In turn, the "intercepts" metric reflects the relative frequency of non-timer
> > > + * wakeup events for which the measured idle duration is measurably less than
> > > + * the sleep length (these events are also referred to as "intercepts" below).
> > > *
> > > * The governor also counts "intercepts" with the measured idle duration below
> > > * the tick period length and uses this information when deciding whether or not
> > > @@ -253,12 +251,16 @@ static void teo_update(struct cpuidle_dr
> > > }
> > >
> > > /*
> > > - * If the measured idle duration falls into the same bin as the sleep
> > > - * length, this is a "hit", so update the "hits" metric for that bin.
> > > + * If the measured idle duration falls into the same bin as the
> > > + * sleep length and the difference between them is less than
> > > + * LATENCY_THRESHOLD_NS, this is a "hit", so update the "hits"
> > > + * metric for that bin.
> > > + *
> > > * Otherwise, update the "intercepts" metric for the bin fallen into by
> > > * the measured idle duration.
> > > */
> > > - if (idx_timer == idx_duration) {
> > > + if (idx_timer == idx_duration &&
> > > + cpu_data->sleep_length_ns - measured_ns < LATENCY_THRESHOLD_NS) {
> >
> > So it needs to be within 7.5us here.
> > Can we always expect that to be true?
>
> It's just a margin.
>
> > Especially since measured_ns does this "infer average from worst-case exit
> > latency" handling.
> > On deeper states this
> > measured_ns -= lat_ns / 2;
> > is an order of magnitude higher than our threshold.
>
> True.
>
> > So it should probably be something like
> > exit_latency / 2 + LATENCY_THRESHOLD_NS?
> > Or just exit_latency and allow the error to both sides?
>
> Well, the exit latency is already there in this inequality because
> measured_ns == raw_measured_ns - exit_latency / 2 and I didn't want to
> take it into account twice.
>
> And in fact I want sleep_length_ns and measured_ns (already adjusted
> for the entered state exit latency) to be equal up to a margin and I
> just think that the margin can be the same for all of the state bins
> because it's basically the granularity of the comparison.
Well, scratch the above paragraph.
The point is that cpu_data->sleep_length_ns should be less than
measured_ns (which means that the wakeup appears to have occurred
after the anticipated timer event) or at least not much greater than
it (the actual wakeup latency might be shorter than 1/2 of the
declared one due to a prewake or similar). How much sleep_length_ns
can be greater than measured_ns for the wakeup to still count as a
"hit" is, of course, a matter of choice and I thought that it would be
reasonable to use a constant limit.
However, the limit may as well be chosen to depend on the exit latency
of the entered state and it can be as large as 1/2 of that number (I
don't think that using a larger number would make a lot of sense).
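
One possible reading of the above, as a standalone userspace sketch
rather than the actual respin: the check becomes one-sided, so a wakeup
that appears to occur after the anticipated timer always counts as a
"hit", and the allowed margin in the other direction is 1/2 of the exit
latency of the entered state. The helper name sketch_is_hit_one_sided(),
the exit_latency_ns parameter, keeping the same-bin requirement, and the
use of "<=" rather than "<" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of teo.c): one-sided "hit" check with a
 * margin of 1/2 of the entered state's exit latency.
 */
static bool sketch_is_hit_one_sided(int idx_timer, int idx_duration,
				    int64_t sleep_length_ns,
				    int64_t measured_ns,
				    int64_t exit_latency_ns)
{
	/* Assumption: the same-bin requirement is kept. */
	if (idx_timer != idx_duration)
		return false;

	/*
	 * A negative difference means the wakeup appears to have occurred
	 * after the anticipated timer event, which always passes. A
	 * positive difference passes only within the margin.
	 */
	return sleep_length_ns - measured_ns <= exit_latency_ns / 2;
}
```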