* [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update
@ 2025-01-13 18:32 Rafael J. Wysocki
2025-01-13 18:34 ` [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code Rafael J. Wysocki
` (9 more replies)
0 siblings, 10 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:32 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
Hi Everyone,
This supersedes
https://lore.kernel.org/linux-pm/4953183.GXAFRqVoOG@rjwysocki.net/
but because the majority of patches in it are new, I've decided to count
version numbers back from 1.
This addresses a relatively recently added inconsistency in behavior of the teo
governor regarding the handling of very frequent wakeups handling (patch [7/9])
and makes some other changes that may be regarded as cleanups.
Please review.
Thanks!
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
@ 2025-01-13 18:34 ` Rafael J. Wysocki
2025-01-15 14:21 ` Christian Loehle
2025-01-13 18:36 ` [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks Rafael J. Wysocki
` (8 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:34 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rearrange code in the idle state lookup loop in teo_select() to make it
somewhat easier to follow and update comments around it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
This is the same patch as
https://lore.kernel.org/linux-pm/3332506.aeNJFYEL58@rjwysocki.net/
---
drivers/cpuidle/governors/teo.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -367,7 +367,7 @@
* If the sum of the intercepts metric for all of the idle states
* shallower than the current candidate one (idx) is greater than the
* sum of the intercepts and hits metrics for the candidate state and
- * all of the deeper states a shallower idle state is likely to be a
+ * all of the deeper states, a shallower idle state is likely to be a
* better choice.
*/
prev_intercept_idx = idx;
@@ -396,30 +396,36 @@
* first enabled state that is deep enough.
*/
if (teo_state_ok(i, drv) &&
- !dev->states_usage[i].disable)
+ !dev->states_usage[i].disable) {
idx = i;
- else
- idx = first_suitable_idx;
-
+ break;
+ }
+ idx = first_suitable_idx;
break;
}
if (dev->states_usage[i].disable)
continue;
- if (!teo_state_ok(i, drv)) {
+ if (teo_state_ok(i, drv)) {
/*
- * The current state is too shallow, but if an
- * alternative candidate state has been found,
- * it may still turn out to be a better choice.
+ * The current state is deep enough, but still
+ * there may be a better one.
*/
- if (first_suitable_idx != idx)
- continue;
-
- break;
+ first_suitable_idx = i;
+ continue;
}
- first_suitable_idx = i;
+ /*
+ * The current state is too shallow, so if no suitable
+ * states other than the initial candidate have been
+ * found, give up (the remaining states to check are
+ * shallower still), but otherwise the first suitable
+ * state other than the initial candidate may turn out
+ * to be preferable.
+ */
+ if (first_suitable_idx == idx)
+ break;
}
}
if (!idx && prev_intercept_idx) {
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
2025-01-13 18:34 ` [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code Rafael J. Wysocki
@ 2025-01-13 18:36 ` Rafael J. Wysocki
2025-01-15 14:46 ` Christian Loehle
2025-01-16 13:27 ` Christian Loehle
2025-01-13 18:39 ` [PATCH v1 3/9] cpuidle: teo: Combine candidate state index checks against 0 Rafael J. Wysocki
` (7 subsequent siblings)
9 siblings, 2 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:36 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since constraint_idx may be 0, the candidate state index may change to 0
after assigning constraint_idx to it, so first check if it is greater
than constraint_idx (and update it if so) and then check it against 0.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
This is a rebased variant of
https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
---
drivers/cpuidle/governors/teo.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -428,6 +428,14 @@
break;
}
}
+
+ /*
+ * If there is a latency constraint, it may be necessary to select an
+ * idle state shallower than the current candidate one.
+ */
+ if (idx > constraint_idx)
+ idx = constraint_idx;
+
if (!idx && prev_intercept_idx) {
/*
* We have to query the sleep length here otherwise we don't
@@ -439,13 +447,6 @@
}
/*
- * If there is a latency constraint, it may be necessary to select an
- * idle state shallower than the current candidate one.
- */
- if (idx > constraint_idx)
- idx = constraint_idx;
-
- /*
* Skip the timers check if state 0 is the current candidate one,
* because an immediate non-timer wakeup is expected in that case.
*/
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 3/9] cpuidle: teo: Combine candidate state index checks against 0
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
2025-01-13 18:34 ` [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code Rafael J. Wysocki
2025-01-13 18:36 ` [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks Rafael J. Wysocki
@ 2025-01-13 18:39 ` Rafael J. Wysocki
2025-01-15 19:44 ` Christian Loehle
2025-01-13 18:40 ` [PATCH v1 4/9] cpuidle: teo: Drop local variable prev_intercept_idx Rafael J. Wysocki
` (6 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:39 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are two candidate state index checks against 0 in teo_select()
that need not be separate, so combine them and update comments around
them.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
This is a rebased variant of
https://lore.kernel.org/linux-pm/2296767.iZASKD2KPV@rjwysocki.net/
---
drivers/cpuidle/governors/teo.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -436,24 +436,19 @@
if (idx > constraint_idx)
idx = constraint_idx;
- if (!idx && prev_intercept_idx) {
- /*
- * We have to query the sleep length here otherwise we don't
- * know after wakeup if our guess was correct.
- */
- duration_ns = tick_nohz_get_sleep_length(&delta_tick);
- cpu_data->sleep_length_ns = duration_ns;
+ if (!idx) {
+ if (prev_intercept_idx) {
+ /*
+ * Query the sleep length to be able to count the wakeup
+ * as a hit if it is caused by a timer.
+ */
+ duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+ cpu_data->sleep_length_ns = duration_ns;
+ }
goto out_tick;
}
/*
- * Skip the timers check if state 0 is the current candidate one,
- * because an immediate non-timer wakeup is expected in that case.
- */
- if (!idx)
- goto out_tick;
-
- /*
* If state 0 is a polling one, check if the target residency of
* the current candidate state is low enough and skip the timers
* check in that case too.
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 4/9] cpuidle: teo: Drop local variable prev_intercept_idx
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (2 preceding siblings ...)
2025-01-13 18:39 ` [PATCH v1 3/9] cpuidle: teo: Combine candidate state index checks against 0 Rafael J. Wysocki
@ 2025-01-13 18:40 ` Rafael J. Wysocki
2025-01-15 19:46 ` Christian Loehle
2025-01-13 18:41 ` [PATCH v1 5/9] cpuidle: teo: Clarify two code comments Rafael J. Wysocki
` (5 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:40 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Local variable prev_intercept_idx in teo_select() is redundant becase
it cannot be 0 when candidate state index is 0.
The prev_intercept_idx value is the index of the deepest enabled idle
state, so if it is 0, state 0 is the deepest enabled idle state, in
which case it must be the only enabled idle state, but then teo_select()
would have returned early before initializing prev_intercept_idx.
Thus prev_intercept_idx must be nonzero and the check of it against 0
always passes, so it can be dropped altogether.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -292,7 +292,6 @@
unsigned int hit_sum = 0;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
- int prev_intercept_idx;
s64 duration_ns;
int i;
@@ -370,7 +369,6 @@
* all of the deeper states, a shallower idle state is likely to be a
* better choice.
*/
- prev_intercept_idx = idx;
if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
int first_suitable_idx = idx;
@@ -437,14 +435,11 @@
idx = constraint_idx;
if (!idx) {
- if (prev_intercept_idx) {
- /*
- * Query the sleep length to be able to count the wakeup
- * as a hit if it is caused by a timer.
- */
- duration_ns = tick_nohz_get_sleep_length(&delta_tick);
- cpu_data->sleep_length_ns = duration_ns;
- }
+ /*
+ * Query the sleep length to be able to count the wakeup as a
+ * hit if it is caused by a timer.
+ */
+ cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
goto out_tick;
}
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 5/9] cpuidle: teo: Clarify two code comments
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (3 preceding siblings ...)
2025-01-13 18:40 ` [PATCH v1 4/9] cpuidle: teo: Drop local variable prev_intercept_idx Rafael J. Wysocki
@ 2025-01-13 18:41 ` Rafael J. Wysocki
2025-01-15 19:43 ` Christian Loehle
2025-01-13 18:45 ` [PATCH v1 6/9] cpuidle: teo: Simplify counting events used for tick management Rafael J. Wysocki
` (4 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:41 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rewrite two code comments suposed to explain its behavior that are too
concise or not sufficiently clear.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -154,9 +154,10 @@
if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
/*
- * One of the safety nets has triggered or the wakeup was close
- * enough to the closest timer event expected at the idle state
- * selection time to be discarded.
+ * This causes the wakeup to be counted as a hit regardless of
+ * regardless of the real idle duration which doesn't need to be
+ * computed because the wakeup has been close enough to an
+ * anticipated timer.
*/
measured_ns = U64_MAX;
} else {
@@ -302,8 +303,13 @@
cpu_data->time_span_ns = local_clock();
/*
- * Set the expected sleep length to infinity in case of an early
- * return.
+ * Set the sleep length to infitity in case the invocation of
+ * tick_nohz_get_sleep_length() below is skipped, in which case it won't
+ * be known whether or not the subsequent wakeup is caused by a timer.
+ * It is generally fine to count the wakeup as an intercept then, except
+ * for the cases when the CPU is mostly woken up by timers and there may
+ * be opportunities to ask for a deeper idle state when no imminent
+ * timers are scheduled which may be missed.
*/
cpu_data->sleep_length_ns = KTIME_MAX;
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 6/9] cpuidle: teo: Simplify counting events used for tick management
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (4 preceding siblings ...)
2025-01-13 18:41 ` [PATCH v1 5/9] cpuidle: teo: Clarify two code comments Rafael J. Wysocki
@ 2025-01-13 18:45 ` Rafael J. Wysocki
2025-01-20 11:27 ` Christian Loehle
2025-01-13 18:48 ` [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent Rafael J. Wysocki
` (3 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:45 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace the tick_hits metric with a new tick_intercepts one that can be
used directly when deciding whether or not to stop the scheduler tick
and update the governor functional description accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 49 +++++++++++-----------------------------
1 file changed, 14 insertions(+), 35 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -41,11 +41,7 @@
* idle state 2, the third bin spans from the target residency of idle state 2
* up to, but not including, the target residency of idle state 3 and so on.
* The last bin spans from the target residency of the deepest idle state
- * supplied by the driver to the scheduler tick period length or to infinity if
- * the tick period length is less than the targer residency of that state. In
- * the latter case, the governor also counts events with the measured idle
- * duration between the tick period length and the target residency of the
- * deepest idle state.
+ * supplied by the driver to infinity.
*
* Two metrics called "hits" and "intercepts" are associated with each bin.
* They are updated every time before selecting an idle state for the given CPU
@@ -60,6 +56,10 @@
* into by the sleep length (these events are also referred to as "intercepts"
* below).
*
+ * The governor also counts "intercepts" with the measured idle duration below
+ * the tick period length and uses this information when deciding whether or not
+ * to stop the scheduler tick.
+ *
* In order to select an idle state for a CPU, the governor takes the following
* steps (modulo the possible latency constraint that must be taken into account
* too):
@@ -128,14 +128,14 @@
* @sleep_length_ns: Time till the closest timer event (at the selection time).
* @state_bins: Idle state data bins for this CPU.
* @total: Grand total of the "intercepts" and "hits" metrics for all bins.
- * @tick_hits: Number of "hits" after TICK_NSEC.
+ * @tick_intercepts: "Intercepts" before TICK_NSEC.
*/
struct teo_cpu {
s64 time_span_ns;
s64 sleep_length_ns;
struct teo_bin state_bins[CPUIDLE_STATE_MAX];
unsigned int total;
- unsigned int tick_hits;
+ unsigned int tick_intercepts;
};
static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -207,38 +207,21 @@
}
}
- /*
- * If the deepest state's target residency is below the tick length,
- * make a record of it to help teo_select() decide whether or not
- * to stop the tick. This effectively adds an extra hits-only bin
- * beyond the last state-related one.
- */
- if (target_residency_ns < TICK_NSEC) {
- cpu_data->tick_hits -= cpu_data->tick_hits >> DECAY_SHIFT;
-
- cpu_data->total += cpu_data->tick_hits;
-
- if (TICK_NSEC <= cpu_data->sleep_length_ns) {
- idx_timer = drv->state_count;
- if (TICK_NSEC <= measured_ns) {
- cpu_data->tick_hits += PULSE;
- goto end;
- }
- }
- }
-
+ cpu_data->tick_intercepts -= cpu_data->tick_intercepts >> DECAY_SHIFT;
/*
* If the measured idle duration falls into the same bin as the sleep
* length, this is a "hit", so update the "hits" metric for that bin.
* Otherwise, update the "intercepts" metric for the bin fallen into by
* the measured idle duration.
*/
- if (idx_timer == idx_duration)
+ if (idx_timer == idx_duration) {
cpu_data->state_bins[idx_timer].hits += PULSE;
- else
+ } else {
cpu_data->state_bins[idx_duration].intercepts += PULSE;
+ if (TICK_NSEC <= measured_ns)
+ cpu_data->tick_intercepts += PULSE;
+ }
-end:
cpu_data->total += PULSE;
}
@@ -286,7 +269,6 @@
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
ktime_t delta_tick = TICK_NSEC / 2;
- unsigned int tick_intercept_sum = 0;
unsigned int idx_intercept_sum = 0;
unsigned int intercept_sum = 0;
unsigned int idx_hit_sum = 0;
@@ -365,9 +347,6 @@
goto end;
}
- tick_intercept_sum = intercept_sum +
- cpu_data->state_bins[drv->state_count-1].intercepts;
-
/*
* If the sum of the intercepts metric for all of the idle states
* shallower than the current candidate one (idx) is greater than the
@@ -477,7 +456,7 @@
* total wakeup events, do not stop the tick.
*/
if (drv->states[idx].target_residency_ns < TICK_NSEC &&
- tick_intercept_sum > cpu_data->total / 2 + cpu_data->total / 8)
+ cpu_data->tick_intercepts > cpu_data->total / 2 + cpu_data->total / 8)
duration_ns = TICK_NSEC / 2;
end:
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (5 preceding siblings ...)
2025-01-13 18:45 ` [PATCH v1 6/9] cpuidle: teo: Simplify counting events used for tick management Rafael J. Wysocki
@ 2025-01-13 18:48 ` Rafael J. Wysocki
2025-01-20 12:08 ` Christian Loehle
2025-01-13 18:50 ` [PATCH v1 8/9] cpuidle: teo: Simplify handling of total events count Rafael J. Wysocki
` (2 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:48 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length()
call in some cases") attempted to reduce the governor overhead in some
cases by making it avoid obtaining the sleep length (the time till the
next timer event) which may be costly.
Among other things, after the above commit, tick_nohz_get_sleep_length()
was not called any more when idle state 0 was to be returned, which
turned out to be problematic and the previous behavior in that respect
was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-
existent intercepts").
However, commit 6da8f9ba5a87 also caused the governor to avoid calling
tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling"
one (that is, it is not really an idle state, but a loop continuously
executed by the CPU) when the target residency of the idle state to be
returned was low enough, so there was no practical need to refine the
idle state selection in any way. This change was not removed by the
other commit, so now on systems where idle state 0 is a "polling" one,
tick_nohz_get_sleep_length() is called when idle state 0 is to be
returned, but it is not called when a deeper idle state with
sufficiently low target residency is to be returned. That is arguably
confusing and inconsistent.
Moreover, there is no specific reason why the behavior in question
should depend on whether or not idle state 0 is a "polling" one.
One way to address this would be to make the governor always call
tick_nohz_get_sleep_length() to obtain the sleep length, but that would
effectively mean reverting commit 6da8f9ba5a87 and restoring the latency
issue that was the reason for doing it. This approach is thus not
particularly attractive.
To address it differently, notice that if a CPU is woken up very often,
this is not likely to be caused by timers in the first place (user space
has a default timer slack of 50 us and there are relatively few timers
with a deadline shorter than several microseconds in the kernel) and
even if it were the case, the potential benefit from using a deep idle
state would then be questionable for latency reasons. Therefore, if the
majority of CPU wakeups occur within several microseconds, it can be
assumed that all wakeups in that range are non-timer and the sleep
length need not be determined.
Accordingly, introduce a new metric for counting wakeups with the
measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle
state selection to skip the tick_nohz_get_sleep_length() invocation if
idle state 0 has been selected or the target residency of the candidate
idle state is below RESIDENCY_THRESHOLD_NS and the value of the new
metric is at least 1/2 of the total event count.
Since the above requires the measured idle duration to be determined
every time, except for the cases when one of the safety nets has
triggered in which the wakeup is counted as a hit in the deepest
idle state idle residency range, update the handling of those cases
to avoid skipping the idle duration computation when the CPU wakeup
is "genuine".
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 58 ++++++++++++++++++++++++----------------
1 file changed, 36 insertions(+), 22 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -129,6 +129,7 @@
* @state_bins: Idle state data bins for this CPU.
* @total: Grand total of the "intercepts" and "hits" metrics for all bins.
* @tick_intercepts: "Intercepts" before TICK_NSEC.
+ * @short_idle: Wakeups after short idle periods.
*/
struct teo_cpu {
s64 time_span_ns;
@@ -136,6 +137,7 @@
struct teo_bin state_bins[CPUIDLE_STATE_MAX];
unsigned int total;
unsigned int tick_intercepts;
+ unsigned int short_idle;
};
static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -152,12 +154,12 @@
s64 target_residency_ns;
u64 measured_ns;
- if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
+ cpu_data->short_idle -= cpu_data->short_idle >> DECAY_SHIFT;
+
+ if (cpu_data->time_span_ns < 0) {
/*
- * This causes the wakeup to be counted as a hit regardless of
- * regardless of the real idle duration which doesn't need to be
- * computed because the wakeup has been close enough to an
- * anticipated timer.
+ * If one of the safety nets has triggered, assume that this
+ * might have been a long sleep.
*/
measured_ns = U64_MAX;
} else {
@@ -177,10 +179,14 @@
* time, so take 1/2 of the exit latency as a very rough
* approximation of the average of it.
*/
- if (measured_ns >= lat_ns)
+ if (measured_ns >= lat_ns) {
measured_ns -= lat_ns / 2;
- else
+ if (measured_ns < RESIDENCY_THRESHOLD_NS)
+ cpu_data->short_idle += PULSE;
+ } else {
measured_ns /= 2;
+ cpu_data->short_idle += PULSE;
+ }
}
cpu_data->total = 0;
@@ -419,27 +425,35 @@
if (idx > constraint_idx)
idx = constraint_idx;
- if (!idx) {
- /*
- * Query the sleep length to be able to count the wakeup as a
- * hit if it is caused by a timer.
- */
- cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
- goto out_tick;
- }
-
/*
- * If state 0 is a polling one, check if the target residency of
- * the current candidate state is low enough and skip the timers
- * check in that case too.
+ * If either the candidate state is state 0 or its target residency is
+ * low enough, there is basically nothing more to do, but if the sleep
+ * length is not updated, the subsequent wakeup will be counted as an
+ * "intercept" which may be problematic in the cases when timer wakeups
+ * are dominant. Namely, it may effectively prevent deeper idle states
+ * from being selected at one point even if no imminent timers are
+ * scheduled.
+ *
+ * However, frequent timers in the RESIDENCY_THRESHOLD_NS range on one
+ * CPU are unlikely (user space has a default 50 us slack value for
+ * hrtimers and there are relatively few timers with a lower deadline
+ * value in the kernel), and even if they did happene, the potential
+ * benefit from using a deep idle state in that case would be
+ * questionable anyway for latency reasons. Thus if the measured idle
+ * duration falls into that range in the majority of cases, assume
+ * non-timer wakeups to be dominant and skip updating the sleep length
+ * to reduce latency.
*/
- if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
- drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
+ if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
+ 2 * cpu_data->short_idle >= cpu_data->total)
goto out_tick;
duration_ns = tick_nohz_get_sleep_length(&delta_tick);
cpu_data->sleep_length_ns = duration_ns;
+ if (!idx)
+ goto out_tick;
+
/*
* If the closest expected timer is before the target residency of the
* candidate state, a shallower one needs to be found.
@@ -501,7 +515,7 @@
if (dev->poll_time_limit ||
(tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
dev->poll_time_limit = false;
- cpu_data->time_span_ns = cpu_data->sleep_length_ns;
+ cpu_data->time_span_ns = KTIME_MIN;
} else {
cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
}
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 8/9] cpuidle: teo: Simplify handling of total events count
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (6 preceding siblings ...)
2025-01-13 18:48 ` [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent Rafael J. Wysocki
@ 2025-01-13 18:50 ` Rafael J. Wysocki
2025-01-20 12:10 ` Christian Loehle
2025-01-13 18:51 ` [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag Rafael J. Wysocki
2025-01-15 14:52 ` [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Christian Loehle
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:50 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Instead of computing the total events count from scratch every time,
decay it and add a PULSE value to it in teo_update().
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -189,8 +189,6 @@
}
}
- cpu_data->total = 0;
-
/*
* Decay the "hits" and "intercepts" metrics for all of the bins and
* find the bins that the sleep length and the measured idle duration
@@ -202,8 +200,6 @@
bin->hits -= bin->hits >> DECAY_SHIFT;
bin->intercepts -= bin->intercepts >> DECAY_SHIFT;
- cpu_data->total += bin->hits + bin->intercepts;
-
target_residency_ns = drv->states[i].target_residency_ns;
if (target_residency_ns <= cpu_data->sleep_length_ns) {
@@ -228,6 +224,7 @@
cpu_data->tick_intercepts += PULSE;
}
+ cpu_data->total -= cpu_data->total >> DECAY_SHIFT;
cpu_data->total += PULSE;
}
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (7 preceding siblings ...)
2025-01-13 18:50 ` [PATCH v1 8/9] cpuidle: teo: Simplify handling of total events count Rafael J. Wysocki
@ 2025-01-13 18:51 ` Rafael J. Wysocki
2025-01-20 12:16 ` Christian Loehle
2025-01-15 14:52 ` [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Christian Loehle
9 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-13 18:51 UTC (permalink / raw)
To: Linux PM; +Cc: LKML, Daniel Lezcano, Christian Loehle, Artem Bityutskiy
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After recent updates, the time_span_ns field in struct teo_cpu has
become an indicator on whether or not the most recent wakeup has been
"genuine" which may as well be indicated by a bool field without
calling local_clock(), so update the code accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 27 +++++++++------------------
1 file changed, 9 insertions(+), 18 deletions(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -124,20 +124,20 @@
/**
* struct teo_cpu - CPU data used by the TEO cpuidle governor.
- * @time_span_ns: Time between idle state selection and post-wakeup update.
* @sleep_length_ns: Time till the closest timer event (at the selection time).
* @state_bins: Idle state data bins for this CPU.
* @total: Grand total of the "intercepts" and "hits" metrics for all bins.
* @tick_intercepts: "Intercepts" before TICK_NSEC.
* @short_idle: Wakeups after short idle periods.
+ * @artificial_wakeup: Set if the wakeup has been triggered by a safety net.
*/
struct teo_cpu {
- s64 time_span_ns;
s64 sleep_length_ns;
struct teo_bin state_bins[CPUIDLE_STATE_MAX];
unsigned int total;
unsigned int tick_intercepts;
unsigned int short_idle;
+ bool artificial_wakeup;
};
static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -156,7 +156,7 @@
cpu_data->short_idle -= cpu_data->short_idle >> DECAY_SHIFT;
- if (cpu_data->time_span_ns < 0) {
+ if (cpu_data->artificial_wakeup) {
/*
* If one of the safety nets has triggered, assume that this
* might have been a long sleep.
@@ -165,13 +165,6 @@
} else {
u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
- /*
- * The computations below are to determine whether or not the
- * (saved) time till the next timer event and the measured idle
- * duration fall into the same "bin", so use last_residency_ns
- * for that instead of time_span_ns which includes the cpuidle
- * overhead.
- */
measured_ns = dev->last_residency_ns;
/*
* The delay between the wakeup and the first instruction
@@ -286,7 +279,6 @@
dev->last_state_idx = -1;
}
- cpu_data->time_span_ns = local_clock();
/*
* Set the sleep length to infitity in case the invocation of
* tick_nohz_get_sleep_length() below is skipped, in which case it won't
@@ -504,17 +496,16 @@
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
dev->last_state_idx = state;
- /*
- * If the wakeup was not "natural", but triggered by one of the safety
- * nets, assume that the CPU might have been idle for the entire sleep
- * length time.
- */
if (dev->poll_time_limit ||
(tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
+ /*
+ * The wakeup was not "geniune", but triggered by one of the
+ * safety nets.
+ */
dev->poll_time_limit = false;
- cpu_data->time_span_ns = KTIME_MIN;
+ cpu_data->artificial_wakeup = true;
} else {
- cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
+ cpu_data->artificial_wakeup = false;
}
}
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code
2025-01-13 18:34 ` [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code Rafael J. Wysocki
@ 2025-01-15 14:21 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 14:21 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:34, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Rearrange code in the idle state lookup loop in teo_select() to make it
> somewhat easier to follow and update comments around it.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
> ---
>
> This is the same patch as
>
> https://lore.kernel.org/linux-pm/3332506.aeNJFYEL58@rjwysocki.net/
>
> ---
> drivers/cpuidle/governors/teo.c | 34 ++++++++++++++++++++--------------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -367,7 +367,7 @@
> * If the sum of the intercepts metric for all of the idle states
> * shallower than the current candidate one (idx) is greater than the
> * sum of the intercepts and hits metrics for the candidate state and
> - * all of the deeper states a shallower idle state is likely to be a
> + * all of the deeper states, a shallower idle state is likely to be a
> * better choice.
> */
> prev_intercept_idx = idx;
> @@ -396,30 +396,36 @@
> * first enabled state that is deep enough.
> */
> if (teo_state_ok(i, drv) &&
> - !dev->states_usage[i].disable)
> + !dev->states_usage[i].disable) {
> idx = i;
> - else
> - idx = first_suitable_idx;
> -
> + break;
> + }
> + idx = first_suitable_idx;
> break;
> }
>
> if (dev->states_usage[i].disable)
> continue;
>
> - if (!teo_state_ok(i, drv)) {
> + if (teo_state_ok(i, drv)) {
> /*
> - * The current state is too shallow, but if an
> - * alternative candidate state has been found,
> - * it may still turn out to be a better choice.
> + * The current state is deep enough, but still
> + * there may be a better one.
> */
> - if (first_suitable_idx != idx)
> - continue;
> -
> - break;
> + first_suitable_idx = i;
> + continue;
> }
>
> - first_suitable_idx = i;
> + /*
> + * The current state is too shallow, so if no suitable
> + * states other than the initial candidate have been
> + * found, give up (the remaining states to check are
> + * shallower still), but otherwise the first suitable
> + * state other than the initial candidate may turn out
> + * to be preferable.
> + */
> + if (first_suitable_idx == idx)
> + break;
> }
> }
> if (!idx && prev_intercept_idx) {
>
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-13 18:36 ` [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks Rafael J. Wysocki
@ 2025-01-15 14:46 ` Christian Loehle
2025-01-15 15:54 ` Rafael J. Wysocki
2025-01-16 13:27 ` Christian Loehle
1 sibling, 1 reply; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 14:46 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:36, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Since constraint_idx may be 0, the candidate state index may change to 0
> after assigning constraint_idx to it, so first check if it is greater
> than constraint_idx (and update it if so) and then check it against 0.
So the reason I've left this where it was is because the prev_intercept_idx
was supposed to query the sleep length if we're in an majority-intercept
period and then it makes sense to query the sleep length (to detect such
a period being over).
A constraint_idx == 0 scenario doesn't need the intercept-machinery to
work at all, why are we querying the sleep length then?
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a rebased variant of
>
> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
>
> ---
> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -428,6 +428,14 @@
> break;
> }
> }
> +
> + /*
> + * If there is a latency constraint, it may be necessary to select an
> + * idle state shallower than the current candidate one.
> + */
> + if (idx > constraint_idx)
> + idx = constraint_idx;
> +
> if (!idx && prev_intercept_idx) {
> /*
> * We have to query the sleep length here otherwise we don't
> @@ -439,13 +447,6 @@
> }
>
> /*
> - * If there is a latency constraint, it may be necessary to select an
> - * idle state shallower than the current candidate one.
> - */
> - if (idx > constraint_idx)
> - idx = constraint_idx;
> -
> - /*
We could leave this here and just do goto end;?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
` (8 preceding siblings ...)
2025-01-13 18:51 ` [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag Rafael J. Wysocki
@ 2025-01-15 14:52 ` Christian Loehle
2025-01-20 8:17 ` Aboorva Devarajan
2025-01-20 16:24 ` Rafael J. Wysocki
9 siblings, 2 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 14:52 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM, Aboorva Devarajan
Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:32, Rafael J. Wysocki wrote:
> Hi Everyone,
>
> This supersedes
>
> https://lore.kernel.org/linux-pm/4953183.GXAFRqVoOG@rjwysocki.net/
>
> but because the majority of patches in it are new, I've decided to count
> version numbers back from 1.
>
> This addresses a relatively recently added inconsistency in behavior of the teo
> governor regarding the handling of very frequent wakeups handling (patch [7/9])
> and makes some other changes that may be regarded as cleanups.
>
> Please review.
Hi Rafael,
that looks promising. I'll review the individual patches in detail now, but
I let a few tests run overnight and can report that there's no significant
behaviour change with the series on an arm64 (no polling state, rk3399), which
is my expected result.
I'll get something running on a system with a polling state as well.
(I don't have a POWER system, so that will just be x86, adding Aboorva.)
Tested-by: Christian Loehle <christian.loehle@arm.com>
for the entire series
I'll just dump some of the raw results here for reference.
teo-X is teo with everything up to X/9 applied. teo-m is mainline, teo is
the entire series (equivalent to teo-9).
Everything is done with fio direct random4k read. Range of real devices
with varying latency, mapper/slow is 51ms timer delay, nullb0 is
essentially CPU-bound, few intercepts expected.
--
device gov iter iops idles idle_misses idle_miss_ratio belows aboves
mmcblk1 menu 0 1284 344126 106309 0.309 89264 17045
mmcblk1 menu 1 1340 358646 109469 0.305 91726 17743
mmcblk1 menu 2 1316 348812 106773 0.306 89845 16928
mmcblk1 teo 0 2023 559551 30053 0.054 2962 27091
mmcblk1 teo 1 2007 548422 25816 0.047 2113 23703
mmcblk1 teo 2 2010 557120 29434 0.053 2782 26652
mmcblk1 teo-m 0 2010 542790 23336 0.043 1720 21616
mmcblk1 teo-m 1 2009 554912 29239 0.053 2750 26489
mmcblk1 teo-m 2 2013 548384 26444 0.048 2180 24264
mmcblk1 teo-1 0 2030 524056 3571 0.007 610 2961
mmcblk1 teo-1 1 2022 559304 27074 0.048 3373 23701
mmcblk1 teo-1 2 2009 554744 27406 0.049 2964 24442
mmcblk1 teo-2 0 2012 542886 18233 0.034 2140 16093
mmcblk1 teo-2 1 2010 554094 27680 0.050 2957 24723
mmcblk1 teo-2 2 2010 555222 28549 0.051 3085 25464
mmcblk1 teo-3 0 2011 556816 28708 0.052 3076 25632
mmcblk1 teo-3 1 2015 546998 20280 0.037 2414 17866
mmcblk1 teo-3 2 2020 549758 20802 0.038 2662 18140
mmcblk1 teo-4 0 2026 536464 11692 0.022 1457 10235
mmcblk1 teo-4 1 2031 557978 25464 0.046 2928 22536
mmcblk1 teo-4 2 2012 553812 26323 0.048 2943 23380
mmcblk1 teo-5 0 2021 557966 3454 0.006 3003 451
mmcblk1 teo-5 1 2027 546630 18400 0.034 2303 16097
mmcblk1 teo-5 2 2012 554390 28013 0.051 2990 25023
mmcblk1 teo-6 0 2037 539956 12764 0.024 1701 11063
mmcblk1 teo-6 1 2019 556202 29074 0.052 3071 26003
mmcblk1 teo-6 2 2017 555004 26814 0.048 2869 23945
mmcblk1 teo-7 0 2028 545802 23806 0.044 1625 22181
mmcblk1 teo-7 1 2019 557030 29913 0.054 2990 26923
mmcblk1 teo-7 2 2018 553236 28534 0.052 2657 25877
mmcblk1 teo-8 0 2018 554662 28595 0.052 2660 25935
mmcblk1 teo-8 1 2014 554646 28947 0.052 2721 26226
mmcblk1 teo-8 2 2014 543828 29379 0.054 2915 26464
mmcblk1 teo-9 0 2013 555356 28867 0.052 2719 26148
mmcblk1 teo-9 1 2015 557254 29709 0.053 2836 26873
mmcblk1 teo-9 2 2030 559272 29321 0.052 3052 26269
mmcblk2 menu 0 2885 434990 140864 0.324 123507 17357
mmcblk2 menu 1 2766 450918 152931 0.339 135384 17547
mmcblk2 menu 2 2960 407712 125573 0.308 109224 16349
mmcblk2 teo 0 5677 741006 24131 0.033 2233 21898
mmcblk2 teo 1 5652 714494 24412 0.034 2680 21732
mmcblk2 teo 2 5660 752410 20497 0.027 1809 18688
mmcblk2 teo-m 0 5683 855232 23454 0.027 1934 21520
mmcblk2 teo-m 1 5588 517234 32016 0.062 5443 26573
mmcblk2 teo-m 2 5654 668648 29108 0.044 3171 25937
mmcblk2 teo-1 0 5666 773452 25740 0.033 3430 22310
mmcblk2 teo-1 1 5638 668190 27619 0.041 4150 23469
mmcblk2 teo-1 2 5689 866340 28191 0.033 3122 25069
mmcblk2 teo-2 0 5769 859462 2116 0.002 1835 281
mmcblk2 teo-2 1 5577 611956 25490 0.042 8647 16843
mmcblk2 teo-2 2 5665 721308 25826 0.036 2876 22950
mmcblk2 teo-3 0 5677 805946 30345 0.038 3877 26468
mmcblk2 teo-3 1 5752 855840 14392 0.017 2588 11804
mmcblk2 teo-3 2 5740 825580 27403 0.033 3750 23653
mmcblk2 teo-4 0 5664 777848 3435 0.004 3229 206
mmcblk2 teo-4 1 5660 728926 25796 0.035 2871 22925
mmcblk2 teo-4 2 5689 774342 7844 0.010 2838 5006
mmcblk2 teo-5 0 5672 781618 2733 0.003 2551 182
mmcblk2 teo-5 1 5693 866530 28658 0.033 3332 25326
mmcblk2 teo-5 2 5687 837294 12172 0.015 2338 9834
mmcblk2 teo-6 0 5859 892798 5226 0.006 3566 1660
mmcblk2 teo-6 1 5696 864197 24962 0.029 2796 22166
mmcblk2 teo-6 2 5671 788200 26612 0.034 3157 23455
mmcblk2 teo-7 0 5617 525864 30143 0.057 3739 26404
mmcblk2 teo-7 1 5732 802794 25553 0.032 3240 22313
mmcblk2 teo-7 2 5838 858620 12661 0.015 188 12473
mmcblk2 teo-8 0 5696 780894 24440 0.031 2928 21512
mmcblk2 teo-8 1 5868 862794 12978 0.015 187 12791
mmcblk2 teo-8 2 5607 660124 31632 0.048 6728 24904
mmcblk2 teo-9 0 5652 669318 28330 0.042 2692 25638
mmcblk2 teo-9 1 5657 710892 29291 0.041 3233 26058
mmcblk2 teo-9 2 5594 740988 37086 0.050 10275 26811
nvme0n1 menu 0 4775 384186 70675 0.184 54346 16329
nvme0n1 menu 1 5211 409374 69545 0.170 53484 16061
nvme0n1 menu 2 7213 535088 81312 0.152 64253 17059
nvme0n1 teo 0 10655 754882 29476 0.039 2960 26516
nvme0n1 teo 1 10627 755322 29324 0.039 3011 26313
nvme0n1 teo 2 10559 749498 29850 0.040 3265 26585
nvme0n1 teo-m 0 10353 738802 29937 0.041 2996 26941
nvme0n1 teo-m 1 10629 748148 27076 0.036 2349 24727
nvme0n1 teo-m 2 10477 743256 28193 0.038 2645 25548
nvme0n1 teo-1 0 10947 772905 24896 0.032 3150 21746
nvme0n1 teo-1 1 10412 732894 21868 0.030 2423 19445
nvme0n1 teo-1 2 10530 748377 26338 0.035 3479 22859
nvme0n1 teo-2 0 10570 751961 5195 0.007 2853 2342
nvme0n1 teo-2 1 10482 732428 18667 0.025 2102 16565
nvme0n1 teo-2 2 10829 768292 26400 0.034 3189 23211
nvme0n1 teo-3 0 10493 746638 26371 0.035 3320 23051
nvme0n1 teo-3 1 10445 742924 27871 0.038 3027 24844
nvme0n1 teo-3 2 10920 775112 25646 0.033 3520 22126
nvme0n1 teo-4 0 10500 734792 18125 0.025 2283 15842
nvme0n1 teo-4 1 11091 783828 25904 0.033 3031 22873
nvme0n1 teo-4 2 10413 736248 23786 0.032 2660 21126
nvme0n1 teo-5 0 10479 744340 26933 0.036 3096 23837
nvme0n1 teo-5 1 10740 764188 27065 0.035 3062 24003
nvme0n1 teo-5 2 10500 747060 25962 0.035 3233 22729
nvme0n1 teo-6 0 10715 757772 2186 0.003 1872 314
nvme0n1 teo-6 1 10485 734878 17620 0.024 2174 15446
nvme0n1 teo-6 2 10478 744106 26909 0.036 3007 23902
nvme0n1 teo-7 0 10549 750022 30416 0.041 3320 27096
nvme0n1 teo-7 1 10611 752182 29877 0.040 3332 26545
nvme0n1 teo-7 2 11170 791572 30790 0.039 3453 27337
nvme0n1 teo-8 0 10622 752523 29002 0.039 2745 26257
nvme0n1 teo-8 1 10424 739904 28641 0.039 2756 25885
nvme0n1 teo-8 2 10533 731440 22101 0.030 1280 20821
nvme0n1 teo-9 0 10367 727768 24895 0.034 1820 23075
nvme0n1 teo-9 1 10815 766230 29378 0.038 3136 26242
nvme0n1 teo-9 2 10426 739224 28173 0.038 2563 25610
sda menu 0 872 526240 175645 0.334 159373 16272
sda menu 1 900 536434 188122 0.351 170749 17373
sda menu 2 901 534826 189778 0.355 171745 18033
sda teo 0 1687 999108 16015 0.016 1249 14766
sda teo 1 1668 1007551 31316 0.031 5809 25507
sda teo 2 1682 985830 18784 0.019 1003 17781
sda teo-m 0 1676 969984 25825 0.027 3191 22634
sda teo-m 1 1682 995266 30340 0.030 4040 26300
sda teo-m 2 1681 940562 20830 0.022 1508 19322
sda teo-1 0 1680 996214 15553 0.016 2192 13361
sda teo-1 1 1674 1050168 28033 0.027 4011 24022
sda teo-1 2 1665 808202 35292 0.044 8873 26419
sda teo-2 0 1689 1013360 546 0.001 243 303
sda teo-2 1 1672 1033806 29432 0.028 4363 25069
sda teo-2 2 1667 1046100 31110 0.030 6395 24715
sda teo-3 0 1625 922694 40891 0.044 22925 17966
sda teo-3 1 1670 894480 27946 0.031 3045 24901
sda teo-3 2 1658 1009366 28514 0.028 8887 19627
sda teo-4 0 1674 977280 3605 0.004 3279 326
sda teo-4 1 1677 960990 2861 0.003 1058 1803
sda teo-4 2 1660 1016592 33894 0.033 7687 26207
sda teo-5 0 1678 1033470 16996 0.016 2163 14833
sda teo-5 1 1667 873077 25011 0.029 4393 20618
sda teo-5 2 1672 1042020 24589 0.024 3858 20731
sda teo-6 0 1680 962686 7255 0.008 1797 5458
sda teo-6 1 1682 1055472 28602 0.027 5081 23521
sda teo-6 2 1675 989244 19110 0.019 5620 13490
sda teo-7 0 1618 1001000 51178 0.051 26011 25167
sda teo-7 1 1666 1047310 34032 0.032 6520 27512
sda teo-7 2 1676 982046 28788 0.029 2836 25952
sda teo-8 0 1618 975930 52929 0.054 26386 26543
sda teo-8 1 1678 1023796 16182 0.016 2191 13991
sda teo-8 2 1633 981693 43520 0.044 18782 24738
sda teo-9 0 1661 1029494 32423 0.031 5855 26568
sda teo-9 1 1678 969400 17843 0.018 2763 15080
sda teo-9 2 1679 1055060 32288 0.031 5607 26681
mtdblock3 menu 0 180 278442 80547 0.289 65714 14833
mtdblock3 menu 1 167 379840 109168 0.287 93460 15708
mtdblock3 menu 2 155 373948 126262 0.338 107799 18463
mtdblock3 teo 0 256 663190 29333 0.044 4836 24497
mtdblock3 teo 1 257 808114 29394 0.036 2742 26652
mtdblock3 teo 2 257 456150 25812 0.057 2468 23344
mtdblock3 teo-m 0 257 624492 24678 0.040 2326 22352
mtdblock3 teo-m 1 256 734548 31165 0.042 4307 26858
mtdblock3 teo-m 2 257 812004 30479 0.038 3510 26969
mtdblock3 teo-1 0 253 759708 34146 0.045 10081 24065
mtdblock3 teo-1 1 254 730160 23105 0.032 7156 15949
mtdblock3 teo-1 2 253 721558 32848 0.046 10028 22820
mtdblock3 teo-2 0 257 666426 4948 0.007 782 4166
mtdblock3 teo-2 1 256 682046 25126 0.037 4886 20240
mtdblock3 teo-2 2 253 639950 36029 0.056 11074 24955
mtdblock3 teo-3 0 252 709122 37198 0.052 12603 24595
mtdblock3 teo-3 1 257 709680 30000 0.042 3670 26330
mtdblock3 teo-3 2 254 540408 32696 0.061 9035 23661
mtdblock3 teo-4 0 256 442356 28281 0.064 3785 24496
mtdblock3 teo-4 1 254 588362 16934 0.029 5663 11271
mtdblock3 teo-4 2 257 628776 28300 0.045 3667 24633
mtdblock3 teo-5 0 250 762594 43008 0.056 16752 26256
mtdblock3 teo-5 1 256 586098 29744 0.051 4035 25709
mtdblock3 teo-5 2 262 903736 375 0.000 139 236
mtdblock3 teo-6 0 250 795274 42742 0.054 17160 25582
mtdblock3 teo-6 1 256 800172 28195 0.035 4016 24179
mtdblock3 teo-6 2 257 523365 26260 0.050 2788 23472
mtdblock3 teo-7 0 261 944626 31616 0.033 5090 26526
mtdblock3 teo-7 1 258 647048 28997 0.045 2728 26269
mtdblock3 teo-7 2 260 901858 18521 0.021 3698 14823
mtdblock3 teo-8 0 256 914076 27272 0.030 6844 20428
mtdblock3 teo-8 1 256 809113 31696 0.039 4672 27024
mtdblock3 teo-8 2 252 798396 40223 0.050 13270 26953
mtdblock3 teo-9 0 259 726886 30515 0.042 4002 26513
mtdblock3 teo-9 1 262 892854 9411 0.011 1686 7725
mtdblock3 teo-9 2 259 909528 16583 0.018 3054 13529
nullb0 menu 0 107364 86086 20668 0.240 2604 18064
nullb0 menu 1 107701 85072 20251 0.238 2808 17443
nullb0 menu 2 107976 85178 20591 0.242 3063 17528
nullb0 teo 0 105758 84464 28872 0.342 3429 25443
nullb0 teo 1 106892 87122 30255 0.347 3722 26533
nullb0 teo 2 107080 83374 27911 0.335 2918 24993
nullb0 teo-m 0 107702 88337 29824 0.338 2956 26868
nullb0 teo-m 1 108218 88130 29710 0.337 3038 26672
nullb0 teo-m 2 106993 86866 30033 0.346 3664 26369
nullb0 teo-1 0 106416 82936 23635 0.285 3186 20449
nullb0 teo-1 1 107570 87425 27682 0.317 3220 24462
nullb0 teo-1 2 107832 75656 19434 0.257 2775 16659
nullb0 teo-2 0 106320 88182 28603 0.324 3213 25390
nullb0 teo-2 1 107316 79968 21643 0.271 3081 18562
nullb0 teo-2 2 108007 84018 25244 0.300 3034 22210
nullb0 teo-3 0 106989 85160 24558 0.288 3601 20957
nullb0 teo-3 1 107325 83582 24994 0.299 2963 22031
nullb0 teo-3 2 107063 77066 18978 0.246 2913 16065
nullb0 teo-4 0 107990 86156 25012 0.290 3404 21608
nullb0 teo-4 1 107352 75750 15096 0.199 2671 12425
nullb0 teo-4 2 107685 87644 26047 0.297 3825 22222
nullb0 teo-5 0 106687 87788 13428 0.153 2937 10491
nullb0 teo-5 1 107820 86442 25703 0.297 3686 22017
nullb0 teo-5 2 108319 81498 22895 0.281 2713 20182
nullb0 teo-6 0 107784 84364 24239 0.287 3459 20780
nullb0 teo-6 1 108762 83854 24017 0.286 3289 20728
nullb0 teo-6 2 106024 88238 27778 0.315 3253 24525
nullb0 teo-7 0 106041 86354 30076 0.348 3420 26656
nullb0 teo-7 1 108039 79564 26805 0.337 2924 23881
nullb0 teo-7 2 107389 88264 30713 0.348 3156 27557
nullb0 teo-8 0 106465 87924 29679 0.338 2958 26721
nullb0 teo-8 1 107215 81348 27038 0.332 2916 24122
nullb0 teo-8 2 108238 87070 29284 0.336 2971 26313
nullb0 teo-9 0 107220 84508 28894 0.342 3411 25483
nullb0 teo-9 1 107400 86920 29945 0.345 3579 26366
nullb0 teo-9 2 107845 87686 30439 0.347 3784 26655
mapper/dm-slow menu 0 19 85216 20060 0.235 2478 17582
mapper/dm-slow teo 0 19 86454 28959 0.335 2860 26099
mapper/dm-slow teo-m 0 19 84740 28056 0.331 2582 25474
mapper/dm-slow teo-1 0 19 88698 28945 0.326 3210 25735
mapper/dm-slow teo-2 0 19 83138 24326 0.293 2589 21737
mapper/dm-slow teo-3 0 19 86416 27274 0.316 2998 24276
mapper/dm-slow teo-4 0 19 89286 29369 0.329 3151 26218
mapper/dm-slow teo-5 0 19 88516 29213 0.330 3163 26050
mapper/dm-slow teo-6 0 19 87896 24675 0.281 3113 21562
mapper/dm-slow teo-7 0 19 83986 27976 0.333 2788 25188
mapper/dm-slow teo-8 0 19 88072 29667 0.337 2962 26705
mapper/dm-slow teo-9 0 19 87260 29428 0.337 2923 26505
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-15 14:46 ` Christian Loehle
@ 2025-01-15 15:54 ` Rafael J. Wysocki
2025-01-15 19:20 ` Christian Loehle
0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-15 15:54 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano,
Artem Bityutskiy
On Wed, Jan 15, 2025 at 3:46 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/13/25 18:36, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Since constraint_idx may be 0, the candidate state index may change to 0
> > after assigning constraint_idx to it, so first check if it is greater
> > than constraint_idx (and update it if so) and then check it against 0.
>
> So the reason I've left this where it was is because the prev_intercept_idx
> was supposed to query the sleep length if we're in an majority-intercept
> period and then it makes sense to query the sleep length (to detect such
> a period being over).
> A constraint_idx == 0 scenario doesn't need the intercept-machinery to
> work at all, why are we querying the sleep length then?
In case the constraint is different next time and it's better to know
the sleep length to properly classify the wakeup.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > This is a rebased variant of
> >
> > https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
> >
> > ---
> > drivers/cpuidle/governors/teo.c | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -428,6 +428,14 @@
> > break;
> > }
> > }
> > +
> > + /*
> > + * If there is a latency constraint, it may be necessary to select an
> > + * idle state shallower than the current candidate one.
> > + */
> > + if (idx > constraint_idx)
> > + idx = constraint_idx;
> > +
> > if (!idx && prev_intercept_idx) {
> > /*
> > * We have to query the sleep length here otherwise we don't
> > @@ -439,13 +447,6 @@
> > }
> >
> > /*
> > - * If there is a latency constraint, it may be necessary to select an
> > - * idle state shallower than the current candidate one.
> > - */
> > - if (idx > constraint_idx)
> > - idx = constraint_idx;
> > -
> > - /*
>
> We could leave this here and just do goto end;?
Why would this be better?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-15 15:54 ` Rafael J. Wysocki
@ 2025-01-15 19:20 ` Christian Loehle
2025-01-15 20:48 ` Rafael J. Wysocki
0 siblings, 1 reply; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 19:20 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano,
Artem Bityutskiy
On 1/15/25 15:54, Rafael J. Wysocki wrote:
> On Wed, Jan 15, 2025 at 3:46 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> On 1/13/25 18:36, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Since constraint_idx may be 0, the candidate state index may change to 0
>>> after assigning constraint_idx to it, so first check if it is greater
>>> than constraint_idx (and update it if so) and then check it against 0.
>>
>> So the reason I've left this where it was is because the prev_intercept_idx
>> was supposed to query the sleep length if we're in an majority-intercept
>> period and then it makes sense to query the sleep length (to detect such
>> a period being over).
>> A constraint_idx == 0 scenario doesn't need the intercept-machinery to
>> work at all, why are we querying the sleep length then?
>
> In case the constraint is different next time and it's better to know
> the sleep length to properly classify the wakeup.
I would hope constraints change nowhere near as frequently as
idle entry / exit happen, is your experience different?
>
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>>
>>> This is a rebased variant of
>>>
>>> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
>>>
>>> ---
>>> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
>>> 1 file changed, 8 insertions(+), 7 deletions(-)
>>>
>>> --- a/drivers/cpuidle/governors/teo.c
>>> +++ b/drivers/cpuidle/governors/teo.c
>>> @@ -428,6 +428,14 @@
>>> break;
>>> }
>>> }
>>> +
>>> + /*
>>> + * If there is a latency constraint, it may be necessary to select an
>>> + * idle state shallower than the current candidate one.
>>> + */
>>> + if (idx > constraint_idx)
>>> + idx = constraint_idx;
>>> +
>>> if (!idx && prev_intercept_idx) {
>>> /*
>>> * We have to query the sleep length here otherwise we don't
>>> @@ -439,13 +447,6 @@
>>> }
>>>
>>> /*
>>> - * If there is a latency constraint, it may be necessary to select an
>>> - * idle state shallower than the current candidate one.
>>> - */
>>> - if (idx > constraint_idx)
>>> - idx = constraint_idx;
>>> -
>>> - /*
>>
>> We could leave this here and just do goto end;?
>
> Why would this be better?
Saves querying the sleep length in case of constraint_idx == 0, i.e.
qos request to be very latency-sensitive and us actually adding latency
here.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 5/9] cpuidle: teo: Clarify two code comments
2025-01-13 18:41 ` [PATCH v1 5/9] cpuidle: teo: Clarify two code comments Rafael J. Wysocki
@ 2025-01-15 19:43 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 19:43 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:41, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Rewrite two code comments suposed to explain its behavior that are too
s/suposed/supposed
> concise or not sufficiently clear.
>
> No functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -154,9 +154,10 @@
>
> if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
> /*
> - * One of the safety nets has triggered or the wakeup was close
> - * enough to the closest timer event expected at the idle state
> - * selection time to be discarded.
> + * This causes the wakeup to be counted as a hit regardless of
regardless of twice.
> + * regardless of the real idle duration which doesn't need to be
> + * computed because the wakeup has been close enough to an
> + * anticipated timer.
> */
> measured_ns = U64_MAX;
> } else {
> @@ -302,8 +303,13 @@
>
> cpu_data->time_span_ns = local_clock();
> /*
> - * Set the expected sleep length to infinity in case of an early
> - * return.
> + * Set the sleep length to infitity in case the invocation of
s/infitity/infinity
> + * tick_nohz_get_sleep_length() below is skipped, in which case it won't
> + * be known whether or not the subsequent wakeup is caused by a timer.
> + * It is generally fine to count the wakeup as an intercept then, except
> + * for the cases when the CPU is mostly woken up by timers and there may
> + * be opportunities to ask for a deeper idle state when no imminent
> + * timers are scheduled which may be missed.
With the above typo fixes.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 3/9] cpuidle: teo: Combine candidate state index checks against 0
2025-01-13 18:39 ` [PATCH v1 3/9] cpuidle: teo: Combine candidate state index checks against 0 Rafael J. Wysocki
@ 2025-01-15 19:44 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 19:44 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:39, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> There are two candidate state index checks against 0 in teo_select()
> that need not be separate, so combine them and update comments around
> them.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a rebased variant of
>
> https://lore.kernel.org/linux-pm/2296767.iZASKD2KPV@rjwysocki.net/
>
> ---
> drivers/cpuidle/governors/teo.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -436,24 +436,19 @@
> if (idx > constraint_idx)
> idx = constraint_idx;
>
> - if (!idx && prev_intercept_idx) {
> - /*
> - * We have to query the sleep length here otherwise we don't
> - * know after wakeup if our guess was correct.
> - */
> - duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> - cpu_data->sleep_length_ns = duration_ns;
> + if (!idx) {
> + if (prev_intercept_idx) {
> + /*
> + * Query the sleep length to be able to count the wakeup
> + * as a hit if it is caused by a timer.
> + */
> + duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> + cpu_data->sleep_length_ns = duration_ns;
> + }
> goto out_tick;
> }
>
> /*
> - * Skip the timers check if state 0 is the current candidate one,
> - * because an immediate non-timer wakeup is expected in that case.
> - */
> - if (!idx)
> - goto out_tick;
> -
> - /*
> * If state 0 is a polling one, check if the target residency of
> * the current candidate state is low enough and skip the timers
> * check in that case too.
>
>
>
This patch relies on the previous one, but given that you'll probably
convince me on that:
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 4/9] cpuidle: teo: Drop local variable prev_intercept_idx
2025-01-13 18:40 ` [PATCH v1 4/9] cpuidle: teo: Drop local variable prev_intercept_idx Rafael J. Wysocki
@ 2025-01-15 19:46 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-15 19:46 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:40, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Local variable prev_intercept_idx in teo_select() is redundant becase
s/becase/because
> it cannot be 0 when candidate state index is 0.
>
> The prev_intercept_idx value is the index of the deepest enabled idle
> state, so if it is 0, state 0 is the deepest enabled idle state, in
> which case it must be the only enabled idle state, but then teo_select()
> would have returned early before initializing prev_intercept_idx.
>
> Thus prev_intercept_idx must be nonzero and the check of it against 0
> always passes, so it can be dropped altogether.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -292,7 +292,6 @@
> unsigned int hit_sum = 0;
> int constraint_idx = 0;
> int idx0 = 0, idx = -1;
> - int prev_intercept_idx;
> s64 duration_ns;
> int i;
>
> @@ -370,7 +369,6 @@
> * all of the deeper states, a shallower idle state is likely to be a
> * better choice.
> */
> - prev_intercept_idx = idx;
> if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> int first_suitable_idx = idx;
>
> @@ -437,14 +435,11 @@
> idx = constraint_idx;
>
> if (!idx) {
> - if (prev_intercept_idx) {
> - /*
> - * Query the sleep length to be able to count the wakeup
> - * as a hit if it is caused by a timer.
> - */
> - duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> - cpu_data->sleep_length_ns = duration_ns;
> - }
> + /*
> + * Query the sleep length to be able to count the wakeup as a
> + * hit if it is caused by a timer.
> + */
> + cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
> goto out_tick;
Just like the previous one:
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-15 19:20 ` Christian Loehle
@ 2025-01-15 20:48 ` Rafael J. Wysocki
2025-01-15 21:10 ` Rafael J. Wysocki
0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-15 20:48 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML,
Daniel Lezcano, Artem Bityutskiy
On Wed, Jan 15, 2025 at 8:20 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/15/25 15:54, Rafael J. Wysocki wrote:
> > On Wed, Jan 15, 2025 at 3:46 PM Christian Loehle
> > <christian.loehle@arm.com> wrote:
> >>
> >> On 1/13/25 18:36, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >>> Since constraint_idx may be 0, the candidate state index may change to 0
> >>> after assigning constraint_idx to it, so first check if it is greater
> >>> than constraint_idx (and update it if so) and then check it against 0.
> >>
> >> So the reason I've left this where it was is because the prev_intercept_idx
> >> was supposed to query the sleep length if we're in an majority-intercept
> >> period and then it makes sense to query the sleep length (to detect such
> >> a period being over).
> >> A constraint_idx == 0 scenario doesn't need the intercept-machinery to
> >> work at all, why are we querying the sleep length then?
> >
> > In case the constraint is different next time and it's better to know
> > the sleep length to properly classify the wakeup.
>
> I would hope constraints change nowhere near as frequently as
> idle entry / exit happen, is your experience different?
They don't, but they may change at any time and it is kind of good to
have history in case this happens.
> >
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>> ---
> >>>
> >>> This is a rebased variant of
> >>>
> >>> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
> >>>
> >>> ---
> >>> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
> >>> 1 file changed, 8 insertions(+), 7 deletions(-)
> >>>
> >>> --- a/drivers/cpuidle/governors/teo.c
> >>> +++ b/drivers/cpuidle/governors/teo.c
> >>> @@ -428,6 +428,14 @@
> >>> break;
> >>> }
> >>> }
> >>> +
> >>> + /*
> >>> + * If there is a latency constraint, it may be necessary to select an
> >>> + * idle state shallower than the current candidate one.
> >>> + */
> >>> + if (idx > constraint_idx)
> >>> + idx = constraint_idx;
> >>> +
> >>> if (!idx && prev_intercept_idx) {
> >>> /*
> >>> * We have to query the sleep length here otherwise we don't
> >>> @@ -439,13 +447,6 @@
> >>> }
> >>>
> >>> /*
> >>> - * If there is a latency constraint, it may be necessary to select an
> >>> - * idle state shallower than the current candidate one.
> >>> - */
> >>> - if (idx > constraint_idx)
> >>> - idx = constraint_idx;
> >>> -
> >>> - /*
> >>
> >> We could leave this here and just do goto end;?
> >
> > Why would this be better?
>
> Saves querying the sleep length in case of constraint_idx == 0, i.e.
> qos request to be very latency-sensitive and us actually adding latency
> here.
Fair enough, but before patch [7/9] leaving it where it is doesn't
really cause it to skip the sleep length check unless state 0 is
"polling".
After patch [7/9] it is possible to add a constraint_idx check against
0 to the "goto out_tick" condition before the
tick_nohz_get_sleep_length() call, that is
if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
(2 * cpu_data->short_idle >= cpu_data->total || !constraint_idx))
goto out_tick;
but that would be a separate patch if you will.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-15 20:48 ` Rafael J. Wysocki
@ 2025-01-15 21:10 ` Rafael J. Wysocki
2025-01-16 12:22 ` Rafael J. Wysocki
0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-15 21:10 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano,
Artem Bityutskiy
On Wed, Jan 15, 2025 at 9:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jan 15, 2025 at 8:20 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
> >
> > On 1/15/25 15:54, Rafael J. Wysocki wrote:
> > > On Wed, Jan 15, 2025 at 3:46 PM Christian Loehle
> > > <christian.loehle@arm.com> wrote:
> > >>
> > >> On 1/13/25 18:36, Rafael J. Wysocki wrote:
> > >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >>>
> > >>> Since constraint_idx may be 0, the candidate state index may change to 0
> > >>> after assigning constraint_idx to it, so first check if it is greater
> > >>> than constraint_idx (and update it if so) and then check it against 0.
> > >>
> > >> So the reason I've left this where it was is because the prev_intercept_idx
> > >> was supposed to query the sleep length if we're in an majority-intercept
> > >> period and then it makes sense to query the sleep length (to detect such
> > >> a period being over).
> > >> A constraint_idx == 0 scenario doesn't need the intercept-machinery to
> > >> work at all, why are we querying the sleep length then?
> > >
> > > In case the constraint is different next time and it's better to know
> > > the sleep length to properly classify the wakeup.
> >
> > I would hope constraints change nowhere near as frequently as
> > idle entry / exit happen, is your experience different?
>
> They don't, but they may change at any time and it is kind of good to
> have history in case this happens.
>
> > >
> > >>>
> > >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >>> ---
> > >>>
> > >>> This is a rebased variant of
> > >>>
> > >>> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
> > >>>
> > >>> ---
> > >>> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
> > >>> 1 file changed, 8 insertions(+), 7 deletions(-)
> > >>>
> > >>> --- a/drivers/cpuidle/governors/teo.c
> > >>> +++ b/drivers/cpuidle/governors/teo.c
> > >>> @@ -428,6 +428,14 @@
> > >>> break;
> > >>> }
> > >>> }
> > >>> +
> > >>> + /*
> > >>> + * If there is a latency constraint, it may be necessary to select an
> > >>> + * idle state shallower than the current candidate one.
> > >>> + */
> > >>> + if (idx > constraint_idx)
> > >>> + idx = constraint_idx;
> > >>> +
> > >>> if (!idx && prev_intercept_idx) {
> > >>> /*
> > >>> * We have to query the sleep length here otherwise we don't
> > >>> @@ -439,13 +447,6 @@
> > >>> }
> > >>>
> > >>> /*
> > >>> - * If there is a latency constraint, it may be necessary to select an
> > >>> - * idle state shallower than the current candidate one.
> > >>> - */
> > >>> - if (idx > constraint_idx)
> > >>> - idx = constraint_idx;
> > >>> -
> > >>> - /*
> > >>
> > >> We could leave this here and just do goto end;?
> > >
> > > Why would this be better?
> >
> > Saves querying the sleep length in case of constraint_idx == 0, i.e.
> > qos request to be very latency-sensitive and us actually adding latency
> > here.
>
> Fair enough, but before patch [7/9] leaving it where it is doesn't
> really cause it to skip the sleep length check unless state 0 is
> "polling".
>
> After patch [7/9] it is possible to add a constraint_idx check against
> 0 to the "goto out_tick" condition before the
> tick_nohz_get_sleep_length() call, that is
>
> if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> (2 * cpu_data->short_idle >= cpu_data->total || !constraint_idx))
> goto out_tick;
Or even
if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
(2 * cpu_data->short_idle >= cpu_data->total || latency_req <
A_SMALL_VALUE))
goto out_tick;
for that matter.
> but that would be a separate patch if you will.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-15 21:10 ` Rafael J. Wysocki
@ 2025-01-16 12:22 ` Rafael J. Wysocki
2025-01-16 13:26 ` Christian Loehle
0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-16 12:22 UTC (permalink / raw)
To: Christian Loehle, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On Wednesday, January 15, 2025 10:10:11 PM CET Rafael J. Wysocki wrote:
> On Wed, Jan 15, 2025 at 9:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Jan 15, 2025 at 8:20 PM Christian Loehle
> > <christian.loehle@arm.com> wrote:
> > >
> > > On 1/15/25 15:54, Rafael J. Wysocki wrote:
> > > > On Wed, Jan 15, 2025 at 3:46 PM Christian Loehle
> > > > <christian.loehle@arm.com> wrote:
> > > >>
> > > >> On 1/13/25 18:36, Rafael J. Wysocki wrote:
> > > >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >>>
> > > >>> Since constraint_idx may be 0, the candidate state index may change to 0
> > > >>> after assigning constraint_idx to it, so first check if it is greater
> > > >>> than constraint_idx (and update it if so) and then check it against 0.
> > > >>
> > > >> So the reason I've left this where it was is because the prev_intercept_idx
> > > >> was supposed to query the sleep length if we're in an majority-intercept
> > > >> period and then it makes sense to query the sleep length (to detect such
> > > >> a period being over).
> > > >> A constraint_idx == 0 scenario doesn't need the intercept-machinery to
> > > >> work at all, why are we querying the sleep length then?
> > > >
> > > > In case the constraint is different next time and it's better to know
> > > > the sleep length to properly classify the wakeup.
> > >
> > > I would hope constraints change nowhere near as frequently as
> > > idle entry / exit happen, is your experience different?
> >
> > They don't, but they may change at any time and it is kind of good to
> > have history in case this happens.
> >
> > > >
> > > >>>
> > > >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >>> ---
> > > >>>
> > > >>> This is a rebased variant of
> > > >>>
> > > >>> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
> > > >>>
> > > >>> ---
> > > >>> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
> > > >>> 1 file changed, 8 insertions(+), 7 deletions(-)
> > > >>>
> > > >>> --- a/drivers/cpuidle/governors/teo.c
> > > >>> +++ b/drivers/cpuidle/governors/teo.c
> > > >>> @@ -428,6 +428,14 @@
> > > >>> break;
> > > >>> }
> > > >>> }
> > > >>> +
> > > >>> + /*
> > > >>> + * If there is a latency constraint, it may be necessary to select an
> > > >>> + * idle state shallower than the current candidate one.
> > > >>> + */
> > > >>> + if (idx > constraint_idx)
> > > >>> + idx = constraint_idx;
> > > >>> +
> > > >>> if (!idx && prev_intercept_idx) {
> > > >>> /*
> > > >>> * We have to query the sleep length here otherwise we don't
> > > >>> @@ -439,13 +447,6 @@
> > > >>> }
> > > >>>
> > > >>> /*
> > > >>> - * If there is a latency constraint, it may be necessary to select an
> > > >>> - * idle state shallower than the current candidate one.
> > > >>> - */
> > > >>> - if (idx > constraint_idx)
> > > >>> - idx = constraint_idx;
> > > >>> -
> > > >>> - /*
> > > >>
> > > >> We could leave this here and just do goto end;?
> > > >
> > > > Why would this be better?
> > >
> > > Saves querying the sleep length in case of constraint_idx == 0, i.e.
> > > qos request to be very latency-sensitive and us actually adding latency
> > > here.
> >
> > Fair enough, but before patch [7/9] leaving it where it is doesn't
> > really cause it to skip the sleep length check unless state 0 is
> > "polling".
> >
> > After patch [7/9] it is possible to add a constraint_idx check against
> > 0 to the "goto out_tick" condition before the
> > tick_nohz_get_sleep_length() call, that is
> >
> > if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> > (2 * cpu_data->short_idle >= cpu_data->total || !constraint_idx))
> > goto out_tick;
>
> Or even
>
> if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> (2 * cpu_data->short_idle >= cpu_data->total || latency_req <
> A_SMALL_VALUE))
> goto out_tick;
>
> for that matter.
>
> > but that would be a separate patch if you will.
So for completeness, it would be a patch like the one below, on top of the [7/9].
---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH v1] cpuidle: teo: Skip sleep length computation for low latency constraints
If the idle state exit latency constraint is sufficiently low, it
is better to avoid the additional latency related to calling
tick_nohz_get_sleep_length(). It is also not necessary to compute
the sleep length in that case because shallow idle state selection
will be forced then regardless of the recent wakeup history.
Accordingly, skip the sleep length computation and subsequent
checks of the exit latency constraint is low enough.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -106,6 +106,12 @@
#include "gov.h"
/*
+ * Idle state exit latency threshold used for deciding whether or not to check
+ * the time till the closest expected timer event.
+ */
+#define LATENCY_THRESHOLD_NS (RESIDENCY_THRESHOLD_NS / 2)
+
+/*
* The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
* is used for decreasing metrics on a regular basis.
*/
@@ -432,9 +438,14 @@
* duration falls into that range in the majority of cases, assume
* non-timer wakeups to be dominant and skip updating the sleep length
* to reduce latency.
+ *
+ * Also, if the latency constraint is sufficiently low, it will force
+ * shallow idle states regardless of the wakeup type, so the sleep
+ * length need not be known in that case.
*/
if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
- 2 * cpu_data->short_idle >= cpu_data->total)
+ (2 * cpu_data->short_idle >= cpu_data->total ||
+ latency_req < LATENCY_THRESHOLD_NS))
goto out_tick;
duration_ns = tick_nohz_get_sleep_length(&delta_tick);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-16 12:22 ` Rafael J. Wysocki
@ 2025-01-16 13:26 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-16 13:26 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/16/25 12:22, Rafael J. Wysocki wrote:
> On Wednesday, January 15, 2025 10:10:11 PM CET Rafael J. Wysocki wrote:
>> On Wed, Jan 15, 2025 at 9:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>
>>> On Wed, Jan 15, 2025 at 8:20 PM Christian Loehle
>>> <christian.loehle@arm.com> wrote:
>>>>
>>>> On 1/15/25 15:54, Rafael J. Wysocki wrote:
>>>>> On Wed, Jan 15, 2025 at 3:46 PM Christian Loehle
>>>>> <christian.loehle@arm.com> wrote:
>>>>>>
>>>>>> On 1/13/25 18:36, Rafael J. Wysocki wrote:
>>>>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>>>>
>>>>>>> Since constraint_idx may be 0, the candidate state index may change to 0
>>>>>>> after assigning constraint_idx to it, so first check if it is greater
>>>>>>> than constraint_idx (and update it if so) and then check it against 0.
>>>>>>
>>>>>> So the reason I've left this where it was is because the prev_intercept_idx
>>>>>> was supposed to query the sleep length if we're in an majority-intercept
>>>>>> period and then it makes sense to query the sleep length (to detect such
>>>>>> a period being over).
>>>>>> A constraint_idx == 0 scenario doesn't need the intercept-machinery to
>>>>>> work at all, why are we querying the sleep length then?
>>>>>
>>>>> In case the constraint is different next time and it's better to know
>>>>> the sleep length to properly classify the wakeup.
>>>>
>>>> I would hope constraints change nowhere near as frequently as
>>>> idle entry / exit happen, is your experience different?
>>>
>>> They don't, but they may change at any time and it is kind of good to
>>> have history in case this happens.
>>>
>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>>>> ---
>>>>>>>
>>>>>>> This is a rebased variant of
>>>>>>>
>>>>>>> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
>>>>>>>
>>>>>>> ---
>>>>>>> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
>>>>>>> 1 file changed, 8 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> --- a/drivers/cpuidle/governors/teo.c
>>>>>>> +++ b/drivers/cpuidle/governors/teo.c
>>>>>>> @@ -428,6 +428,14 @@
>>>>>>> break;
>>>>>>> }
>>>>>>> }
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * If there is a latency constraint, it may be necessary to select an
>>>>>>> + * idle state shallower than the current candidate one.
>>>>>>> + */
>>>>>>> + if (idx > constraint_idx)
>>>>>>> + idx = constraint_idx;
>>>>>>> +
>>>>>>> if (!idx && prev_intercept_idx) {
>>>>>>> /*
>>>>>>> * We have to query the sleep length here otherwise we don't
>>>>>>> @@ -439,13 +447,6 @@
>>>>>>> }
>>>>>>>
>>>>>>> /*
>>>>>>> - * If there is a latency constraint, it may be necessary to select an
>>>>>>> - * idle state shallower than the current candidate one.
>>>>>>> - */
>>>>>>> - if (idx > constraint_idx)
>>>>>>> - idx = constraint_idx;
>>>>>>> -
>>>>>>> - /*
>>>>>>
>>>>>> We could leave this here and just do goto end;?
>>>>>
>>>>> Why would this be better?
>>>>
>>>> Saves querying the sleep length in case of constraint_idx == 0, i.e.
>>>> qos request to be very latency-sensitive and us actually adding latency
>>>> here.
>>>
>>> Fair enough, but before patch [7/9] leaving it where it is doesn't
>>> really cause it to skip the sleep length check unless state 0 is
>>> "polling".
>>>
>>> After patch [7/9] it is possible to add a constraint_idx check against
>>> 0 to the "goto out_tick" condition before the
>>> tick_nohz_get_sleep_length() call, that is
>>>
>>> if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
>>> (2 * cpu_data->short_idle >= cpu_data->total || !constraint_idx))
>>> goto out_tick;
>>
>> Or even
>>
>> if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
>> (2 * cpu_data->short_idle >= cpu_data->total || latency_req <
>> A_SMALL_VALUE))
>> goto out_tick;
>>
>> for that matter.
>>
>>> but that would be a separate patch if you will.
>
> So for completeness, it would be a patch like the one below, on top of the [7/9].
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH v1] cpuidle: teo: Skip sleep length computation for low latency constraints
>
> If the idle state exit latency constraint is sufficiently low, it
> is better to avoid the additional latency related to calling
> tick_nohz_get_sleep_length(). It is also not necessary to compute
> the sleep length in that case because shallow idle state selection
> will be forced then regardless of the recent wakeup history.
>
> Accordingly, skip the sleep length computation and subsequent
> checks of the exit latency constraint is low enough.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thank you, that makes sense.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
> ---
> drivers/cpuidle/governors/teo.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -106,6 +106,12 @@
> #include "gov.h"
>
> /*
> + * Idle state exit latency threshold used for deciding whether or not to check
> + * the time till the closest expected timer event.
> + */
> +#define LATENCY_THRESHOLD_NS (RESIDENCY_THRESHOLD_NS / 2)
> +
> +/*
> * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
> * is used for decreasing metrics on a regular basis.
> */
> @@ -432,9 +438,14 @@
> * duration falls into that range in the majority of cases, assume
> * non-timer wakeups to be dominant and skip updating the sleep length
> * to reduce latency.
> + *
> + * Also, if the latency constraint is sufficiently low, it will force
> + * shallow idle states regardless of the wakeup type, so the sleep
> + * length need not be known in that case.
> */
> if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> - 2 * cpu_data->short_idle >= cpu_data->total)
> + (2 * cpu_data->short_idle >= cpu_data->total ||
> + latency_req < LATENCY_THRESHOLD_NS))
> goto out_tick;
>
> duration_ns = tick_nohz_get_sleep_length(&delta_tick);
>
>
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks
2025-01-13 18:36 ` [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks Rafael J. Wysocki
2025-01-15 14:46 ` Christian Loehle
@ 2025-01-16 13:27 ` Christian Loehle
1 sibling, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-16 13:27 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:36, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Since constraint_idx may be 0, the candidate state index may change to 0
> after assigning constraint_idx to it, so first check if it is greater
> than constraint_idx (and update it if so) and then check it against 0.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fine with me now, since the posted patch addresses my concern.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
> ---
>
> This is a rebased variant of
>
> https://lore.kernel.org/linux-pm/8476650.T7Z3S40VBb@rjwysocki.net/
>
> ---
> drivers/cpuidle/governors/teo.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -428,6 +428,14 @@
> break;
> }
> }
> +
> + /*
> + * If there is a latency constraint, it may be necessary to select an
> + * idle state shallower than the current candidate one.
> + */
> + if (idx > constraint_idx)
> + idx = constraint_idx;
> +
> if (!idx && prev_intercept_idx) {
> /*
> * We have to query the sleep length here otherwise we don't
> @@ -439,13 +447,6 @@
> }
>
> /*
> - * If there is a latency constraint, it may be necessary to select an
> - * idle state shallower than the current candidate one.
> - */
> - if (idx > constraint_idx)
> - idx = constraint_idx;
> -
> - /*
> * Skip the timers check if state 0 is the current candidate one,
> * because an immediate non-timer wakeup is expected in that case.
> */
>
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update
2025-01-15 14:52 ` [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Christian Loehle
@ 2025-01-20 8:17 ` Aboorva Devarajan
2025-01-20 16:26 ` Rafael J. Wysocki
2025-01-20 16:24 ` Rafael J. Wysocki
1 sibling, 1 reply; 32+ messages in thread
From: Aboorva Devarajan @ 2025-01-20 8:17 UTC (permalink / raw)
To: Christian Loehle, Rafael J. Wysocki
Cc: LKML, Daniel Lezcano, Artem Bityutskiy, Linux PM
On Wed, 2025-01-15 at 14:52 +0000, Christian Loehle wrote:
> On 1/13/25 18:32, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This supersedes
> >
> > https://lore.kernel.org/linux-pm/4953183.GXAFRqVoOG@rjwysocki.net/
> >
> > but because the majority of patches in it are new, I've decided to count
> > version numbers back from 1.
> >
> > This addresses a relatively recently added inconsistency in behavior of the teo
> > governor regarding the handling of very frequent wakeups handling (patch [7/9])
> > and makes some other changes that may be regarded as cleanups.
> >
> > Please review.
>
> Hi Rafael,
> that looks promising. I'll review the individual patches in detail now, but
> I let a few tests run overnight and can report that there's no significant
> behaviour change with the series on an arm64 (no polling state, rk3399), which
> is my expected result.
>
> I'll get something running on a system with a polling state as well.
> (I don't have a POWER system, so that will just be x86, adding Aboorva.)
Christian,
Thanks for adding me to the thread.
Rafael,
I did some tests with the patch on a Pseries Power10 system:
Here are the system details:
--------------------------------------------------------------------------------
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 6
Socket(s): 1
Physical sockets: 8
Physical chips: 1
Physical cores/chip: 10
--------------------------------------------------------------------------------
# cpupower idle-info
CPUidle driver: pseries_idle
CPUidle governor: menu
analyzing CPU 5:
Number of idle states: 2
Available idle states: snooze Shared Cede
snooze:
Flags/Description: snooze
Latency: 0
Usage: 1411724
Duration: 27481504
Shared Cede:
Flags/Description: Shared Cede
Latency: 10
Usage: 326573
Duration: 31098864616
--------------------------------------------------------------------------------
How to infer the results:
Above Diff (%) and Below Diff (%) represent the number of cpuidle misses,
indicating how frequently the selected cpuidle state was either too deep or
too shallow. So, these values should not be too high.
--------------------------------------------------------------------------------
The below test is done using a predictable timer and non-timer benchmark [1]:
--------------------------------------------------------------------------------
Menu Governor:
--------------------------------------------------------------------------------
With pipe wakeup (non-timer):
--------------------------------------------------------------------------------------------------------
Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
--------------------------------------------------------------------------------------------------------
0 5.980 1656304 2 0.000121 0 0.000000
1 10.959 901972 1954 0.216636 0 0.000000
--------------------------------------------------------------------------------------------------------
2 15.726 243971 237112 97.188600 0 0.000000
3 20.813 232069 227258 97.926910 0 0.000000
4 30.896 209884 206492 98.383869 0 0.000000
5 40.991 216704 213642 98.587013 0 0.000000
6 51.002 195632 192963 98.635704 0 0.000000
7 61.014 163726 161506 98.644076 0 0.000000
8 71.006 140739 138809 98.628667 0 0.000000
9 81.008 123386 120725 97.843353 0 0.000000
10 101.020 98974 81235 82.077111 0 0.000000
--------------------------------------------------------------------------------------------------------
11 111.044 90018 1513 1.680775 12 0.013331
12 121.015 82704 189 0.228526 77 0.093103
13 131.028 76534 272 0.355398 321 0.419421
14 141.008 71610 698 0.974724 693 0.967742
15 151.021 66869 666 0.995977 656 0.981023
16 161.027 62709 611 0.974342 605 0.964774
17 171.033 59063 593 1.004013 593 1.004013
18 181.019 55819 571 1.022949 541 0.969204
19 191.016 52998 641 1.209480 628 1.184950
20 201.017 50353 551 1.094274 501 0.994975
21 251.054 40535 289 0.712964 398 0.981868
22 301.037 33966 252 0.741918 330 0.971560
23 351.038 29279 216 0.737730 294 1.004133
24 401.047 25765 190 0.737435 262 1.016883
25 451.060 23021 185 0.803614 187 0.812302
26 501.049 20831 150 0.720081 216 1.036916
27 1001.076 10951 77 0.703132 126 1.150580
With timer wakeup:
--------------------------------------------------------------------------------------------------------
Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
--------------------------------------------------------------------------------------------------------
0 7.590 1310772 0 0.000000 0 0.000000
1 12.631 789377 780 0.098812 0 0.000000
2 21.791 458001 52321 11.423774 0 0.000000
3 22.648 440752 36 0.008168 0 0.000000
4 32.644 305983 0 0.000000 0 0.000000
5 42.646 234305 0 0.000000 0 0.000000
6 52.647 189858 2 0.001053 0 0.000000
7 62.649 159561 10 0.006267 0 0.000000
8 72.644 137643 5 0.003633 1 0.000727
9 82.666 120963 5 0.004133 0 0.000000
10 102.654 97442 3 0.003079 610 0.626013
11 145.805 69937 441 0.630568 1345 1.923159
12 156.057 64511 75 0.116259 439 0.680504
13 166.047 60765 215 0.353822 534 0.878795
14 175.894 57564 178 0.309221 687 1.193454
15 185.933 54471 255 0.468139 638 1.171265
16 195.975 51403 98 0.190650 212 0.412427
17 206.062 49281 174 0.353077 577 1.170837
18 216.188 46980 33 0.070243 571 1.215411
19 226.346 44879 30 0.066846 543 1.209920
20 236.353 43081 27 0.062673 516 1.197744
21 286.158 35782 5 0.013974 154 0.430384
22 336.730 30531 11 0.036029 266 0.871246
23 386.730 26722 18 0.067360 232 0.868198
24 436.770 23797 9 0.037820 196 0.823633
25 487.229 21359 13 0.060864 229 1.072148
26 537.375 19557 13 0.066472 259 1.324334
27 1037.871 10638 12 0.112803 127 1.193833
--------------------------------------------------------------------------------
Teo governor:
--------------------------------------------------------------------------------
With pipe wakeup (non-timer):
--------------------------------------------------------------------------------------------------------
Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
--------------------------------------------------------------------------------------------------------
0 5.972 1657561 6 0.000362 0 0.000000
1 10.964 907279 0 0.000000 0 0.000000
2 15.977 623681 0 0.000000 0 0.000000
3 20.980 475385 1 0.000210 0 0.000000
4 30.981 322151 0 0.000000 0 0.000000
5 40.975 243749 0 0.000000 0 0.000000
6 50.977 195989 0 0.000000 0 0.000000
7 60.981 163876 0 0.000000 0 0.000000
8 70.978 140818 0 0.000000 0 0.000000
9 80.976 123460 0 0.000000 1 0.000810
10 100.970 99038 0 0.000000 16 0.016155
--------------------------------------------------------------------------------------------------------
11 111.027 106919 5388 5.039329 16873 15.781105
--------------------------------------------------------------------------------------------------------
12 121.017 83074 381 0.458627 443 0.533260
13 131.024 76588 323 0.421737 373 0.487021
14 141.027 71592 695 0.970779 695 0.970779
15 151.023 66874 669 1.000389 662 0.989921
16 161.026 62719 607 0.967809 595 0.948676
17 171.027 59064 581 0.983679 574 0.971827
18 181.020 55817 561 1.005070 532 0.953115
19 191.020 52883 523 0.988976 446 0.843371
20 201.025 50509 670 1.326496 645 1.277000
21 251.037 40544 280 0.690608 409 1.008781
22 301.033 33988 263 0.773803 349 1.026833
23 351.036 29285 221 0.754653 299 1.021001
24 401.042 25777 203 0.787524 272 1.055204
25 451.040 23029 174 0.755569 238 1.033480
26 501.048 20838 157 0.753431 224 1.074959
27 1001.097 10949 79 0.721527 134 1.223856
With timer wakeup:
--------------------------------------------------------------------------------------------------------
Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
--------------------------------------------------------------------------------------------------------
0 7.541 1319205 0 0.000000 0 0.000000
1 12.546 794464 0 0.000000 0 0.000000
2 17.540 568954 0 0.000000 0 0.000000
3 22.572 442307 0 0.000000 0 0.000000
4 32.583 306443 0 0.000000 1 0.000326
5 42.597 233238 0 0.000000 0 0.000000
6 52.587 190067 0 0.000000 0 0.000000
7 62.590 159714 0 0.000000 1 0.000626
8 72.574 137755 0 0.000000 2 0.001452
9 82.581 121081 0 0.000000 0 0.000000
10 102.589 97491 0 0.000000 1912 1.961207
11 146.385 68906 47 0.068209 599 0.869300
12 156.548 64565 86 0.133199 670 1.037714
13 166.588 60562 100 0.165120 518 0.855322
14 176.676 57264 263 0.459276 642 1.121123
15 186.563 54262 293 0.539973 601 1.107589
16 195.986 51668 192 0.371603 526 1.018038
17 206.860 49028 97 0.197846 564 1.150363
18 216.899 46669 27 0.057854 460 0.985665
19 227.016 44528 22 0.049407 367 0.824201
20 237.055 42883 28 0.065294 507 1.182287
21 286.998 35665 9 0.025235 283 0.793495
22 337.410 30439 7 0.022997 264 0.867308
23 387.522 26652 18 0.067537 251 0.941768
24 437.570 23742 8 0.033696 221 0.930840
25 487.804 21293 10 0.046964 94 0.441460
26 537.884 19505 7 0.035888 243 1.245834
27 1038.863 10633 4 0.037619 135 1.269632
--------------------------------------------------------------------------------
Teo Governor with patch
--------------------------------------------------------------------------------
With pipe wakeup (non-timer):
--------------------------------------------------------------------------------------------------------
Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
--------------------------------------------------------------------------------------------------------
0 5.959 1661754 5 0.000301 0 0.000000
1 10.963 907497 2 0.000220 0 0.000000
2 15.968 623957 2 0.000321 0 0.000000
3 20.970 475574 2 0.000421 0 0.000000
4 30.974 321718 2 0.000622 0 0.000000
5 40.974 243714 2 0.000821 0 0.000000
6 50.983 195931 2 0.001021 0 0.000000
7 60.974 163876 2 0.001220 1 0.000610
8 70.973 140810 2 0.001420 1 0.000710
9 80.988 123420 1 0.000810 4 0.003241
10 100.994 99014 2 0.002020 20 0.020199
--------------------------------------------------------------------------------------------------------
11 111.023 135597 11864 8.749456 45561 33.600301 => This is observed even
-------------------------------------------------------------------------------------------------------- without the patch,
12 121.035 82948 348 0.419540 352 0.424362 when the sleep interval
13 131.019 76342 82 0.107411 48 0.062875 is almost equal to the
14 141.028 70948 70 0.098664 57 0.080341 residency time of state1.
15 151.023 66278 81 0.122212 69 0.104107
16 161.021 62146 65 0.104592 51 0.082065
17 171.023 58509 64 0.109385 47 0.080330
18 181.026 55301 64 0.115730 48 0.086798
19 191.033 52407 67 0.127846 45 0.085866
20 201.024 49803 52 0.104411 48 0.096380
21 251.042 39911 39 0.097717 45 0.112751
22 301.040 33302 29 0.087082 40 0.120113
23 351.045 28572 37 0.129497 34 0.118998
24 401.057 25005 20 0.079984 27 0.107978
25 451.055 22246 21 0.094399 26 0.116875
26 501.053 20031 14 0.069892 25 0.124807
27 1001.099 10055 7 0.069617 15 0.149180
With timer wakeup:
--------------------------------------------------------------------------------------------------------
Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
--------------------------------------------------------------------------------------------------------
0 7.566 1314872 0 0.000000 0 0.000000
1 12.553 794091 0 0.000000 0 0.000000
2 17.573 567627 0 0.000000 0 0.000000
3 22.631 441084 0 0.000000 0 0.000000
4 32.633 306095 0 0.000000 0 0.000000
5 42.631 234377 0 0.000000 0 0.000000
6 52.634 189899 0 0.000000 0 0.000000
7 62.642 159572 0 0.000000 0 0.000000
8 72.645 137619 0 0.000000 1 0.000727
9 82.616 121037 0 0.000000 1 0.000826
10 102.636 97423 0 0.000000 1047 1.074695
11 145.823 69245 43 0.062098 680 0.982020
12 155.912 64546 90 0.139435 478 0.740557
13 166.103 60709 219 0.360737 501 0.825248
14 176.036 57483 196 0.340970 639 1.111633
15 186.043 54448 250 0.459154 623 1.144211
16 195.552 51428 124 0.241114 230 0.447227
17 205.990 49270 141 0.286178 527 1.069616
18 216.300 46931 39 0.083101 545 1.161279
19 226.288 44884 23 0.051243 520 1.158542
20 236.372 43056 20 0.046451 493 1.145020
21 285.985 35724 1 0.002799 119 0.333109
22 336.636 30526 11 0.036035 262 0.858285
23 386.522 26772 17 0.063499 218 0.814284
24 436.749 23857 26 0.108983 208 0.871862
25 487.204 21358 9 0.042139 240 1.123701
26 537.312 19530 7 0.035842 235 1.203277
27 1038.147 10610 2 0.018850 119 1.121583
--------------------------------------------------------------------------------
I also did some tests with postgres (pgbench) - with the patch:
+---------------------------+--------------------+------------------+
| Metric | Shared Cede | Snooze |
+---------------------------+--------------------+------------------+
| Total Usage Difference | 119,453 | 20,472,846 |
| Total Time Difference | 502.79 seconds | 324.90 seconds |
| Total Above Difference | 74,500 (0.36%) | 0 |
| Total Below Difference | 0 | 336,703 (1.64%) |
+---------------------------+--------------------+------------------+
% Above Diff = 0.36% of total usage
% Below Diff = 1.64% of total usage
Using both the deterministic micro-benchmark and pgbench, I observed that the
teo governor with the patch as anticipated does not cause any noticable increase in the cpuidle
state prediction miss on PowerPC (pseries) (% above and below diff).
So, for the entire series:
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
[1]: https://github.com/AboorvaDevarajan/linux-utils/tree/main/cpuidle/cpuidle_wakeup
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 6/9] cpuidle: teo: Simplify counting events used for tick management
2025-01-13 18:45 ` [PATCH v1 6/9] cpuidle: teo: Simplify counting events used for tick management Rafael J. Wysocki
@ 2025-01-20 11:27 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-20 11:27 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:45, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Replace the tick_hits metric with a new tick_intercepts one that can be
> used directly when deciding whether or not to stop the scheduler tick
> and update the governor functional description accordingly.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent
2025-01-13 18:48 ` [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent Rafael J. Wysocki
@ 2025-01-20 12:08 ` Christian Loehle
2025-01-20 16:24 ` Rafael J. Wysocki
0 siblings, 1 reply; 32+ messages in thread
From: Christian Loehle @ 2025-01-20 12:08 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:48, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
Title has a typo: s/is/if
> Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length()
> call in some cases") attempted to reduce the governor overhead in some
> cases by making it avoid obtaining the sleep length (the time till the
> next timer event) which may be costly.
>
> Among other things, after the above commit, tick_nohz_get_sleep_length()
> was not called any more when idle state 0 was to be returned, which
> turned out to be problematic and the previous behavior in that respect
> was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-
> existent intercepts").
>
> However, commit 6da8f9ba5a87 also caused the governor to avoid calling
> tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling"
> one (that is, it is not really an idle state, but a loop continuously
> executed by the CPU) when the target residency of the idle state to be
> returned was low enough, so there was no practical need to refine the
> idle state selection in any way. This change was not removed by the
> other commit, so now on systems where idle state 0 is a "polling" one,
> tick_nohz_get_sleep_length() is called when idle state 0 is to be
> returned, but it is not called when a deeper idle state with
> sufficiently low target residency is to be returned. That is arguably
> confusing and inconsistent.
>
> Moreover, there is no specific reason why the behavior in question
> should depend on whether or not idle state 0 is a "polling" one.
>
> One way to address this would be to make the governor always call
> tick_nohz_get_sleep_length() to obtain the sleep length, but that would
> effectively mean reverting commit 6da8f9ba5a87 and restoring the latency
> issue that was the reason for doing it. This approach is thus not
> particularly attractive.
>
> To address it differently, notice that if a CPU is woken up very often,
> this is not likely to be caused by timers in the first place (user space
> has a default timer slack of 50 us and there are relatively few timers
> with a deadline shorter than several microseconds in the kernel) and
> even if it were the case, the potential benefit from using a deep idle
> state would then be questionable for latency reasons. Therefore, if the
> majority of CPU wakeups occur within several microseconds, it can be
> assumed that all wakeups in that range are non-timer and the sleep
> length need not be determined.
>
> Accordingly, introduce a new metric for counting wakeups with the
> measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle
> state selection to skip the tick_nohz_get_sleep_length() invocation if
> idle state 0 has been selected or the target residency of the candidate
> idle state is below RESIDENCY_THRESHOLD_NS and the value of the new
> metric is at least 1/2 of the total event count.
>
> Since the above requires the measured idle duration to be determined
> every time, except for the cases when one of the safety nets has
> triggered in which the wakeup is counted as a hit in the deepest
> idle state idle residency range, update the handling of those cases
> to avoid skipping the idle duration computation when the CPU wakeup
> is "genuine".
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 58 ++++++++++++++++++++++++----------------
> 1 file changed, 36 insertions(+), 22 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -129,6 +129,7 @@
> * @state_bins: Idle state data bins for this CPU.
> * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
> * @tick_intercepts: "Intercepts" before TICK_NSEC.
> + * @short_idle: Wakeups after short idle periods.
> */
> struct teo_cpu {
> s64 time_span_ns;
> @@ -136,6 +137,7 @@
> struct teo_bin state_bins[CPUIDLE_STATE_MAX];
> unsigned int total;
> unsigned int tick_intercepts;
> + unsigned int short_idle;
Maybe call these short_idles?
> };
>
> static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
> @@ -152,12 +154,12 @@
> s64 target_residency_ns;
> u64 measured_ns;
>
> - if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
> + cpu_data->short_idle -= cpu_data->short_idle >> DECAY_SHIFT;
> +
> + if (cpu_data->time_span_ns < 0) {
> /*
> - * This causes the wakeup to be counted as a hit regardless of
> - * regardless of the real idle duration which doesn't need to be
> - * computed because the wakeup has been close enough to an
> - * anticipated timer.
> + * If one of the safety nets has triggered, assume that this
> + * might have been a long sleep.
> */
> measured_ns = U64_MAX;
> } else {
> @@ -177,10 +179,14 @@
> * time, so take 1/2 of the exit latency as a very rough
> * approximation of the average of it.
> */
> - if (measured_ns >= lat_ns)
> + if (measured_ns >= lat_ns) {
> measured_ns -= lat_ns / 2;
> - else
> + if (measured_ns < RESIDENCY_THRESHOLD_NS)
> + cpu_data->short_idle += PULSE;
> + } else {
> measured_ns /= 2;
> + cpu_data->short_idle += PULSE;
> + }
> }
>
> cpu_data->total = 0;
> @@ -419,27 +425,35 @@
> if (idx > constraint_idx)
> idx = constraint_idx;
>
> - if (!idx) {
> - /*
> - * Query the sleep length to be able to count the wakeup as a
> - * hit if it is caused by a timer.
> - */
> - cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
> - goto out_tick;
> - }
> -
> /*
> - * If state 0 is a polling one, check if the target residency of
> - * the current candidate state is low enough and skip the timers
> - * check in that case too.
> + * If either the candidate state is state 0 or its target residency is
> + * low enough, there is basically nothing more to do, but if the sleep
> + * length is not updated, the subsequent wakeup will be counted as an
> + * "intercept" which may be problematic in the cases when timer wakeups
> + * are dominant. Namely, it may effectively prevent deeper idle states
> + * from being selected at one point even if no imminent timers are
> + * scheduled.
> + *
> + * However, frequent timers in the RESIDENCY_THRESHOLD_NS range on one
> + * CPU are unlikely (user space has a default 50 us slack value for
> + * hrtimers and there are relatively few timers with a lower deadline
> + * value in the kernel), and even if they did happene, the potential
s/happene/happen
> + * benefit from using a deep idle state in that case would be
> + * questionable anyway for latency reasons. Thus if the measured idle
> + * duration falls into that range in the majority of cases, assume
> + * non-timer wakeups to be dominant and skip updating the sleep length
> + * to reduce latency.
> */
> - if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
> - drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
> + if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> + 2 * cpu_data->short_idle >= cpu_data->total)
> goto out_tick;
>
> duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> cpu_data->sleep_length_ns = duration_ns;
>
> + if (!idx)
> + goto out_tick;
> +
> /*
> * If the closest expected timer is before the target residency of the
> * candidate state, a shallower one needs to be found.
> @@ -501,7 +515,7 @@
> if (dev->poll_time_limit ||
> (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
> dev->poll_time_limit = false;
> - cpu_data->time_span_ns = cpu_data->sleep_length_ns;
> + cpu_data->time_span_ns = KTIME_MIN;
> } else {
> cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
> }
>
Thanks, I like this approach.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 8/9] cpuidle: teo: Simplify handling of total events count
2025-01-13 18:50 ` [PATCH v1 8/9] cpuidle: teo: Simplify handling of total events count Rafael J. Wysocki
@ 2025-01-20 12:10 ` Christian Loehle
0 siblings, 0 replies; 32+ messages in thread
From: Christian Loehle @ 2025-01-20 12:10 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:50, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Instead of computing the total events count from scratch every time,
> decay it and add a PULSE value to it in teo_update().
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag
2025-01-13 18:51 ` [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag Rafael J. Wysocki
@ 2025-01-20 12:16 ` Christian Loehle
2025-01-20 12:42 ` Rafael J. Wysocki
0 siblings, 1 reply; 32+ messages in thread
From: Christian Loehle @ 2025-01-20 12:16 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Artem Bityutskiy
On 1/13/25 18:51, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> After recent updates, the time_span_ns field in struct teo_cpu has
> become an indicator on whether or not the most recent wakeup has been
> "genuine" which may as well be indicated by a bool field without
> calling local_clock(), so update the code accordingly.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 27 +++++++++------------------
> 1 file changed, 9 insertions(+), 18 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -124,20 +124,20 @@
>
> /**
> * struct teo_cpu - CPU data used by the TEO cpuidle governor.
> - * @time_span_ns: Time between idle state selection and post-wakeup update.
> * @sleep_length_ns: Time till the closest timer event (at the selection time).
> * @state_bins: Idle state data bins for this CPU.
> * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
> * @tick_intercepts: "Intercepts" before TICK_NSEC.
> * @short_idle: Wakeups after short idle periods.
> + * @artificial_wakeup: Set if the wakeup has been triggered by a safety net.
> */
> struct teo_cpu {
> - s64 time_span_ns;
> s64 sleep_length_ns;
> struct teo_bin state_bins[CPUIDLE_STATE_MAX];
> unsigned int total;
> unsigned int tick_intercepts;
> unsigned int short_idle;
> + bool artificial_wakeup;
> };
>
> static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
> @@ -156,7 +156,7 @@
>
> cpu_data->short_idle -= cpu_data->short_idle >> DECAY_SHIFT;
>
> - if (cpu_data->time_span_ns < 0) {
> + if (cpu_data->artificial_wakeup) {
> /*
> * If one of the safety nets has triggered, assume that this
> * might have been a long sleep.
> @@ -165,13 +165,6 @@
> } else {
> u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
>
> - /*
> - * The computations below are to determine whether or not the
> - * (saved) time till the next timer event and the measured idle
> - * duration fall into the same "bin", so use last_residency_ns
> - * for that instead of time_span_ns which includes the cpuidle
> - * overhead.
> - */
> measured_ns = dev->last_residency_ns;
> /*
> * The delay between the wakeup and the first instruction
> @@ -286,7 +279,6 @@
> dev->last_state_idx = -1;
> }
>
> - cpu_data->time_span_ns = local_clock();
> /*
> * Set the sleep length to infitity in case the invocation of
You're not touching this here, but might as well fix the infitity
typo with the series.
> * tick_nohz_get_sleep_length() below is skipped, in which case it won't
> @@ -504,17 +496,16 @@
> struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
>
> dev->last_state_idx = state;
> - /*
> - * If the wakeup was not "natural", but triggered by one of the safety
> - * nets, assume that the CPU might have been idle for the entire sleep
> - * length time.
> - */
> if (dev->poll_time_limit ||
> (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
> + /*
> + * The wakeup was not "geniune", but triggered by one of the
s/geniune/genuine
> + * safety nets.
> + */
> dev->poll_time_limit = false;
> - cpu_data->time_span_ns = KTIME_MIN;
> + cpu_data->artificial_wakeup = true;
> } else {
> - cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
> + cpu_data->artificial_wakeup = false;
> }
You could rewrite this without the else by just setting it to false before the if.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag
2025-01-20 12:16 ` Christian Loehle
@ 2025-01-20 12:42 ` Rafael J. Wysocki
0 siblings, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-20 12:42 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano,
Artem Bityutskiy
On Mon, Jan 20, 2025 at 1:16 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/13/25 18:51, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > After recent updates, the time_span_ns field in struct teo_cpu has
> > become an indicator on whether or not the most recent wakeup has been
> > "genuine" which may as well be indicated by a bool field without
> > calling local_clock(), so update the code accordingly.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > drivers/cpuidle/governors/teo.c | 27 +++++++++------------------
> > 1 file changed, 9 insertions(+), 18 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -124,20 +124,20 @@
> >
> > /**
> > * struct teo_cpu - CPU data used by the TEO cpuidle governor.
> > - * @time_span_ns: Time between idle state selection and post-wakeup update.
> > * @sleep_length_ns: Time till the closest timer event (at the selection time).
> > * @state_bins: Idle state data bins for this CPU.
> > * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
> > * @tick_intercepts: "Intercepts" before TICK_NSEC.
> > * @short_idle: Wakeups after short idle periods.
> > + * @artificial_wakeup: Set if the wakeup has been triggered by a safety net.
> > */
> > struct teo_cpu {
> > - s64 time_span_ns;
> > s64 sleep_length_ns;
> > struct teo_bin state_bins[CPUIDLE_STATE_MAX];
> > unsigned int total;
> > unsigned int tick_intercepts;
> > unsigned int short_idle;
> > + bool artificial_wakeup;
> > };
> >
> > static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
> > @@ -156,7 +156,7 @@
> >
> > cpu_data->short_idle -= cpu_data->short_idle >> DECAY_SHIFT;
> >
> > - if (cpu_data->time_span_ns < 0) {
> > + if (cpu_data->artificial_wakeup) {
> > /*
> > * If one of the safety nets has triggered, assume that this
> > * might have been a long sleep.
> > @@ -165,13 +165,6 @@
> > } else {
> > u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
> >
> > - /*
> > - * The computations below are to determine whether or not the
> > - * (saved) time till the next timer event and the measured idle
> > - * duration fall into the same "bin", so use last_residency_ns
> > - * for that instead of time_span_ns which includes the cpuidle
> > - * overhead.
> > - */
> > measured_ns = dev->last_residency_ns;
> > /*
> > * The delay between the wakeup and the first instruction
> > @@ -286,7 +279,6 @@
> > dev->last_state_idx = -1;
> > }
> >
> > - cpu_data->time_span_ns = local_clock();
> > /*
> > * Set the sleep length to infitity in case the invocation of
>
> You're not touching this here, but might as well fix the infitity
> typo with the series.
I'll fix this when applying the patches.
> > * tick_nohz_get_sleep_length() below is skipped, in which case it won't
> > @@ -504,17 +496,16 @@
> > struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
> >
> > dev->last_state_idx = state;
> > - /*
> > - * If the wakeup was not "natural", but triggered by one of the safety
> > - * nets, assume that the CPU might have been idle for the entire sleep
> > - * length time.
> > - */
> > if (dev->poll_time_limit ||
> > (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
> > + /*
> > + * The wakeup was not "geniune", but triggered by one of the
>
> s/geniune/genuine
Ditto.
> > + * safety nets.
> > + */
> > dev->poll_time_limit = false;
> > - cpu_data->time_span_ns = KTIME_MIN;
> > + cpu_data->artificial_wakeup = true;
> > } else {
> > - cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
> > + cpu_data->artificial_wakeup = false;
> > }
>
> You could rewrite this without the else by just setting it to false before the if.
True, but the "alse" is there already and I didn't want to make more
changes than necessary in this patch.
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Thanks for all of the reviews!
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent
2025-01-20 12:08 ` Christian Loehle
@ 2025-01-20 16:24 ` Rafael J. Wysocki
0 siblings, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-20 16:24 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano,
Artem Bityutskiy
On Mon, Jan 20, 2025 at 1:08 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/13/25 18:48, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
>
> Title has a typo: s/is/if
Fixed when applying the patch.
> > Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length()
> > call in some cases") attempted to reduce the governor overhead in some
> > cases by making it avoid obtaining the sleep length (the time till the
> > next timer event) which may be costly.
> >
> > Among other things, after the above commit, tick_nohz_get_sleep_length()
> > was not called any more when idle state 0 was to be returned, which
> > turned out to be problematic and the previous behavior in that respect
> > was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-
> > existent intercepts").
> >
> > However, commit 6da8f9ba5a87 also caused the governor to avoid calling
> > tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling"
> > one (that is, it is not really an idle state, but a loop continuously
> > executed by the CPU) when the target residency of the idle state to be
> > returned was low enough, so there was no practical need to refine the
> > idle state selection in any way. This change was not removed by the
> > other commit, so now on systems where idle state 0 is a "polling" one,
> > tick_nohz_get_sleep_length() is called when idle state 0 is to be
> > returned, but it is not called when a deeper idle state with
> > sufficiently low target residency is to be returned. That is arguably
> > confusing and inconsistent.
> >
> > Moreover, there is no specific reason why the behavior in question
> > should depend on whether or not idle state 0 is a "polling" one.
> >
> > One way to address this would be to make the governor always call
> > tick_nohz_get_sleep_length() to obtain the sleep length, but that would
> > effectively mean reverting commit 6da8f9ba5a87 and restoring the latency
> > issue that was the reason for doing it. This approach is thus not
> > particularly attractive.
> >
> > To address it differently, notice that if a CPU is woken up very often,
> > this is not likely to be caused by timers in the first place (user space
> > has a default timer slack of 50 us and there are relatively few timers
> > with a deadline shorter than several microseconds in the kernel) and
> > even if it were the case, the potential benefit from using a deep idle
> > state would then be questionable for latency reasons. Therefore, if the
> > majority of CPU wakeups occur within several microseconds, it can be
> > assumed that all wakeups in that range are non-timer and the sleep
> > length need not be determined.
> >
> > Accordingly, introduce a new metric for counting wakeups with the
> > measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle
> > state selection to skip the tick_nohz_get_sleep_length() invocation if
> > idle state 0 has been selected or the target residency of the candidate
> > idle state is below RESIDENCY_THRESHOLD_NS and the value of the new
> > metric is at least 1/2 of the total event count.
> >
> > Since the above requires the measured idle duration to be determined
> > every time, except for the cases when one of the safety nets has
> > triggered in which the wakeup is counted as a hit in the deepest
> > idle state idle residency range, update the handling of those cases
> > to avoid skipping the idle duration computation when the CPU wakeup
> > is "genuine".
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > drivers/cpuidle/governors/teo.c | 58 ++++++++++++++++++++++++----------------
> > 1 file changed, 36 insertions(+), 22 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -129,6 +129,7 @@
> > * @state_bins: Idle state data bins for this CPU.
> > * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
> > * @tick_intercepts: "Intercepts" before TICK_NSEC.
> > + * @short_idle: Wakeups after short idle periods.
> > */
> > struct teo_cpu {
> > s64 time_span_ns;
> > @@ -136,6 +137,7 @@
> > struct teo_bin state_bins[CPUIDLE_STATE_MAX];
> > unsigned int total;
> > unsigned int tick_intercepts;
> > + unsigned int short_idle;
>
> Maybe call these short_idles?
I renamed it accordingly when applying the patch.
> > };
> >
> > static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
> > @@ -152,12 +154,12 @@
> > s64 target_residency_ns;
> > u64 measured_ns;
> >
> > - if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
> > + cpu_data->short_idle -= cpu_data->short_idle >> DECAY_SHIFT;
> > +
> > + if (cpu_data->time_span_ns < 0) {
> > /*
> > - * This causes the wakeup to be counted as a hit regardless of
> > - * regardless of the real idle duration which doesn't need to be
> > - * computed because the wakeup has been close enough to an
> > - * anticipated timer.
> > + * If one of the safety nets has triggered, assume that this
> > + * might have been a long sleep.
> > */
> > measured_ns = U64_MAX;
> > } else {
> > @@ -177,10 +179,14 @@
> > * time, so take 1/2 of the exit latency as a very rough
> > * approximation of the average of it.
> > */
> > - if (measured_ns >= lat_ns)
> > + if (measured_ns >= lat_ns) {
> > measured_ns -= lat_ns / 2;
> > - else
> > + if (measured_ns < RESIDENCY_THRESHOLD_NS)
> > + cpu_data->short_idle += PULSE;
> > + } else {
> > measured_ns /= 2;
> > + cpu_data->short_idle += PULSE;
> > + }
> > }
> >
> > cpu_data->total = 0;
> > @@ -419,27 +425,35 @@
> > if (idx > constraint_idx)
> > idx = constraint_idx;
> >
> > - if (!idx) {
> > - /*
> > - * Query the sleep length to be able to count the wakeup as a
> > - * hit if it is caused by a timer.
> > - */
> > - cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
> > - goto out_tick;
> > - }
> > -
> > /*
> > - * If state 0 is a polling one, check if the target residency of
> > - * the current candidate state is low enough and skip the timers
> > - * check in that case too.
> > + * If either the candidate state is state 0 or its target residency is
> > + * low enough, there is basically nothing more to do, but if the sleep
> > + * length is not updated, the subsequent wakeup will be counted as an
> > + * "intercept" which may be problematic in the cases when timer wakeups
> > + * are dominant. Namely, it may effectively prevent deeper idle states
> > + * from being selected at one point even if no imminent timers are
> > + * scheduled.
> > + *
> > + * However, frequent timers in the RESIDENCY_THRESHOLD_NS range on one
> > + * CPU are unlikely (user space has a default 50 us slack value for
> > + * hrtimers and there are relatively few timers with a lower deadline
> > + * value in the kernel), and even if they did happene, the potential
>
> s/happene/happen
Fixed when applying the patch.
> > + * benefit from using a deep idle state in that case would be
> > + * questionable anyway for latency reasons. Thus if the measured idle
> > + * duration falls into that range in the majority of cases, assume
> > + * non-timer wakeups to be dominant and skip updating the sleep length
> > + * to reduce latency.
> > */
> > - if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
> > - drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
> > + if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> > + 2 * cpu_data->short_idle >= cpu_data->total)
> > goto out_tick;
> >
> > duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> > cpu_data->sleep_length_ns = duration_ns;
> >
> > + if (!idx)
> > + goto out_tick;
> > +
> > /*
> > * If the closest expected timer is before the target residency of the
> > * candidate state, a shallower one needs to be found.
> > @@ -501,7 +515,7 @@
> > if (dev->poll_time_limit ||
> > (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
> > dev->poll_time_limit = false;
> > - cpu_data->time_span_ns = cpu_data->sleep_length_ns;
> > + cpu_data->time_span_ns = KTIME_MIN;
> > } else {
> > cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
> > }
> >
>
> Thanks, I like this approach.
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Thank you!
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update
2025-01-15 14:52 ` [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Christian Loehle
2025-01-20 8:17 ` Aboorva Devarajan
@ 2025-01-20 16:24 ` Rafael J. Wysocki
1 sibling, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-20 16:24 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, Aboorva Devarajan, LKML,
Daniel Lezcano, Artem Bityutskiy
On Wed, Jan 15, 2025 at 3:52 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 1/13/25 18:32, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This supersedes
> >
> > https://lore.kernel.org/linux-pm/4953183.GXAFRqVoOG@rjwysocki.net/
> >
> > but because the majority of patches in it are new, I've decided to count
> > version numbers back from 1.
> >
> > This addresses a relatively recently added inconsistency in behavior of the teo
> > governor regarding the handling of very frequent wakeups handling (patch [7/9])
> > and makes some other changes that may be regarded as cleanups.
> >
> > Please review.
>
> Hi Rafael,
> that looks promising. I'll review the individual patches in detail now, but
> I let a few tests run overnight and can report that there's no significant
> behaviour change with the series on an arm64 (no polling state, rk3399), which
> is my expected result.
>
> I'll get something running on a system with a polling state as well.
> (I don't have a POWER system, so that will just be x86, adding Aboorva.)
>
> Tested-by: Christian Loehle <christian.loehle@arm.com>
> for the entire series
Thanks, much appreciated!
> I'll just dump some of the raw results here for reference.
> teo-X is teo with everything up to X/9 applied. teo-m is mainline, teo is
> the entire series (equivalent to teo-9).
>
> Everything is done with fio direct random4k read. Range of real devices
> with varying latency, mapper/slow is 51ms timer delay, nullb0 is
> essentially CPU-bound, few intercepts expected.
> --
>
> device gov iter iops idles idle_misses idle_miss_ratio belows aboves
> mmcblk1 menu 0 1284 344126 106309 0.309 89264 17045
> mmcblk1 menu 1 1340 358646 109469 0.305 91726 17743
> mmcblk1 menu 2 1316 348812 106773 0.306 89845 16928
> mmcblk1 teo 0 2023 559551 30053 0.054 2962 27091
> mmcblk1 teo 1 2007 548422 25816 0.047 2113 23703
> mmcblk1 teo 2 2010 557120 29434 0.053 2782 26652
> mmcblk1 teo-m 0 2010 542790 23336 0.043 1720 21616
> mmcblk1 teo-m 1 2009 554912 29239 0.053 2750 26489
> mmcblk1 teo-m 2 2013 548384 26444 0.048 2180 24264
> mmcblk1 teo-1 0 2030 524056 3571 0.007 610 2961
> mmcblk1 teo-1 1 2022 559304 27074 0.048 3373 23701
> mmcblk1 teo-1 2 2009 554744 27406 0.049 2964 24442
> mmcblk1 teo-2 0 2012 542886 18233 0.034 2140 16093
> mmcblk1 teo-2 1 2010 554094 27680 0.050 2957 24723
> mmcblk1 teo-2 2 2010 555222 28549 0.051 3085 25464
> mmcblk1 teo-3 0 2011 556816 28708 0.052 3076 25632
> mmcblk1 teo-3 1 2015 546998 20280 0.037 2414 17866
> mmcblk1 teo-3 2 2020 549758 20802 0.038 2662 18140
> mmcblk1 teo-4 0 2026 536464 11692 0.022 1457 10235
> mmcblk1 teo-4 1 2031 557978 25464 0.046 2928 22536
> mmcblk1 teo-4 2 2012 553812 26323 0.048 2943 23380
> mmcblk1 teo-5 0 2021 557966 3454 0.006 3003 451
> mmcblk1 teo-5 1 2027 546630 18400 0.034 2303 16097
> mmcblk1 teo-5 2 2012 554390 28013 0.051 2990 25023
> mmcblk1 teo-6 0 2037 539956 12764 0.024 1701 11063
> mmcblk1 teo-6 1 2019 556202 29074 0.052 3071 26003
> mmcblk1 teo-6 2 2017 555004 26814 0.048 2869 23945
> mmcblk1 teo-7 0 2028 545802 23806 0.044 1625 22181
> mmcblk1 teo-7 1 2019 557030 29913 0.054 2990 26923
> mmcblk1 teo-7 2 2018 553236 28534 0.052 2657 25877
> mmcblk1 teo-8 0 2018 554662 28595 0.052 2660 25935
> mmcblk1 teo-8 1 2014 554646 28947 0.052 2721 26226
> mmcblk1 teo-8 2 2014 543828 29379 0.054 2915 26464
> mmcblk1 teo-9 0 2013 555356 28867 0.052 2719 26148
> mmcblk1 teo-9 1 2015 557254 29709 0.053 2836 26873
> mmcblk1 teo-9 2 2030 559272 29321 0.052 3052 26269
> mmcblk2 menu 0 2885 434990 140864 0.324 123507 17357
> mmcblk2 menu 1 2766 450918 152931 0.339 135384 17547
> mmcblk2 menu 2 2960 407712 125573 0.308 109224 16349
> mmcblk2 teo 0 5677 741006 24131 0.033 2233 21898
> mmcblk2 teo 1 5652 714494 24412 0.034 2680 21732
> mmcblk2 teo 2 5660 752410 20497 0.027 1809 18688
> mmcblk2 teo-m 0 5683 855232 23454 0.027 1934 21520
> mmcblk2 teo-m 1 5588 517234 32016 0.062 5443 26573
> mmcblk2 teo-m 2 5654 668648 29108 0.044 3171 25937
> mmcblk2 teo-1 0 5666 773452 25740 0.033 3430 22310
> mmcblk2 teo-1 1 5638 668190 27619 0.041 4150 23469
> mmcblk2 teo-1 2 5689 866340 28191 0.033 3122 25069
> mmcblk2 teo-2 0 5769 859462 2116 0.002 1835 281
> mmcblk2 teo-2 1 5577 611956 25490 0.042 8647 16843
> mmcblk2 teo-2 2 5665 721308 25826 0.036 2876 22950
> mmcblk2 teo-3 0 5677 805946 30345 0.038 3877 26468
> mmcblk2 teo-3 1 5752 855840 14392 0.017 2588 11804
> mmcblk2 teo-3 2 5740 825580 27403 0.033 3750 23653
> mmcblk2 teo-4 0 5664 777848 3435 0.004 3229 206
> mmcblk2 teo-4 1 5660 728926 25796 0.035 2871 22925
> mmcblk2 teo-4 2 5689 774342 7844 0.010 2838 5006
> mmcblk2 teo-5 0 5672 781618 2733 0.003 2551 182
> mmcblk2 teo-5 1 5693 866530 28658 0.033 3332 25326
> mmcblk2 teo-5 2 5687 837294 12172 0.015 2338 9834
> mmcblk2 teo-6 0 5859 892798 5226 0.006 3566 1660
> mmcblk2 teo-6 1 5696 864197 24962 0.029 2796 22166
> mmcblk2 teo-6 2 5671 788200 26612 0.034 3157 23455
> mmcblk2 teo-7 0 5617 525864 30143 0.057 3739 26404
> mmcblk2 teo-7 1 5732 802794 25553 0.032 3240 22313
> mmcblk2 teo-7 2 5838 858620 12661 0.015 188 12473
> mmcblk2 teo-8 0 5696 780894 24440 0.031 2928 21512
> mmcblk2 teo-8 1 5868 862794 12978 0.015 187 12791
> mmcblk2 teo-8 2 5607 660124 31632 0.048 6728 24904
> mmcblk2 teo-9 0 5652 669318 28330 0.042 2692 25638
> mmcblk2 teo-9 1 5657 710892 29291 0.041 3233 26058
> mmcblk2 teo-9 2 5594 740988 37086 0.050 10275 26811
> nvme0n1 menu 0 4775 384186 70675 0.184 54346 16329
> nvme0n1 menu 1 5211 409374 69545 0.170 53484 16061
> nvme0n1 menu 2 7213 535088 81312 0.152 64253 17059
> nvme0n1 teo 0 10655 754882 29476 0.039 2960 26516
> nvme0n1 teo 1 10627 755322 29324 0.039 3011 26313
> nvme0n1 teo 2 10559 749498 29850 0.040 3265 26585
> nvme0n1 teo-m 0 10353 738802 29937 0.041 2996 26941
> nvme0n1 teo-m 1 10629 748148 27076 0.036 2349 24727
> nvme0n1 teo-m 2 10477 743256 28193 0.038 2645 25548
> nvme0n1 teo-1 0 10947 772905 24896 0.032 3150 21746
> nvme0n1 teo-1 1 10412 732894 21868 0.030 2423 19445
> nvme0n1 teo-1 2 10530 748377 26338 0.035 3479 22859
> nvme0n1 teo-2 0 10570 751961 5195 0.007 2853 2342
> nvme0n1 teo-2 1 10482 732428 18667 0.025 2102 16565
> nvme0n1 teo-2 2 10829 768292 26400 0.034 3189 23211
> nvme0n1 teo-3 0 10493 746638 26371 0.035 3320 23051
> nvme0n1 teo-3 1 10445 742924 27871 0.038 3027 24844
> nvme0n1 teo-3 2 10920 775112 25646 0.033 3520 22126
> nvme0n1 teo-4 0 10500 734792 18125 0.025 2283 15842
> nvme0n1 teo-4 1 11091 783828 25904 0.033 3031 22873
> nvme0n1 teo-4 2 10413 736248 23786 0.032 2660 21126
> nvme0n1 teo-5 0 10479 744340 26933 0.036 3096 23837
> nvme0n1 teo-5 1 10740 764188 27065 0.035 3062 24003
> nvme0n1 teo-5 2 10500 747060 25962 0.035 3233 22729
> nvme0n1 teo-6 0 10715 757772 2186 0.003 1872 314
> nvme0n1 teo-6 1 10485 734878 17620 0.024 2174 15446
> nvme0n1 teo-6 2 10478 744106 26909 0.036 3007 23902
> nvme0n1 teo-7 0 10549 750022 30416 0.041 3320 27096
> nvme0n1 teo-7 1 10611 752182 29877 0.040 3332 26545
> nvme0n1 teo-7 2 11170 791572 30790 0.039 3453 27337
> nvme0n1 teo-8 0 10622 752523 29002 0.039 2745 26257
> nvme0n1 teo-8 1 10424 739904 28641 0.039 2756 25885
> nvme0n1 teo-8 2 10533 731440 22101 0.030 1280 20821
> nvme0n1 teo-9 0 10367 727768 24895 0.034 1820 23075
> nvme0n1 teo-9 1 10815 766230 29378 0.038 3136 26242
> nvme0n1 teo-9 2 10426 739224 28173 0.038 2563 25610
> sda menu 0 872 526240 175645 0.334 159373 16272
> sda menu 1 900 536434 188122 0.351 170749 17373
> sda menu 2 901 534826 189778 0.355 171745 18033
> sda teo 0 1687 999108 16015 0.016 1249 14766
> sda teo 1 1668 1007551 31316 0.031 5809 25507
> sda teo 2 1682 985830 18784 0.019 1003 17781
> sda teo-m 0 1676 969984 25825 0.027 3191 22634
> sda teo-m 1 1682 995266 30340 0.030 4040 26300
> sda teo-m 2 1681 940562 20830 0.022 1508 19322
> sda teo-1 0 1680 996214 15553 0.016 2192 13361
> sda teo-1 1 1674 1050168 28033 0.027 4011 24022
> sda teo-1 2 1665 808202 35292 0.044 8873 26419
> sda teo-2 0 1689 1013360 546 0.001 243 303
> sda teo-2 1 1672 1033806 29432 0.028 4363 25069
> sda teo-2 2 1667 1046100 31110 0.030 6395 24715
> sda teo-3 0 1625 922694 40891 0.044 22925 17966
> sda teo-3 1 1670 894480 27946 0.031 3045 24901
> sda teo-3 2 1658 1009366 28514 0.028 8887 19627
> sda teo-4 0 1674 977280 3605 0.004 3279 326
> sda teo-4 1 1677 960990 2861 0.003 1058 1803
> sda teo-4 2 1660 1016592 33894 0.033 7687 26207
> sda teo-5 0 1678 1033470 16996 0.016 2163 14833
> sda teo-5 1 1667 873077 25011 0.029 4393 20618
> sda teo-5 2 1672 1042020 24589 0.024 3858 20731
> sda teo-6 0 1680 962686 7255 0.008 1797 5458
> sda teo-6 1 1682 1055472 28602 0.027 5081 23521
> sda teo-6 2 1675 989244 19110 0.019 5620 13490
> sda teo-7 0 1618 1001000 51178 0.051 26011 25167
> sda teo-7 1 1666 1047310 34032 0.032 6520 27512
> sda teo-7 2 1676 982046 28788 0.029 2836 25952
> sda teo-8 0 1618 975930 52929 0.054 26386 26543
> sda teo-8 1 1678 1023796 16182 0.016 2191 13991
> sda teo-8 2 1633 981693 43520 0.044 18782 24738
> sda teo-9 0 1661 1029494 32423 0.031 5855 26568
> sda teo-9 1 1678 969400 17843 0.018 2763 15080
> sda teo-9 2 1679 1055060 32288 0.031 5607 26681
> mtdblock3 menu 0 180 278442 80547 0.289 65714 14833
> mtdblock3 menu 1 167 379840 109168 0.287 93460 15708
> mtdblock3 menu 2 155 373948 126262 0.338 107799 18463
> mtdblock3 teo 0 256 663190 29333 0.044 4836 24497
> mtdblock3 teo 1 257 808114 29394 0.036 2742 26652
> mtdblock3 teo 2 257 456150 25812 0.057 2468 23344
> mtdblock3 teo-m 0 257 624492 24678 0.040 2326 22352
> mtdblock3 teo-m 1 256 734548 31165 0.042 4307 26858
> mtdblock3 teo-m 2 257 812004 30479 0.038 3510 26969
> mtdblock3 teo-1 0 253 759708 34146 0.045 10081 24065
> mtdblock3 teo-1 1 254 730160 23105 0.032 7156 15949
> mtdblock3 teo-1 2 253 721558 32848 0.046 10028 22820
> mtdblock3 teo-2 0 257 666426 4948 0.007 782 4166
> mtdblock3 teo-2 1 256 682046 25126 0.037 4886 20240
> mtdblock3 teo-2 2 253 639950 36029 0.056 11074 24955
> mtdblock3 teo-3 0 252 709122 37198 0.052 12603 24595
> mtdblock3 teo-3 1 257 709680 30000 0.042 3670 26330
> mtdblock3 teo-3 2 254 540408 32696 0.061 9035 23661
> mtdblock3 teo-4 0 256 442356 28281 0.064 3785 24496
> mtdblock3 teo-4 1 254 588362 16934 0.029 5663 11271
> mtdblock3 teo-4 2 257 628776 28300 0.045 3667 24633
> mtdblock3 teo-5 0 250 762594 43008 0.056 16752 26256
> mtdblock3 teo-5 1 256 586098 29744 0.051 4035 25709
> mtdblock3 teo-5 2 262 903736 375 0.000 139 236
> mtdblock3 teo-6 0 250 795274 42742 0.054 17160 25582
> mtdblock3 teo-6 1 256 800172 28195 0.035 4016 24179
> mtdblock3 teo-6 2 257 523365 26260 0.050 2788 23472
> mtdblock3 teo-7 0 261 944626 31616 0.033 5090 26526
> mtdblock3 teo-7 1 258 647048 28997 0.045 2728 26269
> mtdblock3 teo-7 2 260 901858 18521 0.021 3698 14823
> mtdblock3 teo-8 0 256 914076 27272 0.030 6844 20428
> mtdblock3 teo-8 1 256 809113 31696 0.039 4672 27024
> mtdblock3 teo-8 2 252 798396 40223 0.050 13270 26953
> mtdblock3 teo-9 0 259 726886 30515 0.042 4002 26513
> mtdblock3 teo-9 1 262 892854 9411 0.011 1686 7725
> mtdblock3 teo-9 2 259 909528 16583 0.018 3054 13529
> nullb0 menu 0 107364 86086 20668 0.240 2604 18064
> nullb0 menu 1 107701 85072 20251 0.238 2808 17443
> nullb0 menu 2 107976 85178 20591 0.242 3063 17528
> nullb0 teo 0 105758 84464 28872 0.342 3429 25443
> nullb0 teo 1 106892 87122 30255 0.347 3722 26533
> nullb0 teo 2 107080 83374 27911 0.335 2918 24993
> nullb0 teo-m 0 107702 88337 29824 0.338 2956 26868
> nullb0 teo-m 1 108218 88130 29710 0.337 3038 26672
> nullb0 teo-m 2 106993 86866 30033 0.346 3664 26369
> nullb0 teo-1 0 106416 82936 23635 0.285 3186 20449
> nullb0 teo-1 1 107570 87425 27682 0.317 3220 24462
> nullb0 teo-1 2 107832 75656 19434 0.257 2775 16659
> nullb0 teo-2 0 106320 88182 28603 0.324 3213 25390
> nullb0 teo-2 1 107316 79968 21643 0.271 3081 18562
> nullb0 teo-2 2 108007 84018 25244 0.300 3034 22210
> nullb0 teo-3 0 106989 85160 24558 0.288 3601 20957
> nullb0 teo-3 1 107325 83582 24994 0.299 2963 22031
> nullb0 teo-3 2 107063 77066 18978 0.246 2913 16065
> nullb0 teo-4 0 107990 86156 25012 0.290 3404 21608
> nullb0 teo-4 1 107352 75750 15096 0.199 2671 12425
> nullb0 teo-4 2 107685 87644 26047 0.297 3825 22222
> nullb0 teo-5 0 106687 87788 13428 0.153 2937 10491
> nullb0 teo-5 1 107820 86442 25703 0.297 3686 22017
> nullb0 teo-5 2 108319 81498 22895 0.281 2713 20182
> nullb0 teo-6 0 107784 84364 24239 0.287 3459 20780
> nullb0 teo-6 1 108762 83854 24017 0.286 3289 20728
> nullb0 teo-6 2 106024 88238 27778 0.315 3253 24525
> nullb0 teo-7 0 106041 86354 30076 0.348 3420 26656
> nullb0 teo-7 1 108039 79564 26805 0.337 2924 23881
> nullb0 teo-7 2 107389 88264 30713 0.348 3156 27557
> nullb0 teo-8 0 106465 87924 29679 0.338 2958 26721
> nullb0 teo-8 1 107215 81348 27038 0.332 2916 24122
> nullb0 teo-8 2 108238 87070 29284 0.336 2971 26313
> nullb0 teo-9 0 107220 84508 28894 0.342 3411 25483
> nullb0 teo-9 1 107400 86920 29945 0.345 3579 26366
> nullb0 teo-9 2 107845 87686 30439 0.347 3784 26655
> mapper/dm-slow menu 0 19 85216 20060 0.235 2478 17582
> mapper/dm-slow teo 0 19 86454 28959 0.335 2860 26099
> mapper/dm-slow teo-m 0 19 84740 28056 0.331 2582 25474
> mapper/dm-slow teo-1 0 19 88698 28945 0.326 3210 25735
> mapper/dm-slow teo-2 0 19 83138 24326 0.293 2589 21737
> mapper/dm-slow teo-3 0 19 86416 27274 0.316 2998 24276
> mapper/dm-slow teo-4 0 19 89286 29369 0.329 3151 26218
> mapper/dm-slow teo-5 0 19 88516 29213 0.330 3163 26050
> mapper/dm-slow teo-6 0 19 87896 24675 0.281 3113 21562
> mapper/dm-slow teo-7 0 19 83986 27976 0.333 2788 25188
> mapper/dm-slow teo-8 0 19 88072 29667 0.337 2962 26705
> mapper/dm-slow teo-9 0 19 87260 29428 0.337 2923 26505
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update
2025-01-20 8:17 ` Aboorva Devarajan
@ 2025-01-20 16:26 ` Rafael J. Wysocki
0 siblings, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2025-01-20 16:26 UTC (permalink / raw)
To: Aboorva Devarajan
Cc: Christian Loehle, Rafael J. Wysocki, LKML, Daniel Lezcano,
Artem Bityutskiy, Linux PM
On Mon, Jan 20, 2025 at 9:17 AM Aboorva Devarajan
<aboorvad@linux.ibm.com> wrote:
>
> On Wed, 2025-01-15 at 14:52 +0000, Christian Loehle wrote:
> > On 1/13/25 18:32, Rafael J. Wysocki wrote:
> > > Hi Everyone,
> > >
> > > This supersedes
> > >
> > > https://lore.kernel.org/linux-pm/4953183.GXAFRqVoOG@rjwysocki.net/
> > >
> > > but because the majority of patches in it are new, I've decided to count
> > > version numbers back from 1.
> > >
> > > This addresses a relatively recently added inconsistency in behavior of the teo
> > > governor regarding the handling of very frequent wakeups handling (patch [7/9])
> > > and makes some other changes that may be regarded as cleanups.
> > >
> > > Please review.
> >
> > Hi Rafael,
> > that looks promising. I'll review the individual patches in detail now, but
> > I let a few tests run overnight and can report that there's no significant
> > behaviour change with the series on an arm64 (no polling state, rk3399), which
> > is my expected result.
> >
> > I'll get something running on a system with a polling state as well.
> > (I don't have a POWER system, so that will just be x86, adding Aboorva.)
>
>
> Christian,
>
> Thanks for adding me to the thread.
>
> Rafael,
>
> I did some tests with the patch on a Pseries Power10 system:
>
> Here are the system details:
>
> --------------------------------------------------------------------------------
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 48
> On-line CPU(s) list: 0-47
> Model name: POWER10 (architected), altivec supported
> Model: 2.0 (pvr 0080 0200)
> Thread(s) per core: 8
> Core(s) per socket: 6
> Socket(s): 1
> Physical sockets: 8
> Physical chips: 1
> Physical cores/chip: 10
>
> --------------------------------------------------------------------------------
> # cpupower idle-info
> CPUidle driver: pseries_idle
> CPUidle governor: menu
> analyzing CPU 5:
>
> Number of idle states: 2
> Available idle states: snooze Shared Cede
> snooze:
> Flags/Description: snooze
> Latency: 0
> Usage: 1411724
> Duration: 27481504
> Shared Cede:
> Flags/Description: Shared Cede
> Latency: 10
> Usage: 326573
> Duration: 31098864616
> --------------------------------------------------------------------------------
>
> How to infer the results:
>
> Above Diff (%) and Below Diff (%) represent the number of cpuidle misses,
> indicating how frequently the selected cpuidle state was either too deep or
> too shallow. So, these values should not be too high.
> --------------------------------------------------------------------------------
>
> The below test is done using a predictable timer and non-timer benchmark [1]:
>
> --------------------------------------------------------------------------------
> Menu Governor:
> --------------------------------------------------------------------------------
>
> With pipe wakeup (non-timer):
>
> --------------------------------------------------------------------------------------------------------
> Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
> --------------------------------------------------------------------------------------------------------
> 0 5.980 1656304 2 0.000121 0 0.000000
> 1 10.959 901972 1954 0.216636 0 0.000000
> --------------------------------------------------------------------------------------------------------
> 2 15.726 243971 237112 97.188600 0 0.000000
> 3 20.813 232069 227258 97.926910 0 0.000000
> 4 30.896 209884 206492 98.383869 0 0.000000
> 5 40.991 216704 213642 98.587013 0 0.000000
> 6 51.002 195632 192963 98.635704 0 0.000000
> 7 61.014 163726 161506 98.644076 0 0.000000
> 8 71.006 140739 138809 98.628667 0 0.000000
> 9 81.008 123386 120725 97.843353 0 0.000000
> 10 101.020 98974 81235 82.077111 0 0.000000
> --------------------------------------------------------------------------------------------------------
> 11 111.044 90018 1513 1.680775 12 0.013331
> 12 121.015 82704 189 0.228526 77 0.093103
> 13 131.028 76534 272 0.355398 321 0.419421
> 14 141.008 71610 698 0.974724 693 0.967742
> 15 151.021 66869 666 0.995977 656 0.981023
> 16 161.027 62709 611 0.974342 605 0.964774
> 17 171.033 59063 593 1.004013 593 1.004013
> 18 181.019 55819 571 1.022949 541 0.969204
> 19 191.016 52998 641 1.209480 628 1.184950
> 20 201.017 50353 551 1.094274 501 0.994975
> 21 251.054 40535 289 0.712964 398 0.981868
> 22 301.037 33966 252 0.741918 330 0.971560
> 23 351.038 29279 216 0.737730 294 1.004133
> 24 401.047 25765 190 0.737435 262 1.016883
> 25 451.060 23021 185 0.803614 187 0.812302
> 26 501.049 20831 150 0.720081 216 1.036916
> 27 1001.076 10951 77 0.703132 126 1.150580
>
> With timer wakeup:
>
> --------------------------------------------------------------------------------------------------------
> Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
> --------------------------------------------------------------------------------------------------------
> 0 7.590 1310772 0 0.000000 0 0.000000
> 1 12.631 789377 780 0.098812 0 0.000000
> 2 21.791 458001 52321 11.423774 0 0.000000
> 3 22.648 440752 36 0.008168 0 0.000000
> 4 32.644 305983 0 0.000000 0 0.000000
> 5 42.646 234305 0 0.000000 0 0.000000
> 6 52.647 189858 2 0.001053 0 0.000000
> 7 62.649 159561 10 0.006267 0 0.000000
> 8 72.644 137643 5 0.003633 1 0.000727
> 9 82.666 120963 5 0.004133 0 0.000000
> 10 102.654 97442 3 0.003079 610 0.626013
> 11 145.805 69937 441 0.630568 1345 1.923159
> 12 156.057 64511 75 0.116259 439 0.680504
> 13 166.047 60765 215 0.353822 534 0.878795
> 14 175.894 57564 178 0.309221 687 1.193454
> 15 185.933 54471 255 0.468139 638 1.171265
> 16 195.975 51403 98 0.190650 212 0.412427
> 17 206.062 49281 174 0.353077 577 1.170837
> 18 216.188 46980 33 0.070243 571 1.215411
> 19 226.346 44879 30 0.066846 543 1.209920
> 20 236.353 43081 27 0.062673 516 1.197744
> 21 286.158 35782 5 0.013974 154 0.430384
> 22 336.730 30531 11 0.036029 266 0.871246
> 23 386.730 26722 18 0.067360 232 0.868198
> 24 436.770 23797 9 0.037820 196 0.823633
> 25 487.229 21359 13 0.060864 229 1.072148
> 26 537.375 19557 13 0.066472 259 1.324334
> 27 1037.871 10638 12 0.112803 127 1.193833
>
> --------------------------------------------------------------------------------
> Teo governor:
> --------------------------------------------------------------------------------
>
> With pipe wakeup (non-timer):
>
> --------------------------------------------------------------------------------------------------------
> Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
> --------------------------------------------------------------------------------------------------------
> 0 5.972 1657561 6 0.000362 0 0.000000
> 1 10.964 907279 0 0.000000 0 0.000000
> 2 15.977 623681 0 0.000000 0 0.000000
> 3 20.980 475385 1 0.000210 0 0.000000
> 4 30.981 322151 0 0.000000 0 0.000000
> 5 40.975 243749 0 0.000000 0 0.000000
> 6 50.977 195989 0 0.000000 0 0.000000
> 7 60.981 163876 0 0.000000 0 0.000000
> 8 70.978 140818 0 0.000000 0 0.000000
> 9 80.976 123460 0 0.000000 1 0.000810
> 10 100.970 99038 0 0.000000 16 0.016155
> --------------------------------------------------------------------------------------------------------
> 11 111.027 106919 5388 5.039329 16873 15.781105
> --------------------------------------------------------------------------------------------------------
> 12 121.017 83074 381 0.458627 443 0.533260
> 13 131.024 76588 323 0.421737 373 0.487021
> 14 141.027 71592 695 0.970779 695 0.970779
> 15 151.023 66874 669 1.000389 662 0.989921
> 16 161.026 62719 607 0.967809 595 0.948676
> 17 171.027 59064 581 0.983679 574 0.971827
> 18 181.020 55817 561 1.005070 532 0.953115
> 19 191.020 52883 523 0.988976 446 0.843371
> 20 201.025 50509 670 1.326496 645 1.277000
> 21 251.037 40544 280 0.690608 409 1.008781
> 22 301.033 33988 263 0.773803 349 1.026833
> 23 351.036 29285 221 0.754653 299 1.021001
> 24 401.042 25777 203 0.787524 272 1.055204
> 25 451.040 23029 174 0.755569 238 1.033480
> 26 501.048 20838 157 0.753431 224 1.074959
> 27 1001.097 10949 79 0.721527 134 1.223856
>
> With timer wakeup:
>
> --------------------------------------------------------------------------------------------------------
> Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
> --------------------------------------------------------------------------------------------------------
> 0 7.541 1319205 0 0.000000 0 0.000000
> 1 12.546 794464 0 0.000000 0 0.000000
> 2 17.540 568954 0 0.000000 0 0.000000
> 3 22.572 442307 0 0.000000 0 0.000000
> 4 32.583 306443 0 0.000000 1 0.000326
> 5 42.597 233238 0 0.000000 0 0.000000
> 6 52.587 190067 0 0.000000 0 0.000000
> 7 62.590 159714 0 0.000000 1 0.000626
> 8 72.574 137755 0 0.000000 2 0.001452
> 9 82.581 121081 0 0.000000 0 0.000000
> 10 102.589 97491 0 0.000000 1912 1.961207
> 11 146.385 68906 47 0.068209 599 0.869300
> 12 156.548 64565 86 0.133199 670 1.037714
> 13 166.588 60562 100 0.165120 518 0.855322
> 14 176.676 57264 263 0.459276 642 1.121123
> 15 186.563 54262 293 0.539973 601 1.107589
> 16 195.986 51668 192 0.371603 526 1.018038
> 17 206.860 49028 97 0.197846 564 1.150363
> 18 216.899 46669 27 0.057854 460 0.985665
> 19 227.016 44528 22 0.049407 367 0.824201
> 20 237.055 42883 28 0.065294 507 1.182287
> 21 286.998 35665 9 0.025235 283 0.793495
> 22 337.410 30439 7 0.022997 264 0.867308
> 23 387.522 26652 18 0.067537 251 0.941768
> 24 437.570 23742 8 0.033696 221 0.930840
> 25 487.804 21293 10 0.046964 94 0.441460
> 26 537.884 19505 7 0.035888 243 1.245834
> 27 1038.863 10633 4 0.037619 135 1.269632
>
> --------------------------------------------------------------------------------
> Teo Governor with patch
> --------------------------------------------------------------------------------
>
> With pipe wakeup (non-timer):
>
> --------------------------------------------------------------------------------------------------------
> Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
> --------------------------------------------------------------------------------------------------------
> 0 5.959 1661754 5 0.000301 0 0.000000
> 1 10.963 907497 2 0.000220 0 0.000000
> 2 15.968 623957 2 0.000321 0 0.000000
> 3 20.970 475574 2 0.000421 0 0.000000
> 4 30.974 321718 2 0.000622 0 0.000000
> 5 40.974 243714 2 0.000821 0 0.000000
> 6 50.983 195931 2 0.001021 0 0.000000
> 7 60.974 163876 2 0.001220 1 0.000610
> 8 70.973 140810 2 0.001420 1 0.000710
> 9 80.988 123420 1 0.000810 4 0.003241
> 10 100.994 99014 2 0.002020 20 0.020199
> --------------------------------------------------------------------------------------------------------
> 11 111.023 135597 11864 8.749456 45561 33.600301 => This is observed even
> -------------------------------------------------------------------------------------------------------- without the patch,
> 12 121.035 82948 348 0.419540 352 0.424362 when the sleep interval
> 13 131.019 76342 82 0.107411 48 0.062875 is almost equal to the
> 14 141.028 70948 70 0.098664 57 0.080341 residency time of state1.
> 15 151.023 66278 81 0.122212 69 0.104107
> 16 161.021 62146 65 0.104592 51 0.082065
> 17 171.023 58509 64 0.109385 47 0.080330
> 18 181.026 55301 64 0.115730 48 0.086798
> 19 191.033 52407 67 0.127846 45 0.085866
> 20 201.024 49803 52 0.104411 48 0.096380
> 21 251.042 39911 39 0.097717 45 0.112751
> 22 301.040 33302 29 0.087082 40 0.120113
> 23 351.045 28572 37 0.129497 34 0.118998
> 24 401.057 25005 20 0.079984 27 0.107978
> 25 451.055 22246 21 0.094399 26 0.116875
> 26 501.053 20031 14 0.069892 25 0.124807
> 27 1001.099 10055 7 0.069617 15 0.149180
>
>
> With timer wakeup:
>
> --------------------------------------------------------------------------------------------------------
> Sleep Interval Total Usage Diff Total Above Diff Above Diff (%) Total Below Diff Below Diff (%)
> --------------------------------------------------------------------------------------------------------
> 0 7.566 1314872 0 0.000000 0 0.000000
> 1 12.553 794091 0 0.000000 0 0.000000
> 2 17.573 567627 0 0.000000 0 0.000000
> 3 22.631 441084 0 0.000000 0 0.000000
> 4 32.633 306095 0 0.000000 0 0.000000
> 5 42.631 234377 0 0.000000 0 0.000000
> 6 52.634 189899 0 0.000000 0 0.000000
> 7 62.642 159572 0 0.000000 0 0.000000
> 8 72.645 137619 0 0.000000 1 0.000727
> 9 82.616 121037 0 0.000000 1 0.000826
> 10 102.636 97423 0 0.000000 1047 1.074695
> 11 145.823 69245 43 0.062098 680 0.982020
> 12 155.912 64546 90 0.139435 478 0.740557
> 13 166.103 60709 219 0.360737 501 0.825248
> 14 176.036 57483 196 0.340970 639 1.111633
> 15 186.043 54448 250 0.459154 623 1.144211
> 16 195.552 51428 124 0.241114 230 0.447227
> 17 205.990 49270 141 0.286178 527 1.069616
> 18 216.300 46931 39 0.083101 545 1.161279
> 19 226.288 44884 23 0.051243 520 1.158542
> 20 236.372 43056 20 0.046451 493 1.145020
> 21 285.985 35724 1 0.002799 119 0.333109
> 22 336.636 30526 11 0.036035 262 0.858285
> 23 386.522 26772 17 0.063499 218 0.814284
> 24 436.749 23857 26 0.108983 208 0.871862
> 25 487.204 21358 9 0.042139 240 1.123701
> 26 537.312 19530 7 0.035842 235 1.203277
> 27 1038.147 10610 2 0.018850 119 1.121583
> --------------------------------------------------------------------------------
>
>
> I also did some tests with postgres (pgbench) - with the patch:
>
> +---------------------------+--------------------+------------------+
> | Metric | Shared Cede | Snooze |
> +---------------------------+--------------------+------------------+
> | Total Usage Difference | 119,453 | 20,472,846 |
> | Total Time Difference | 502.79 seconds | 324.90 seconds |
> | Total Above Difference | 74,500 (0.36%) | 0 |
> | Total Below Difference | 0 | 336,703 (1.64%) |
> +---------------------------+--------------------+------------------+
>
> % Above Diff = 0.36% of total usage
> % Below Diff = 1.64% of total usage
>
>
> Using both the deterministic micro-benchmark and pgbench, I observed that the
> teo governor with the patch as anticipated does not cause any noticable increase in the cpuidle
> state prediction miss on PowerPC (pseries) (% above and below diff).
>
> So, for the entire series:
>
> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Thank you, much appreciated!
> [1]: https://github.com/AboorvaDevarajan/linux-utils/tree/main/cpuidle/cpuidle_wakeup
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-01-20 16:26 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-13 18:32 [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Rafael J. Wysocki
2025-01-13 18:34 ` [PATCH v1 1/9] cpuidle: teo: Rearrange idle state lookup code Rafael J. Wysocki
2025-01-15 14:21 ` Christian Loehle
2025-01-13 18:36 ` [PATCH v1 2/9] cpuidle: teo: Reorder candidate state index checks Rafael J. Wysocki
2025-01-15 14:46 ` Christian Loehle
2025-01-15 15:54 ` Rafael J. Wysocki
2025-01-15 19:20 ` Christian Loehle
2025-01-15 20:48 ` Rafael J. Wysocki
2025-01-15 21:10 ` Rafael J. Wysocki
2025-01-16 12:22 ` Rafael J. Wysocki
2025-01-16 13:26 ` Christian Loehle
2025-01-16 13:27 ` Christian Loehle
2025-01-13 18:39 ` [PATCH v1 3/9] cpuidle: teo: Combine candidate state index checks against 0 Rafael J. Wysocki
2025-01-15 19:44 ` Christian Loehle
2025-01-13 18:40 ` [PATCH v1 4/9] cpuidle: teo: Drop local variable prev_intercept_idx Rafael J. Wysocki
2025-01-15 19:46 ` Christian Loehle
2025-01-13 18:41 ` [PATCH v1 5/9] cpuidle: teo: Clarify two code comments Rafael J. Wysocki
2025-01-15 19:43 ` Christian Loehle
2025-01-13 18:45 ` [PATCH v1 6/9] cpuidle: teo: Simplify counting events used for tick management Rafael J. Wysocki
2025-01-20 11:27 ` Christian Loehle
2025-01-13 18:48 ` [PATCH v1 7/9] cpuidle: teo: Skip getting the sleep length is wakeups are very frequent Rafael J. Wysocki
2025-01-20 12:08 ` Christian Loehle
2025-01-20 16:24 ` Rafael J. Wysocki
2025-01-13 18:50 ` [PATCH v1 8/9] cpuidle: teo: Simplify handling of total events count Rafael J. Wysocki
2025-01-20 12:10 ` Christian Loehle
2025-01-13 18:51 ` [PATCH v1 9/9] cpuidle: teo: Replace time_span_ns with a flag Rafael J. Wysocki
2025-01-20 12:16 ` Christian Loehle
2025-01-20 12:42 ` Rafael J. Wysocki
2025-01-15 14:52 ` [PATCH v1 0/9] cpuidle: teo: Cleanups and very frequent wakeups handling update Christian Loehle
2025-01-20 8:17 ` Aboorva Devarajan
2025-01-20 16:26 ` Rafael J. Wysocki
2025-01-20 16:24 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox