* [RFC/RFT][PATCH v1 1/2] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront
2023-08-01 19:35 [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
@ 2023-08-01 19:39 ` Rafael J. Wysocki
2023-08-01 19:40 ` [RFC/RFT][PATCH v1 2/2] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases Rafael J. Wysocki
2023-08-03 13:18 ` [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time Kajetan Puchalski
2 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-08-01 19:39 UTC (permalink / raw)
To: Linux PM, Peter Zijlstra, Anna-Maria Behnsen
Cc: LKML, Frederic Weisbecker, Kajetan Puchalski
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because the cost of calling tick_nohz_get_sleep_length() may increase
in the future, reorder the code in teo_select() so that it first uses
the statistics to pick a candidate idle state and applies the
utilization heuristic to it, and only then calls
tick_nohz_get_sleep_length() to obtain the sleep length value and
refine the selection if necessary.
This change by itself does not cause tick_nohz_get_sleep_length() to
be called less often, but it prepares the code for subsequent changes
that will do so.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 102 ++++++++++++++++------------------------
1 file changed, 43 insertions(+), 59 deletions(-)
Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -306,15 +306,10 @@ static void teo_update(struct cpuidle_dr
cpu_data->total += PULSE;
}
-static bool teo_time_ok(u64 interval_ns)
+static bool teo_state_ok(int i, struct cpuidle_driver *drv)
{
- return !tick_nohz_tick_stopped() || interval_ns >= TICK_NSEC;
-}
-
-static s64 teo_middle_of_bin(int idx, struct cpuidle_driver *drv)
-{
- return (drv->states[idx].target_residency_ns +
- drv->states[idx+1].target_residency_ns) / 2;
+ return !tick_nohz_tick_stopped() ||
+ drv->states[i].target_residency_ns >= TICK_NSEC;
}
/**
@@ -354,6 +349,7 @@ static int teo_select(struct cpuidle_dri
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
+ ktime_t delta_tick = TICK_NSEC / 2;
unsigned int idx_intercept_sum = 0;
unsigned int intercept_sum = 0;
unsigned int idx_recent_sum = 0;
@@ -363,7 +359,6 @@ static int teo_select(struct cpuidle_dri
int constraint_idx = 0;
int idx0 = 0, idx = -1;
bool alt_intercepts, alt_recent;
- ktime_t delta_tick;
bool cpu_utilized;
s64 duration_ns;
int i;
@@ -374,9 +369,11 @@ static int teo_select(struct cpuidle_dri
}
cpu_data->time_span_ns = local_clock();
-
- duration_ns = tick_nohz_get_sleep_length(&delta_tick);
- cpu_data->sleep_length_ns = duration_ns;
+ /*
+ * Set the expected sleep length to infinity in case of an early
+ * return.
+ */
+ cpu_data->sleep_length_ns = KTIME_MAX;
/* Check if there is any choice in the first place. */
if (drv->state_count < 2) {
@@ -384,11 +381,8 @@ static int teo_select(struct cpuidle_dri
goto out_tick;
}
- if (!dev->states_usage[0].disable) {
+ if (!dev->states_usage[0].disable)
idx = 0;
- if (drv->states[1].target_residency_ns > duration_ns)
- goto out_tick;
- }
cpu_utilized = teo_cpu_is_utilized(dev->cpu, cpu_data);
/*
@@ -398,7 +392,7 @@ static int teo_select(struct cpuidle_dri
*/
if (drv->state_count < 3 && cpu_utilized) {
/* The CPU is utilized, so assume a short idle duration. */
- duration_ns = teo_middle_of_bin(0, drv);
+ duration_ns = drv->states[1].target_residency_ns / 2;
/*
* If state 0 is enabled and it is not a polling one, select it
* right away unless the scheduler tick has been stopped, in
@@ -408,7 +402,7 @@ static int teo_select(struct cpuidle_dri
* anyway.
*/
if ((!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
- teo_time_ok(duration_ns)) || dev->states_usage[1].disable) {
+ teo_state_ok(0, drv)) || dev->states_usage[1].disable) {
idx = 0;
goto out_tick;
}
@@ -417,13 +411,7 @@ static int teo_select(struct cpuidle_dri
goto end;
}
- /*
- * Find the deepest idle state whose target residency does not exceed
- * the current sleep length and the deepest idle state not deeper than
- * the former whose exit latency does not exceed the current latency
- * constraint. Compute the sums of metrics for early wakeup pattern
- * detection.
- */
+ /* Compute the sums of metrics for early wakeup pattern detection. */
for (i = 1; i < drv->state_count; i++) {
struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
struct cpuidle_state *s = &drv->states[i];
@@ -439,19 +427,15 @@ static int teo_select(struct cpuidle_dri
if (dev->states_usage[i].disable)
continue;
- if (idx < 0) {
- idx = i; /* first enabled state */
- idx0 = i;
- }
-
- if (s->target_residency_ns > duration_ns)
- break;
+ if (idx < 0)
+ idx0 = i; /* first enabled state */
idx = i;
if (s->exit_latency_ns <= latency_req)
constraint_idx = i;
+ /* Save the sums for the current state. */
idx_intercept_sum = intercept_sum;
idx_hit_sum = hit_sum;
idx_recent_sum = recent_sum;
@@ -479,13 +463,11 @@ static int teo_select(struct cpuidle_dri
* all of the deeper states, or the sum of the numbers of recent
* intercepts over all of the states shallower than the candidate one
* is greater than a half of the number of recent events taken into
- * account, the CPU is likely to wake up early, so find an alternative
- * idle state to select.
+ * account, a shallower idle state is likely to be a better choice.
*/
alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;
alt_recent = idx_recent_sum > NR_RECENT / 2;
if (alt_recent || alt_intercepts) {
- s64 first_suitable_span_ns = duration_ns;
int first_suitable_idx = idx;
/*
@@ -494,44 +476,39 @@ static int teo_select(struct cpuidle_dri
* cases (both with respect to intercepts overall and with
* respect to the recent intercepts only) in the past.
*
- * Take the possible latency constraint and duration limitation
- * present if the tick has been stopped already into account.
+ * Take the possible duration limitation present if the tick
+ * has been stopped already into account.
*/
intercept_sum = 0;
recent_sum = 0;
for (i = idx - 1; i >= 0; i--) {
struct teo_bin *bin = &cpu_data->state_bins[i];
- s64 span_ns;
intercept_sum += bin->intercepts;
recent_sum += bin->recent;
- span_ns = teo_middle_of_bin(i, drv);
-
if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
(!alt_intercepts ||
2 * intercept_sum > idx_intercept_sum)) {
- if (teo_time_ok(span_ns) &&
- !dev->states_usage[i].disable) {
+ /*
+ * Use the current state unless it is too
+ * shallow or disabled, in which case take the
+ * first enabled state that is deep enough.
+ */
+ if (teo_state_ok(i, drv) &&
+ !dev->states_usage[i].disable)
idx = i;
- duration_ns = span_ns;
- } else {
- /*
- * The current state is too shallow or
- * disabled, so take the first enabled
- * deeper state with suitable time span.
- */
+ else
idx = first_suitable_idx;
- duration_ns = first_suitable_span_ns;
- }
+
break;
}
if (dev->states_usage[i].disable)
continue;
- if (!teo_time_ok(span_ns)) {
+ if (!teo_state_ok(i, drv)) {
/*
* The current state is too shallow, but if an
* alternative candidate state has been found,
@@ -543,7 +520,6 @@ static int teo_select(struct cpuidle_dri
break;
}
- first_suitable_span_ns = span_ns;
first_suitable_idx = i;
}
}
@@ -562,14 +538,22 @@ static int teo_select(struct cpuidle_dri
* not sufficiently large.
*/
if (cpu_utilized) {
- s64 span_ns;
+ i = teo_find_shallower_state(drv, dev, idx, KTIME_MAX, true);
+ if (teo_state_ok(i, drv))
+ idx = i;
+ }
- i = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
- span_ns = teo_middle_of_bin(i, drv);
- if (teo_time_ok(span_ns)) {
+ duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+ cpu_data->sleep_length_ns = duration_ns;
+
+ /*
* If the closest expected timer is before the target residency of the
+ * candidate state, a shallower one needs to be found.
+ */
+ if (drv->states[idx].target_residency_ns > duration_ns) {
+ i = teo_find_shallower_state(drv, dev, idx, duration_ns, false);
+ if (teo_state_ok(i, drv))
idx = i;
- duration_ns = span_ns;
- }
}
end:
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time
2023-08-01 19:35 [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
2023-08-01 19:39 ` [RFC/RFT][PATCH v1 1/2] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront Rafael J. Wysocki
2023-08-01 19:40 ` [RFC/RFT][PATCH v1 2/2] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases Rafael J. Wysocki
@ 2023-08-03 13:18 ` Kajetan Puchalski
2023-08-03 13:54 ` Rafael J. Wysocki
2 siblings, 1 reply; 5+ messages in thread
From: Kajetan Puchalski @ 2023-08-03 13:18 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, Peter Zijlstra, Anna-Maria Behnsen, LKML,
Frederic Weisbecker, kajetan.puchalski
Hi Rafael,
On Tue, Aug 01, 2023 at 09:35:15PM +0200, Rafael J. Wysocki wrote:
> Hi Folks,
>
> This is on top of the fixes series posted previously:
>
> https://lore.kernel.org/linux-pm/4515817.LvFx2qVVIh@kreacher/
>
> (I'll put it all into one git branch tomorrow).
>
> I started to play with the idea described here
>
> https://lore.kernel.org/linux-pm/CAJZ5v0hQh2Pg_uXxj8KBRw3oLS1WdsU+rUafBAAq7dRdbRwYSA@mail.gmail.com/
>
> and this is the result.
>
> Note that this is completely experimental, even though it doesn't kill any of
> the test boxes I've run it on.
>
> Patch [1/2] moves the tick_nohz_get_sleep_length() call in teo_select() after
> a preliminary idle state selection based on statistics and patch [2/2] adds
> checks to avoid it completely if the idle state selected so far is shallow
> enough.
>
> I would appreciate checking if this actually makes any difference.
>
> Thanks!
As mentioned in the other thread, I did some testing with these two
patches on top as well; here are the results:
1. Geekbench 6
+---------------------------+---------------+-----------------+-------------------+
| metric | teo | teo_tick | teo_tick_rfc |
+---------------------------+---------------+-----------------+-------------------+
| multicore_score | 3320.9 (0.0%) | 3303.3 (-0.53%) | 3293.6 (-0.82%) |
| score | 1415.7 (0.0%) | 1417.7 (0.14%) | 1423.4 (0.54%) |
| CPU_total_power | 2421.3 (0.0%) | 2429.3 (0.33%) | 2442.2 (0.86%) |
| latency (AsyncTask #1) | 49.41μ (0.0%) | 51.07μ (3.36%) | 50.1μ (1.4%) |
| latency (labs.geekbench6) | 65.63μ (0.0%) | 77.47μ (18.03%) | 55.82μ (-14.95%) |
| latency (surfaceflinger) | 39.46μ (0.0%) | 36.94μ (-6.39%) | 35.79μ (-9.28%) |
+---------------------------+---------------+-----------------+-------------------+
I.e. the big picture is all right: the latency either improves with these
patches, or the spike in the previous patchset was an anomaly; either way
it seems fine. Not sure where the change in the score is coming from, but
for the record the line plots of the 3 iterations for both the tick
variants look the same, while they're slightly distinct from the pure
'teo' variant. It's still a below-1% gap, so not the end of the world if
there are benefits elsewhere.
+-------------------+---------+------------+--------+
| kernel | cluster | idle_state | time |
+-------------------+---------+------------+--------+
| teo | little | 0.0 | 146.75 |
| teo_tick | little | 0.0 | 63.5 |
| teo_tick_rfc | little | 0.0 | 62.48 |
| teo | little | 1.0 | 53.75 |
| teo_tick | little | 1.0 | 146.78 |
| teo_tick_rfc | little | 1.0 | 147.14 |
+-------------------+---------+------------+--------+
The idle numbers look pretty much the same as in the previous variant,
which confirms that the change in the little cluster residency is caused
by the previous changes, but also that these two patches don't affect it.
2. JetNews
+-----------------+---------------+----------------+-------------------+
| metric | teo | teo_tick | teo_tick_rfc |
+-----------------+---------------+----------------+-------------------+
| fps | 86.2 (0.0%) | 86.4 (0.16%) | 86.0 (-0.28%) |
| janks_pc | 0.8 (0.0%) | 0.8 (-0.66%) | 0.8 (-1.37%) |
| CPU_total_power | 185.2 (0.0%) | 178.2 (-3.76%) | 182.2 (-1.6%) |
+-----------------+---------------+----------------+-------------------+
Pretty much no change here, the power is still better than in base teo.
+-------------------+---------+------------+-------+
| kernel | cluster | idle_state | time |
+-------------------+---------+------------+-------+
| teo | mid | -1.0 | 21.63 |
| teo_tick | mid | -1.0 | 21.57 |
| teo_tick_rfc | mid | -1.0 | 17.66 |
| teo | big | -1.0 | 8.81 |
| teo_tick | big | -1.0 | 8.55 |
| teo_tick_rfc | big | -1.0 | 12.04 |
+-------------------+---------+------------+-------+
This part stands out slightly, so it could be worth noting. For some
reason the trace registers a few seconds less running time (-1 means
'not idle') on the mid cores but a few seconds more on the big cores.
This wasn't the case for the 'teo_tick' variant before, so it looks
like it's caused by these two patches. It doesn't seem to be an issue,
though, just interesting.
TLDR:
Does not blow up, looks okay :)
^ permalink raw reply [flat|nested] 5+ messages in thread