* [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic
@ 2024-06-28 9:59 Christian Loehle
2024-06-28 9:59 ` [PATCH 1/3] Revert: "cpuidle: teo: Introduce util-awareness" Christian Loehle
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Christian Loehle @ 2024-06-28 9:59 UTC (permalink / raw)
To: linux-pm, linux-kernel, rafael
Cc: vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann, Christian Loehle
Hi all,
so my investigation into teo lead to the following fixes.
1/3:
As discussed the utilization threshold is too high while
there are benefits in certain workloads, there are quite a few
regressions, too. Revert the Util-awareness patch.
This in itself leads to regressions, but part of it can be offset
by the later patches.
See
https://lore.kernel.org/lkml/CAKfTPtA6ZzRR-zMN7sodOW+N_P+GqwNv4tGR+aMB5VXRT2b5bg@mail.gmail.com/
2/3:
Remove the 'recent' intercept logic, see my findings in:
https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
I haven't found a way to salvage this properly, so I removed it.
The regular intercept seems to decay fast enough to not need this, but
we could change it if that turns out that we need this to be faster in
ramp-up and decaying.
3/3:
The rest of the intercept logic had issues, too.
See the commit.
Happy for anyone to take a look and test as well.
Some numbers for context, comparing:
- IO workload (intercept heavy).
- Timer workload very low utilization (check for deepest state)
- hackbench (high utilization)
- Geekbench 5 on Pixel6 (high utilization)
Tests 1 to 3 are on RK3399 with CONFIG_HZ=100.
target_residencies: 1, 900, 2000
1. IO workload, 5 runs, results sorted, in read IOPS.
fio --minimal --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1 | cut -d \; -f 8;
teo fixed v2:
/dev/nvme0n1
[4599, 4658, 4692, 4694, 4720]
/dev/mmcblk2
[5700, 5730, 5735, 5747, 5977]
/dev/mmcblk1
[2052, 2054, 2066, 2067, 2073]
teo mainline:
/dev/nvme0n1
[3793, 3825, 3846, 3865, 3964]
/dev/mmcblk2
[3831, 4110, 4154, 4203, 4228]
/dev/mmcblk1
[1559, 1564, 1596, 1611, 1618]
menu:
/dev/nvme0n1
[2571, 2630, 2804, 2813, 2917]
/dev/mmcblk2
[4181, 4260, 5062, 5260, 5329]
/dev/mmcblk1
[1567, 1581, 1585, 1603, 1769]
2. Timer workload (through IO for my convenience 😉 )
Results in read IOPS, fio same as above.
echo "0 2097152 zero" | dmsetup create dm-zeros
echo "0 2097152 delay /dev/mapper/dm-zeros 0 50" | dmsetup create dm-slow
(Each IO is delayed by timer of 50ms, should be mostly in state2, for 5s total)
teo fixed v2:
idle_state time
2.0 4.807025
-1.0 0.219766
0.0 0.072007
1.0 0.169570
3199 cpu_idle total
38 cpu_idle_miss
31 cpu_idle_miss above
7 cpu_idle_miss below
teo mainline:
idle_state time
1.0 4.897942
-1.0 0.095375
0.0 0.253581
3221 cpu_idle total
1269 cpu_idle_miss
22 cpu_idle_miss above
1247 cpu_idle_miss below
menu:
idle_state time
2.0 4.295546
-1.0 0.234164
1.0 0.356344
0.0 0.401507
3421 cpu_idle total
129 cpu_idle_miss
52 cpu_idle_miss above
77 cpu_idle_miss below
Residencies:
teo mainline isn't in state2 at all, teo fixed is more in state2 than menu, but
both are in state2 the vast majority of the time as expected.
tldr: overall teo fixed spends more time in state2 while having
fewer idle_miss than menu.
teo mainline was just way too aggressive at selecting shallow states.
3. Hackbench, 5 runs
for i in $(seq 0 4); do hackbench -l 100 -g 100 ; sleep 1; done
teo fixed v2:
Time: 4.937
Time: 4.898
Time: 4.871
Time: 4.833
Time: 4.898
teo mainline:
Time: 4.945
Time: 5.021
Time: 4.927
Time: 4.923
Time: 5.137
menu:
Time: 4.964
Time: 4.847
Time: 4.914
Time: 4.841
Time: 4.800
tldr: all comparable, teo mainline slightly worse
4. Geekbench 5 (multi-core) on Pixel6
(Device is cooled for each iteration separately)
teo mainline:
3113, 3068, 3079
mean 3086.66
teo revert util-awareness:
2947, 2941, 2952
mean 2946.66 (-4.54%)
teo fixed v2:
3032, 3066, 3019
mean 3039 (-1.54%)
Changes since v2:
- Reworded commits according to Dietmar's comments
- Dropped the KTIME_MAX as hit part from 3/3 according to Dietmar's
remark.
Changes since v1:
- Removed all non-fixes.
- Do a full revert of Util-awareness instead of increasing thresholds.
- Address Dietmar's comments.
https://lore.kernel.org/linux-kernel/20240606090050.327614-2-christian.loehle@arm.com/T/
Kind Regards,
Christian
Christian Loehle (3):
Revert: "cpuidle: teo: Introduce util-awareness"
cpuidle: teo: Remove recent intercepts metric
cpuidle: teo: Don't count non-existent intercepts
drivers/cpuidle/governors/teo.c | 189 +++++---------------------------
1 file changed, 27 insertions(+), 162 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] Revert: "cpuidle: teo: Introduce util-awareness"
2024-06-28 9:59 [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Christian Loehle
@ 2024-06-28 9:59 ` Christian Loehle
2024-06-28 9:59 ` [PATCH 2/3] cpuidle: teo: Remove recent intercepts metric Christian Loehle
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Christian Loehle @ 2024-06-28 9:59 UTC (permalink / raw)
To: linux-pm, linux-kernel, rafael
Cc: vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann, Christian Loehle
This reverts commit 9ce0f7c4bc64d820b02a1c53f7e8dba9539f942b.
Util-awareness was reported to be too aggressive in selecting shallower
states. Additionally a single threshold was found to not be suitable
for reasoning about sleep length as, for all practical purposes,
almost arbitrary sleep lengths are still possible for any load value.
Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
Reported-by: Qais Yousef <qyousef@layalina.io>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
drivers/cpuidle/governors/teo.c | 105 --------------------------------
1 file changed, 105 deletions(-)
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 7244f71c59c5..b6a8ba6772e1 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -2,13 +2,8 @@
/*
* Timer events oriented CPU idle governor
*
- * TEO governor:
* Copyright (C) 2018 - 2021 Intel Corporation
* Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- *
- * Util-awareness mechanism:
- * Copyright (C) 2022 Arm Ltd.
- * Author: Kajetan Puchalski <kajetan.puchalski@arm.com>
*/
/**
@@ -104,56 +99,16 @@
* select the given idle state instead of the candidate one.
*
* 3. By default, select the candidate state.
- *
- * Util-awareness mechanism:
- *
- * The idea behind the util-awareness extension is that there are two distinct
- * scenarios for the CPU which should result in two different approaches to idle
- * state selection - utilized and not utilized.
- *
- * In this case, 'utilized' means that the average runqueue util of the CPU is
- * above a certain threshold.
- *
- * When the CPU is utilized while going into idle, more likely than not it will
- * be woken up to do more work soon and so a shallower idle state should be
- * selected to minimise latency and maximise performance. When the CPU is not
- * being utilized, the usual metrics-based approach to selecting the deepest
- * available idle state should be preferred to take advantage of the power
- * saving.
- *
- * In order to achieve this, the governor uses a utilization threshold.
- * The threshold is computed per-CPU as a percentage of the CPU's capacity
- * by bit shifting the capacity value. Based on testing, the shift of 6 (~1.56%)
- * seems to be getting the best results.
- *
- * Before selecting the next idle state, the governor compares the current CPU
- * util to the precomputed util threshold. If it's below, it defaults to the
- * TEO metrics mechanism. If it's above, the closest shallower idle state will
- * be selected instead, as long as is not a polling state.
*/
#include <linux/cpuidle.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
-#include <linux/sched.h>
#include <linux/sched/clock.h>
-#include <linux/sched/topology.h>
#include <linux/tick.h>
#include "gov.h"
-/*
- * The number of bits to shift the CPU's capacity by in order to determine
- * the utilized threshold.
- *
- * 6 was chosen based on testing as the number that achieved the best balance
- * of power and performance on average.
- *
- * The resulting threshold is high enough to not be triggered by background
- * noise and low enough to react quickly when activity starts to ramp up.
- */
-#define UTIL_THRESHOLD_SHIFT 6
-
/*
* The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
* is used for decreasing metrics on a regular basis.
@@ -188,7 +143,6 @@ struct teo_bin {
* @next_recent_idx: Index of the next @recent_idx entry to update.
* @recent_idx: Indices of bins corresponding to recent "intercepts".
* @tick_hits: Number of "hits" after TICK_NSEC.
- * @util_threshold: Threshold above which the CPU is considered utilized
*/
struct teo_cpu {
s64 time_span_ns;
@@ -198,28 +152,10 @@ struct teo_cpu {
int next_recent_idx;
int recent_idx[NR_RECENT];
unsigned int tick_hits;
- unsigned long util_threshold;
};
static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
-/**
- * teo_cpu_is_utilized - Check if the CPU's util is above the threshold
- * @cpu: Target CPU
- * @cpu_data: Governor CPU data for the target CPU
- */
-#ifdef CONFIG_SMP
-static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data)
-{
- return sched_cpu_util(cpu) > cpu_data->util_threshold;
-}
-#else
-static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data)
-{
- return false;
-}
-#endif
-
/**
* teo_update - Update CPU metrics after wakeup.
* @drv: cpuidle driver containing state data.
@@ -386,7 +322,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
int constraint_idx = 0;
int idx0 = 0, idx = -1;
bool alt_intercepts, alt_recent;
- bool cpu_utilized;
s64 duration_ns;
int i;
@@ -411,32 +346,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
if (!dev->states_usage[0].disable)
idx = 0;
- cpu_utilized = teo_cpu_is_utilized(dev->cpu, cpu_data);
- /*
- * If the CPU is being utilized over the threshold and there are only 2
- * states to choose from, the metrics need not be considered, so choose
- * the shallowest non-polling state and exit.
- */
- if (drv->state_count < 3 && cpu_utilized) {
- /*
- * If state 0 is enabled and it is not a polling one, select it
- * right away unless the scheduler tick has been stopped, in
- * which case care needs to be taken to leave the CPU in a deep
- * enough state in case it is not woken up any time soon after
- * all. If state 1 is disabled, though, state 0 must be used
- * anyway.
- */
- if ((!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
- teo_state_ok(0, drv)) || dev->states_usage[1].disable) {
- idx = 0;
- goto out_tick;
- }
- /* Assume that state 1 is not a polling one and use it. */
- idx = 1;
- duration_ns = drv->states[1].target_residency_ns;
- goto end;
- }
-
/* Compute the sums of metrics for early wakeup pattern detection. */
for (i = 1; i < drv->state_count; i++) {
struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
@@ -560,18 +469,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
if (idx > constraint_idx)
idx = constraint_idx;
- /*
- * If the CPU is being utilized over the threshold, choose a shallower
- * non-polling state to improve latency, unless the scheduler tick has
- * been stopped already and the shallower state's target residency is
- * not sufficiently large.
- */
- if (cpu_utilized) {
- i = teo_find_shallower_state(drv, dev, idx, KTIME_MAX, true);
- if (teo_state_ok(i, drv))
- idx = i;
- }
-
/*
* Skip the timers check if state 0 is the current candidate one,
* because an immediate non-timer wakeup is expected in that case.
@@ -667,11 +564,9 @@ static int teo_enable_device(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
- unsigned long max_capacity = arch_scale_cpu_capacity(dev->cpu);
int i;
memset(cpu_data, 0, sizeof(*cpu_data));
- cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;
for (i = 0; i < NR_RECENT; i++)
cpu_data->recent_idx[i] = -1;
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] cpuidle: teo: Remove recent intercepts metric
2024-06-28 9:59 [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Christian Loehle
2024-06-28 9:59 ` [PATCH 1/3] Revert: "cpuidle: teo: Introduce util-awareness" Christian Loehle
@ 2024-06-28 9:59 ` Christian Loehle
2024-06-28 9:59 ` [PATCH 3/3] cpuidle: teo: Don't count non-existent intercepts Christian Loehle
2024-06-28 19:06 ` [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Rafael J. Wysocki
3 siblings, 0 replies; 14+ messages in thread
From: Christian Loehle @ 2024-06-28 9:59 UTC (permalink / raw)
To: linux-pm, linux-kernel, rafael
Cc: vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann, Christian Loehle
The logic for recent intercepts didn't work, there is an underflow
of the 'recent' value that can be observed during boot already, which
teo usually doesn't recover from, making the entire logic pointless.
Furthermore the recent intercepts also were never reset, thus not
actually being very 'recent'.
Having underflowed 'recent' values lead to teo always acting as if
we were in a scenario were expected sleep length based on timers is
too high and it therefore unnecessarily selecting shallower states.
Experiments show that the remaining 'intercept' logic is enough to
quickly react to scenarios in which teo cannot rely on the timer
expected sleep length.
See also here:
https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
drivers/cpuidle/governors/teo.c | 76 ++++++---------------------------
1 file changed, 13 insertions(+), 63 deletions(-)
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index b6a8ba6772e1..200a3598cbcf 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -54,16 +54,12 @@
* shallower than the one whose bin is fallen into by the sleep length (these
* situations are referred to as "intercepts" below).
*
- * In addition to the metrics described above, the governor counts recent
- * intercepts (that is, intercepts that have occurred during the last
- * %NR_RECENT invocations of it for the given CPU) for each bin.
- *
* In order to select an idle state for a CPU, the governor takes the following
* steps (modulo the possible latency constraint that must be taken into account
* too):
*
* 1. Find the deepest CPU idle state whose target residency does not exceed
- * the current sleep length (the candidate idle state) and compute 3 sums as
+ * the current sleep length (the candidate idle state) and compute 2 sums as
* follows:
*
* - The sum of the "hits" and "intercepts" metrics for the candidate state
@@ -76,20 +72,15 @@
* idle long enough to avoid being intercepted if the sleep length had been
* equal to the current one).
*
- * - The sum of the numbers of recent intercepts for all of the idle states
- * shallower than the candidate one.
- *
- * 2. If the second sum is greater than the first one or the third sum is
- * greater than %NR_RECENT / 2, the CPU is likely to wake up early, so look
- * for an alternative idle state to select.
+ * 2. If the second sum is greater than the first one the CPU is likely to wake
+ * up early, so look for an alternative idle state to select.
*
* - Traverse the idle states shallower than the candidate one in the
* descending order.
*
- * - For each of them compute the sum of the "intercepts" metrics and the sum
- * of the numbers of recent intercepts over all of the idle states between
- * it and the candidate one (including the former and excluding the
- * latter).
+ * - For each of them compute the sum of the "intercepts" metrics over all
+ * of the idle states between it and the candidate one (including the
+ * former and excluding the latter).
*
* - If each of these sums that needs to be taken into account (because the
* check related to it has indicated that the CPU is likely to wake up
@@ -116,22 +107,14 @@
#define PULSE 1024
#define DECAY_SHIFT 3
-/*
- * Number of the most recent idle duration values to take into consideration for
- * the detection of recent early wakeup patterns.
- */
-#define NR_RECENT 9
-
/**
* struct teo_bin - Metrics used by the TEO cpuidle governor.
* @intercepts: The "intercepts" metric.
* @hits: The "hits" metric.
- * @recent: The number of recent "intercepts".
*/
struct teo_bin {
unsigned int intercepts;
unsigned int hits;
- unsigned int recent;
};
/**
@@ -140,8 +123,6 @@ struct teo_bin {
* @sleep_length_ns: Time till the closest timer event (at the selection time).
* @state_bins: Idle state data bins for this CPU.
* @total: Grand total of the "intercepts" and "hits" metrics for all bins.
- * @next_recent_idx: Index of the next @recent_idx entry to update.
- * @recent_idx: Indices of bins corresponding to recent "intercepts".
* @tick_hits: Number of "hits" after TICK_NSEC.
*/
struct teo_cpu {
@@ -149,8 +130,6 @@ struct teo_cpu {
s64 sleep_length_ns;
struct teo_bin state_bins[CPUIDLE_STATE_MAX];
unsigned int total;
- int next_recent_idx;
- int recent_idx[NR_RECENT];
unsigned int tick_hits;
};
@@ -222,13 +201,6 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
}
}
- i = cpu_data->next_recent_idx++;
- if (cpu_data->next_recent_idx >= NR_RECENT)
- cpu_data->next_recent_idx = 0;
-
- if (cpu_data->recent_idx[i] >= 0)
- cpu_data->state_bins[cpu_data->recent_idx[i]].recent--;
-
/*
* If the deepest state's target residency is below the tick length,
* make a record of it to help teo_select() decide whether or not
@@ -255,14 +227,10 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* Otherwise, update the "intercepts" metric for the bin fallen into by
* the measured idle duration.
*/
- if (idx_timer == idx_duration) {
+ if (idx_timer == idx_duration)
cpu_data->state_bins[idx_timer].hits += PULSE;
- cpu_data->recent_idx[i] = -1;
- } else {
+ else
cpu_data->state_bins[idx_duration].intercepts += PULSE;
- cpu_data->state_bins[idx_duration].recent++;
- cpu_data->recent_idx[i] = idx_duration;
- }
end:
cpu_data->total += PULSE;
@@ -315,13 +283,10 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
unsigned int tick_intercept_sum = 0;
unsigned int idx_intercept_sum = 0;
unsigned int intercept_sum = 0;
- unsigned int idx_recent_sum = 0;
- unsigned int recent_sum = 0;
unsigned int idx_hit_sum = 0;
unsigned int hit_sum = 0;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
- bool alt_intercepts, alt_recent;
s64 duration_ns;
int i;
@@ -357,7 +322,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
*/
intercept_sum += prev_bin->intercepts;
hit_sum += prev_bin->hits;
- recent_sum += prev_bin->recent;
if (dev->states_usage[i].disable)
continue;
@@ -373,7 +337,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
/* Save the sums for the current state. */
idx_intercept_sum = intercept_sum;
idx_hit_sum = hit_sum;
- idx_recent_sum = recent_sum;
}
/* Avoid unnecessary overhead. */
@@ -398,37 +361,28 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
* If the sum of the intercepts metric for all of the idle states
* shallower than the current candidate one (idx) is greater than the
* sum of the intercepts and hits metrics for the candidate state and
- * all of the deeper states, or the sum of the numbers of recent
- * intercepts over all of the states shallower than the candidate one
- * is greater than a half of the number of recent events taken into
- * account, a shallower idle state is likely to be a better choice.
+ * all of the deeper states a shallower idle state is likely to be a
+ * better choice.
*/
- alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;
- alt_recent = idx_recent_sum > NR_RECENT / 2;
- if (alt_recent || alt_intercepts) {
+ if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
int first_suitable_idx = idx;
/*
* Look for the deepest idle state whose target residency had
* not exceeded the idle duration in over a half of the relevant
- * cases (both with respect to intercepts overall and with
- * respect to the recent intercepts only) in the past.
+ * cases in the past.
*
* Take the possible duration limitation present if the tick
* has been stopped already into account.
*/
intercept_sum = 0;
- recent_sum = 0;
for (i = idx - 1; i >= 0; i--) {
struct teo_bin *bin = &cpu_data->state_bins[i];
intercept_sum += bin->intercepts;
- recent_sum += bin->recent;
- if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
- (!alt_intercepts ||
- 2 * intercept_sum > idx_intercept_sum)) {
+ if (2 * intercept_sum > idx_intercept_sum) {
/*
* Use the current state unless it is too
* shallow or disabled, in which case take the
@@ -564,13 +518,9 @@ static int teo_enable_device(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
- int i;
memset(cpu_data, 0, sizeof(*cpu_data));
- for (i = 0; i < NR_RECENT; i++)
- cpu_data->recent_idx[i] = -1;
-
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/3] cpuidle: teo: Don't count non-existent intercepts
2024-06-28 9:59 [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Christian Loehle
2024-06-28 9:59 ` [PATCH 1/3] Revert: "cpuidle: teo: Introduce util-awareness" Christian Loehle
2024-06-28 9:59 ` [PATCH 2/3] cpuidle: teo: Remove recent intercepts metric Christian Loehle
@ 2024-06-28 9:59 ` Christian Loehle
2024-06-28 18:48 ` Rafael J. Wysocki
2024-06-28 19:06 ` [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Rafael J. Wysocki
3 siblings, 1 reply; 14+ messages in thread
From: Christian Loehle @ 2024-06-28 9:59 UTC (permalink / raw)
To: linux-pm, linux-kernel, rafael
Cc: vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann, Christian Loehle
When bailing out early, teo will not query the sleep length anymore
since commit 6da8f9ba5a87 ("cpuidle: teo:
Skip tick_nohz_get_sleep_length() call in some cases") with an
expected sleep_length_ns value of KTIME_MAX.
This lead to state0 accumulating lots of 'intercepts' because
the actually measured sleep length was < KTIME_MAX, so query the sleep
length instead for teo to recognize if it still is in an
intercept-likely scenario without alternating between the two modes.
Fundamentally we can only do one of the two:
1. Skip sleep_length_ns query when we think intercept is likely
2. Have accurate data if sleep_length_ns is actually intercepted when
we believe it is currently intercepted.
Previously teo did the former while this patch chooses the latter as
the additional time it takes to query the sleep length was found to be
negligible and the variants of option 1 (count all unknowns as misses
or count all unknown as hits) had significant regressions (as misses
had lots of too shallow idle state selections and as hits had terrible
performance in intercept-heavy workloads).
Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
v3:
Drop counting KTIME_MAX as hit and reword commit accordingly
drivers/cpuidle/governors/teo.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 200a3598cbcf..c2d73507d23b 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -287,6 +287,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
unsigned int hit_sum = 0;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
+ int prev_intercept_idx;
s64 duration_ns;
int i;
@@ -364,6 +365,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
* all of the deeper states a shallower idle state is likely to be a
* better choice.
*/
+ prev_intercept_idx = idx;
if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
int first_suitable_idx = idx;
@@ -415,6 +417,14 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
first_suitable_idx = i;
}
}
+ if (!idx && prev_intercept_idx) {
+ /*
+ * We have to query the sleep length here otherwise we don't
+ * know after wakeup if our guess was correct.
+ */
+ duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+ cpu_data->sleep_length_ns = duration_ns;
+ }
/*
* If there is a latency constraint, it may be necessary to select an
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] cpuidle: teo: Don't count non-existent intercepts
2024-06-28 9:59 ` [PATCH 3/3] cpuidle: teo: Don't count non-existent intercepts Christian Loehle
@ 2024-06-28 18:48 ` Rafael J. Wysocki
2024-06-29 8:22 ` [PATCHv4 " Christian Loehle
0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2024-06-28 18:48 UTC (permalink / raw)
To: Christian Loehle
Cc: linux-pm, linux-kernel, rafael, vincent.guittot, qyousef, peterz,
daniel.lezcano, ulf.hansson, anna-maria, dsmythies,
kajetan.puchalski, lukasz.luba, dietmar.eggemann
On Fri, Jun 28, 2024 at 12:02 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> When bailing out early, teo will not query the sleep length anymore
> since commit 6da8f9ba5a87 ("cpuidle: teo:
> Skip tick_nohz_get_sleep_length() call in some cases") with an
> expected sleep_length_ns value of KTIME_MAX.
> This lead to state0 accumulating lots of 'intercepts' because
> the actually measured sleep length was < KTIME_MAX, so query the sleep
> length instead for teo to recognize if it still is in an
> intercept-likely scenario without alternating between the two modes.
>
> Fundamentally we can only do one of the two:
> 1. Skip sleep_length_ns query when we think intercept is likely
> 2. Have accurate data if sleep_length_ns is actually intercepted when
> we believe it is currently intercepted.
>
> Previously teo did the former while this patch chooses the latter as
> the additional time it takes to query the sleep length was found to be
> negligible and the variants of option 1 (count all unknowns as misses
> or count all unknown as hits) had significant regressions (as misses
> had lots of too shallow idle state selections and as hits had terrible
> performance in intercept-heavy workloads).
>
> Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> ---
> v3:
> Drop counting KTIME_MAX as hit and reword commit accordingly
>
> drivers/cpuidle/governors/teo.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> index 200a3598cbcf..c2d73507d23b 100644
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -287,6 +287,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> unsigned int hit_sum = 0;
> int constraint_idx = 0;
> int idx0 = 0, idx = -1;
> + int prev_intercept_idx;
> s64 duration_ns;
> int i;
>
> @@ -364,6 +365,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> * all of the deeper states a shallower idle state is likely to be a
> * better choice.
> */
> + prev_intercept_idx = idx;
> if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> int first_suitable_idx = idx;
>
> @@ -415,6 +417,14 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> first_suitable_idx = i;
> }
> }
> + if (!idx && prev_intercept_idx) {
> + /*
> + * We have to query the sleep length here otherwise we don't
> + * know after wakeup if our guess was correct.
> + */
> + duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> + cpu_data->sleep_length_ns = duration_ns;
This is going to select the shallowest state anyway AFAICS, so is it
useful to check constraint_idx in this case?
> + }
>
> /*
> * If there is a latency constraint, it may be necessary to select an
> --
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic
2024-06-28 9:59 [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Christian Loehle
` (2 preceding siblings ...)
2024-06-28 9:59 ` [PATCH 3/3] cpuidle: teo: Don't count non-existent intercepts Christian Loehle
@ 2024-06-28 19:06 ` Rafael J. Wysocki
2024-06-29 8:23 ` Christian Loehle
2024-08-05 14:53 ` Christian Loehle
3 siblings, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2024-06-28 19:06 UTC (permalink / raw)
To: Christian Loehle
Cc: linux-pm, linux-kernel, rafael, vincent.guittot, qyousef, peterz,
daniel.lezcano, ulf.hansson, anna-maria, dsmythies,
kajetan.puchalski, lukasz.luba, dietmar.eggemann
On Fri, Jun 28, 2024 at 12:02 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> Hi all,
> so my investigation into teo lead to the following fixes.
>
> 1/3:
> As discussed the utilization threshold is too high while
> there are benefits in certain workloads, there are quite a few
> regressions, too. Revert the Util-awareness patch.
> This in itself leads to regressions, but part of it can be offset
> by the later patches.
> See
> https://lore.kernel.org/lkml/CAKfTPtA6ZzRR-zMN7sodOW+N_P+GqwNv4tGR+aMB5VXRT2b5bg@mail.gmail.com/
> 2/3:
> Remove the 'recent' intercept logic, see my findings in:
> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
> I haven't found a way to salvage this properly, so I removed it.
> The regular intercept seems to decay fast enough to not need this, but
> we could change it if that turns out that we need this to be faster in
> ramp-up and decaying.
> 3/3:
> The rest of the intercept logic had issues, too.
> See the commit.
>
> Happy for anyone to take a look and test as well.
>
> Some numbers for context, comparing:
> - IO workload (intercept heavy).
> - Timer workload very low utilization (check for deepest state)
> - hackbench (high utilization)
> - Geekbench 5 on Pixel6 (high utilization)
> Tests 1 to 3 are on RK3399 with CONFIG_HZ=100.
> target_residencies: 1, 900, 2000
>
> 1. IO workload, 5 runs, results sorted, in read IOPS.
> fio --minimal --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1 | cut -d \; -f 8;
>
> teo fixed v2:
> /dev/nvme0n1
> [4599, 4658, 4692, 4694, 4720]
> /dev/mmcblk2
> [5700, 5730, 5735, 5747, 5977]
> /dev/mmcblk1
> [2052, 2054, 2066, 2067, 2073]
>
> teo mainline:
> /dev/nvme0n1
> [3793, 3825, 3846, 3865, 3964]
> /dev/mmcblk2
> [3831, 4110, 4154, 4203, 4228]
> /dev/mmcblk1
> [1559, 1564, 1596, 1611, 1618]
>
> menu:
> /dev/nvme0n1
> [2571, 2630, 2804, 2813, 2917]
> /dev/mmcblk2
> [4181, 4260, 5062, 5260, 5329]
> /dev/mmcblk1
> [1567, 1581, 1585, 1603, 1769]
>
>
> 2. Timer workload (through IO for my convenience 😉 )
> Results in read IOPS, fio same as above.
> echo "0 2097152 zero" | dmsetup create dm-zeros
> echo "0 2097152 delay /dev/mapper/dm-zeros 0 50" | dmsetup create dm-slow
> (Each IO is delayed by timer of 50ms, should be mostly in state2, for 5s total)
>
> teo fixed v2:
> idle_state time
> 2.0 4.807025
> -1.0 0.219766
> 0.0 0.072007
> 1.0 0.169570
>
> 3199 cpu_idle total
> 38 cpu_idle_miss
> 31 cpu_idle_miss above
> 7 cpu_idle_miss below
>
> teo mainline:
> idle_state time
> 1.0 4.897942
> -1.0 0.095375
> 0.0 0.253581
>
> 3221 cpu_idle total
> 1269 cpu_idle_miss
> 22 cpu_idle_miss above
> 1247 cpu_idle_miss below
>
> menu:
> idle_state time
> 2.0 4.295546
> -1.0 0.234164
> 1.0 0.356344
> 0.0 0.401507
>
> 3421 cpu_idle total
> 129 cpu_idle_miss
> 52 cpu_idle_miss above
> 77 cpu_idle_miss below
>
> Residencies:
> teo mainline isn't in state2 at all, teo fixed is more in state2 than menu, but
> both are in state2 the vast majority of the time as expected.
>
> tldr: overall teo fixed spends more time in state2 while having
> fewer idle_miss than menu.
> teo mainline was just way too aggressive at selecting shallow states.
>
> 3. Hackbench, 5 runs
> for i in $(seq 0 4); do hackbench -l 100 -g 100 ; sleep 1; done
>
> teo fixed v2:
> Time: 4.937
> Time: 4.898
> Time: 4.871
> Time: 4.833
> Time: 4.898
>
> teo mainline:
> Time: 4.945
> Time: 5.021
> Time: 4.927
> Time: 4.923
> Time: 5.137
>
> menu:
> Time: 4.964
> Time: 4.847
> Time: 4.914
> Time: 4.841
> Time: 4.800
>
> tldr: all comparable, teo mainline slightly worse
>
> 4. Geekbench 5 (multi-core) on Pixel6
> (Device is cooled for each iteration separately)
> teo mainline:
> 3113, 3068, 3079
> mean 3086.66
>
> teo revert util-awareness:
> 2947, 2941, 2952
> mean 2946.66 (-4.54%)
>
> teo fixed v2:
> 3032, 3066, 3019
> mean 3039 (-1.54%)
>
>
> Changes since v2:
> - Reworded commits according to Dietmar's comments
> - Dropped the KTIME_MAX as hit part from 3/3 according to Dietmar's
> remark.
>
> Changes since v1:
> - Removed all non-fixes.
> - Do a full revert of Util-awareness instead of increasing thresholds.
> - Address Dietmar's comments.
> https://lore.kernel.org/linux-kernel/20240606090050.327614-2-christian.loehle@arm.com/T/
>
> Kind Regards,
> Christian
>
> Christian Loehle (3):
> Revert: "cpuidle: teo: Introduce util-awareness"
> cpuidle: teo: Remove recent intercepts metric
> cpuidle: teo: Don't count non-existent intercepts
>
> drivers/cpuidle/governors/teo.c | 189 +++++---------------------------
> 1 file changed, 27 insertions(+), 162 deletions(-)
>
> --
Patches [1-2/3] have been applied as 6.11 material.
Patch [3/3] looks like it may be improved slightly, see my reply to that patch.
Thanks!
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCHv4 3/3] cpuidle: teo: Don't count non-existent intercepts
2024-06-28 18:48 ` Rafael J. Wysocki
@ 2024-06-29 8:22 ` Christian Loehle
2024-07-01 16:59 ` Rafael J. Wysocki
0 siblings, 1 reply; 14+ messages in thread
From: Christian Loehle @ 2024-06-29 8:22 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-pm, linux-kernel, vincent.guittot, qyousef, peterz,
daniel.lezcano, ulf.hansson, anna-maria, dsmythies,
kajetan.puchalski, lukasz.luba, dietmar.eggemann
When bailing out early, teo will not query the sleep length anymore
since commit 6da8f9ba5a87 ("cpuidle: teo:
Skip tick_nohz_get_sleep_length() call in some cases") with an
expected sleep_length_ns value of KTIME_MAX.
This lead to state0 accumulating lots of 'intercepts' because
the actually measured sleep length was < KTIME_MAX, so query the sleep
length instead for teo to recognize if it still is in an
intercept-likely scenario without alternating between the two modes.
Fundamentally we can only do one of the two:
1. Skip sleep_length_ns query when we think intercept is likely
2. Have accurate data if sleep_length_ns is actually intercepted when
we believe it is currently intercepted.
Previously teo did the former while this patch chooses the latter as
the additional time it takes to query the sleep length was found to be
negligible and the variants of option 1 (count all unknowns as misses
or count all unknown as hits) had significant regressions (as misses
had lots of too shallow idle state selections and as hits had terrible
performance in intercept-heavy workloads).
Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
v4: Skip constraint check if intercept logic selects state0
drivers/cpuidle/governors/teo.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 200a3598cbcf..6dc44197a80e 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -287,6 +287,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
unsigned int hit_sum = 0;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
+ int prev_intercept_idx;
s64 duration_ns;
int i;
@@ -364,6 +365,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
* all of the deeper states a shallower idle state is likely to be a
* better choice.
*/
+ prev_intercept_idx = idx;
if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
int first_suitable_idx = idx;
@@ -415,6 +417,15 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
first_suitable_idx = i;
}
}
+ if (!idx && prev_intercept_idx) {
+ /*
+ * We have to query the sleep length here otherwise we don't
+ * know after wakeup if our guess was correct.
+ */
+ duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+ cpu_data->sleep_length_ns = duration_ns;
+ goto out_tick;
+ }
/*
* If there is a latency constraint, it may be necessary to select an
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic
2024-06-28 19:06 ` [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Rafael J. Wysocki
@ 2024-06-29 8:23 ` Christian Loehle
2024-08-05 14:53 ` Christian Loehle
1 sibling, 0 replies; 14+ messages in thread
From: Christian Loehle @ 2024-06-29 8:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-pm, linux-kernel, vincent.guittot, qyousef, peterz,
daniel.lezcano, ulf.hansson, anna-maria, dsmythies,
kajetan.puchalski, lukasz.luba, dietmar.eggemann
On 6/28/24 20:06, Rafael J. Wysocki wrote:
> On Fri, Jun 28, 2024 at 12:02 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> Hi all,
>> so my investigation into teo lead to the following fixes.
>>
>> 1/3:
>> As discussed the utilization threshold is too high while
>> there are benefits in certain workloads, there are quite a few
>> regressions, too. Revert the Util-awareness patch.
>> This in itself leads to regressions, but part of it can be offset
>> by the later patches.
>> See
>> https://lore.kernel.org/lkml/CAKfTPtA6ZzRR-zMN7sodOW+N_P+GqwNv4tGR+aMB5VXRT2b5bg@mail.gmail.com/
>> 2/3:
>> Remove the 'recent' intercept logic, see my findings in:
>> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
>> I haven't found a way to salvage this properly, so I removed it.
>> The regular intercept seems to decay fast enough to not need this, but
>> we could change it if that turns out that we need this to be faster in
>> ramp-up and decaying.
>> 3/3:
>> The rest of the intercept logic had issues, too.
>> See the commit.
>>
>> Happy for anyone to take a look and test as well.
>>
>> Some numbers for context, comparing:
>> - IO workload (intercept heavy).
>> - Timer workload very low utilization (check for deepest state)
>> - hackbench (high utilization)
>> - Geekbench 5 on Pixel6 (high utilization)
>> Tests 1 to 3 are on RK3399 with CONFIG_HZ=100.
>> target_residencies: 1, 900, 2000
>>
>> 1. IO workload, 5 runs, results sorted, in read IOPS.
>> fio --minimal --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1 | cut -d \; -f 8;
>>
>> teo fixed v2:
>> /dev/nvme0n1
>> [4599, 4658, 4692, 4694, 4720]
>> /dev/mmcblk2
>> [5700, 5730, 5735, 5747, 5977]
>> /dev/mmcblk1
>> [2052, 2054, 2066, 2067, 2073]
>>
>> teo mainline:
>> /dev/nvme0n1
>> [3793, 3825, 3846, 3865, 3964]
>> /dev/mmcblk2
>> [3831, 4110, 4154, 4203, 4228]
>> /dev/mmcblk1
>> [1559, 1564, 1596, 1611, 1618]
>>
>> menu:
>> /dev/nvme0n1
>> [2571, 2630, 2804, 2813, 2917]
>> /dev/mmcblk2
>> [4181, 4260, 5062, 5260, 5329]
>> /dev/mmcblk1
>> [1567, 1581, 1585, 1603, 1769]
>>
>>
>> 2. Timer workload (through IO for my convenience 😉 )
>> Results in read IOPS, fio same as above.
>> echo "0 2097152 zero" | dmsetup create dm-zeros
>> echo "0 2097152 delay /dev/mapper/dm-zeros 0 50" | dmsetup create dm-slow
>> (Each IO is delayed by timer of 50ms, should be mostly in state2, for 5s total)
>>
>> teo fixed v2:
>> idle_state time
>> 2.0 4.807025
>> -1.0 0.219766
>> 0.0 0.072007
>> 1.0 0.169570
>>
>> 3199 cpu_idle total
>> 38 cpu_idle_miss
>> 31 cpu_idle_miss above
>> 7 cpu_idle_miss below
>>
>> teo mainline:
>> idle_state time
>> 1.0 4.897942
>> -1.0 0.095375
>> 0.0 0.253581
>>
>> 3221 cpu_idle total
>> 1269 cpu_idle_miss
>> 22 cpu_idle_miss above
>> 1247 cpu_idle_miss below
>>
>> menu:
>> idle_state time
>> 2.0 4.295546
>> -1.0 0.234164
>> 1.0 0.356344
>> 0.0 0.401507
>>
>> 3421 cpu_idle total
>> 129 cpu_idle_miss
>> 52 cpu_idle_miss above
>> 77 cpu_idle_miss below
>>
>> Residencies:
>> teo mainline isn't in state2 at all, teo fixed is more in state2 than menu, but
>> both are in state2 the vast majority of the time as expected.
>>
>> tldr: overall teo fixed spends more time in state2 while having
>> fewer idle_miss than menu.
>> teo mainline was just way too aggressive at selecting shallow states.
>>
>> 3. Hackbench, 5 runs
>> for i in $(seq 0 4); do hackbench -l 100 -g 100 ; sleep 1; done
>>
>> teo fixed v2:
>> Time: 4.937
>> Time: 4.898
>> Time: 4.871
>> Time: 4.833
>> Time: 4.898
>>
>> teo mainline:
>> Time: 4.945
>> Time: 5.021
>> Time: 4.927
>> Time: 4.923
>> Time: 5.137
>>
>> menu:
>> Time: 4.964
>> Time: 4.847
>> Time: 4.914
>> Time: 4.841
>> Time: 4.800
>>
>> tldr: all comparable, teo mainline slightly worse
>>
>> 4. Geekbench 5 (multi-core) on Pixel6
>> (Device is cooled for each iteration separately)
>> teo mainline:
>> 3113, 3068, 3079
>> mean 3086.66
>>
>> teo revert util-awareness:
>> 2947, 2941, 2952
>> mean 2946.66 (-4.54%)
>>
>> teo fixed v2:
>> 3032, 3066, 3019
>> mean 3039 (-1.54%)
>>
>>
>> Changes since v2:
>> - Reworded commits according to Dietmar's comments
>> - Dropped the KTIME_MAX as hit part from 3/3 according to Dietmar's
>> remark.
>>
>> Changes since v1:
>> - Removed all non-fixes.
>> - Do a full revert of Util-awareness instead of increasing thresholds.
>> - Address Dietmar's comments.
>> https://lore.kernel.org/linux-kernel/20240606090050.327614-2-christian.loehle@arm.com/T/
>>
>> Kind Regards,
>> Christian
>>
>> Christian Loehle (3):
>> Revert: "cpuidle: teo: Introduce util-awareness"
>> cpuidle: teo: Remove recent intercepts metric
>> cpuidle: teo: Don't count non-existent intercepts
>>
>> drivers/cpuidle/governors/teo.c | 189 +++++---------------------------
>> 1 file changed, 27 insertions(+), 162 deletions(-)
>>
>> --
>
> Patches [1-2/3] have been applied as 6.11 material.
Thank you!
>
> Patch [3/3] looks like it may be improved slightly, see my reply to that patch.
>
> Thanks!
Good catch, sent v4 for 3/3.
Kind regards,
Christian
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCHv4 3/3] cpuidle: teo: Don't count non-existent intercepts
2024-06-29 8:22 ` [PATCHv4 " Christian Loehle
@ 2024-07-01 16:59 ` Rafael J. Wysocki
0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2024-07-01 16:59 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, linux-pm, linux-kernel, vincent.guittot,
qyousef, peterz, daniel.lezcano, ulf.hansson, anna-maria,
dsmythies, kajetan.puchalski, lukasz.luba, dietmar.eggemann
On Sat, Jun 29, 2024 at 10:22 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> When bailing out early, teo will not query the sleep length anymore
> since commit 6da8f9ba5a87 ("cpuidle: teo:
> Skip tick_nohz_get_sleep_length() call in some cases") with an
> expected sleep_length_ns value of KTIME_MAX.
> This lead to state0 accumulating lots of 'intercepts' because
> the actually measured sleep length was < KTIME_MAX, so query the sleep
> length instead for teo to recognize if it still is in an
> intercept-likely scenario without alternating between the two modes.
>
> Fundamentally we can only do one of the two:
> 1. Skip sleep_length_ns query when we think intercept is likely
> 2. Have accurate data if sleep_length_ns is actually intercepted when
> we believe it is currently intercepted.
>
> Previously teo did the former while this patch chooses the latter as
> the additional time it takes to query the sleep length was found to be
> negligible and the variants of option 1 (count all unknowns as misses
> or count all unknown as hits) had significant regressions (as misses
> had lots of too shallow idle state selections and as hits had terrible
> performance in intercept-heavy workloads).
>
> Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> ---
> v4: Skip constraint check if intercept logic selects state0
>
> drivers/cpuidle/governors/teo.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> index 200a3598cbcf..6dc44197a80e 100644
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -287,6 +287,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> unsigned int hit_sum = 0;
> int constraint_idx = 0;
> int idx0 = 0, idx = -1;
> + int prev_intercept_idx;
> s64 duration_ns;
> int i;
>
> @@ -364,6 +365,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> * all of the deeper states a shallower idle state is likely to be a
> * better choice.
> */
> + prev_intercept_idx = idx;
> if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> int first_suitable_idx = idx;
>
> @@ -415,6 +417,15 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> first_suitable_idx = i;
> }
> }
> + if (!idx && prev_intercept_idx) {
> + /*
> + * We have to query the sleep length here otherwise we don't
> + * know after wakeup if our guess was correct.
> + */
> + duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> + cpu_data->sleep_length_ns = duration_ns;
> + goto out_tick;
> + }
>
> /*
> * If there is a latency constraint, it may be necessary to select an
> --
Applied as 6.11 material, thanks!
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic
2024-06-28 19:06 ` [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Rafael J. Wysocki
2024-06-29 8:23 ` Christian Loehle
@ 2024-08-05 14:53 ` Christian Loehle
2024-08-05 14:58 ` [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric Christian Loehle
1 sibling, 1 reply; 14+ messages in thread
From: Christian Loehle @ 2024-08-05 14:53 UTC (permalink / raw)
To: Rafael J. Wysocki, stable
Cc: linux-pm, linux-kernel, vincent.guittot, qyousef, peterz,
daniel.lezcano, ulf.hansson, anna-maria, dsmythies,
kajetan.puchalski, lukasz.luba, dietmar.eggemann
On 6/28/24 20:06, Rafael J. Wysocki wrote:
> On Fri, Jun 28, 2024 at 12:02 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> Hi all,
>> so my investigation into teo lead to the following fixes.
>>
>> 1/3:
>> As discussed the utilization threshold is too high while
>> there are benefits in certain workloads, there are quite a few
>> regressions, too. Revert the Util-awareness patch.
>> This in itself leads to regressions, but part of it can be offset
>> by the later patches.
>> See
>> https://lore.kernel.org/lkml/CAKfTPtA6ZzRR-zMN7sodOW+N_P+GqwNv4tGR+aMB5VXRT2b5bg@mail.gmail.com/
>> 2/3:
>> Remove the 'recent' intercept logic, see my findings in:
>> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
>> I haven't found a way to salvage this properly, so I removed it.
>> The regular intercept seems to decay fast enough to not need this, but
>> we could change it if that turns out that we need this to be faster in
>> ramp-up and decaying.
>> 3/3:
>> The rest of the intercept logic had issues, too.
>> See the commit.
>>
>> Happy for anyone to take a look and test as well.
>>
>> Some numbers for context, comparing:
>> - IO workload (intercept heavy).
>> - Timer workload very low utilization (check for deepest state)
>> - hackbench (high utilization)
>> - Geekbench 5 on Pixel6 (high utilization)
>> Tests 1 to 3 are on RK3399 with CONFIG_HZ=100.
>> target_residencies: 1, 900, 2000
>>
>> 1. IO workload, 5 runs, results sorted, in read IOPS.
>> fio --minimal --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1 | cut -d \; -f 8;
>>
>> teo fixed v2:
>> /dev/nvme0n1
>> [4599, 4658, 4692, 4694, 4720]
>> /dev/mmcblk2
>> [5700, 5730, 5735, 5747, 5977]
>> /dev/mmcblk1
>> [2052, 2054, 2066, 2067, 2073]
>>
>> teo mainline:
>> /dev/nvme0n1
>> [3793, 3825, 3846, 3865, 3964]
>> /dev/mmcblk2
>> [3831, 4110, 4154, 4203, 4228]
>> /dev/mmcblk1
>> [1559, 1564, 1596, 1611, 1618]
>>
>> menu:
>> /dev/nvme0n1
>> [2571, 2630, 2804, 2813, 2917]
>> /dev/mmcblk2
>> [4181, 4260, 5062, 5260, 5329]
>> /dev/mmcblk1
>> [1567, 1581, 1585, 1603, 1769]
>>
>>
>> 2. Timer workload (through IO for my convenience 😉 )
>> Results in read IOPS, fio same as above.
>> echo "0 2097152 zero" | dmsetup create dm-zeros
>> echo "0 2097152 delay /dev/mapper/dm-zeros 0 50" | dmsetup create dm-slow
>> (Each IO is delayed by timer of 50ms, should be mostly in state2, for 5s total)
>>
>> teo fixed v2:
>> idle_state time
>> 2.0 4.807025
>> -1.0 0.219766
>> 0.0 0.072007
>> 1.0 0.169570
>>
>> 3199 cpu_idle total
>> 38 cpu_idle_miss
>> 31 cpu_idle_miss above
>> 7 cpu_idle_miss below
>>
>> teo mainline:
>> idle_state time
>> 1.0 4.897942
>> -1.0 0.095375
>> 0.0 0.253581
>>
>> 3221 cpu_idle total
>> 1269 cpu_idle_miss
>> 22 cpu_idle_miss above
>> 1247 cpu_idle_miss below
>>
>> menu:
>> idle_state time
>> 2.0 4.295546
>> -1.0 0.234164
>> 1.0 0.356344
>> 0.0 0.401507
>>
>> 3421 cpu_idle total
>> 129 cpu_idle_miss
>> 52 cpu_idle_miss above
>> 77 cpu_idle_miss below
>>
>> Residencies:
>> teo mainline isn't in state2 at all, teo fixed is more in state2 than menu, but
>> both are in state2 the vast majority of the time as expected.
>>
>> tldr: overall teo fixed spends more time in state2 while having
>> fewer idle_miss than menu.
>> teo mainline was just way too aggressive at selecting shallow states.
>>
>> 3. Hackbench, 5 runs
>> for i in $(seq 0 4); do hackbench -l 100 -g 100 ; sleep 1; done
>>
>> teo fixed v2:
>> Time: 4.937
>> Time: 4.898
>> Time: 4.871
>> Time: 4.833
>> Time: 4.898
>>
>> teo mainline:
>> Time: 4.945
>> Time: 5.021
>> Time: 4.927
>> Time: 4.923
>> Time: 5.137
>>
>> menu:
>> Time: 4.964
>> Time: 4.847
>> Time: 4.914
>> Time: 4.841
>> Time: 4.800
>>
>> tldr: all comparable, teo mainline slightly worse
>>
>> 4. Geekbench 5 (multi-core) on Pixel6
>> (Device is cooled for each iteration separately)
>> teo mainline:
>> 3113, 3068, 3079
>> mean 3086.66
>>
>> teo revert util-awareness:
>> 2947, 2941, 2952
>> mean 2946.66 (-4.54%)
>>
>> teo fixed v2:
>> 3032, 3066, 3019
>> mean 3039 (-1.54%)
>>
>>
>> Changes since v2:
>> - Reworded commits according to Dietmar's comments
>> - Dropped the KTIME_MAX as hit part from 3/3 according to Dietmar's
>> remark.
>>
>> Changes since v1:
>> - Removed all non-fixes.
>> - Do a full revert of Util-awareness instead of increasing thresholds.
>> - Address Dietmar's comments.
>> https://lore.kernel.org/linux-kernel/20240606090050.327614-2-christian.loehle@arm.com/T/
>>
>> Kind Regards,
>> Christian
>>
>> Christian Loehle (3):
>> Revert: "cpuidle: teo: Introduce util-awareness"
>> cpuidle: teo: Remove recent intercepts metric
>> cpuidle: teo: Don't count non-existent intercepts
>>
>> drivers/cpuidle/governors/teo.c | 189 +++++---------------------------
>> 1 file changed, 27 insertions(+), 162 deletions(-)
>>
>> --
>
> Patches [1-2/3] have been applied as 6.11 material.
>
> Patch [3/3] looks like it may be improved slightly, see my reply to that patch.
>
> Thanks!
Hi Rafael,
are you fine with this being backported to stable?
@stable
4b20b07ce72f cpuidle: teo: Don't count non-existent intercepts
449914398083 cpuidle: teo: Remove recent intercepts metric
0a2998fa48f0 Revert: "cpuidle: teo: Introduce util-awareness"
apply as-is to
linux-6.10.y
linux-6.6.y
for linux-6.1.y only 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
is relevant, I'll reply with a backport.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric
2024-08-05 14:53 ` Christian Loehle
@ 2024-08-05 14:58 ` Christian Loehle
2024-08-12 12:42 ` Greg KH
0 siblings, 1 reply; 14+ messages in thread
From: Christian Loehle @ 2024-08-05 14:58 UTC (permalink / raw)
To: Rafael J. Wysocki, stable
Cc: linux-pm, linux-kernel, vincent.guittot, qyousef, peterz,
daniel.lezcano, ulf.hansson, anna-maria, dsmythies,
kajetan.puchalski, lukasz.luba, dietmar.eggemann
commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
The logic for recent intercepts didn't work, there is an underflow
of the 'recent' value that can be observed during boot already, which
teo usually doesn't recover from, making the entire logic pointless.
Furthermore the recent intercepts also were never reset, thus not
actually being very 'recent'.
Having underflowed 'recent' values lead to teo always acting as if
we were in a scenario were expected sleep length based on timers is
too high and it therefore unnecessarily selecting shallower states.
Experiments show that the remaining 'intercept' logic is enough to
quickly react to scenarios in which teo cannot rely on the timer
expected sleep length.
See also here:
https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
1 file changed, 14 insertions(+), 65 deletions(-)
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index d9262db79cae..65475526bb11 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -54,16 +54,12 @@
* shallower than the one whose bin is fallen into by the sleep length (these
* situations are referred to as "intercepts" below).
*
- * In addition to the metrics described above, the governor counts recent
- * intercepts (that is, intercepts that have occurred during the last
- * %NR_RECENT invocations of it for the given CPU) for each bin.
- *
* In order to select an idle state for a CPU, the governor takes the following
* steps (modulo the possible latency constraint that must be taken into account
* too):
*
* 1. Find the deepest CPU idle state whose target residency does not exceed
- * the current sleep length (the candidate idle state) and compute 3 sums as
+ * the current sleep length (the candidate idle state) and compute 2 sums as
* follows:
*
* - The sum of the "hits" and "intercepts" metrics for the candidate state
@@ -76,20 +72,15 @@
* idle long enough to avoid being intercepted if the sleep length had been
* equal to the current one).
*
- * - The sum of the numbers of recent intercepts for all of the idle states
- * shallower than the candidate one.
- *
- * 2. If the second sum is greater than the first one or the third sum is
- * greater than %NR_RECENT / 2, the CPU is likely to wake up early, so look
- * for an alternative idle state to select.
+ * 2. If the second sum is greater than the first one the CPU is likely to wake
+ * up early, so look for an alternative idle state to select.
*
* - Traverse the idle states shallower than the candidate one in the
* descending order.
*
- * - For each of them compute the sum of the "intercepts" metrics and the sum
- * of the numbers of recent intercepts over all of the idle states between
- * it and the candidate one (including the former and excluding the
- * latter).
+ * - For each of them compute the sum of the "intercepts" metrics over all
+ * of the idle states between it and the candidate one (including the
+ * former and excluding the latter).
*
* - If each of these sums that needs to be taken into account (because the
* check related to it has indicated that the CPU is likely to wake up
@@ -114,22 +105,14 @@
#define PULSE 1024
#define DECAY_SHIFT 3
-/*
- * Number of the most recent idle duration values to take into consideration for
- * the detection of recent early wakeup patterns.
- */
-#define NR_RECENT 9
-
/**
* struct teo_bin - Metrics used by the TEO cpuidle governor.
* @intercepts: The "intercepts" metric.
* @hits: The "hits" metric.
- * @recent: The number of recent "intercepts".
*/
struct teo_bin {
unsigned int intercepts;
unsigned int hits;
- unsigned int recent;
};
/**
@@ -137,17 +120,13 @@ struct teo_bin {
* @time_span_ns: Time between idle state selection and post-wakeup update.
* @sleep_length_ns: Time till the closest timer event (at the selection time).
* @state_bins: Idle state data bins for this CPU.
- * @total: Grand total of the "intercepts" and "hits" mertics for all bins.
- * @next_recent_idx: Index of the next @recent_idx entry to update.
- * @recent_idx: Indices of bins corresponding to recent "intercepts".
+ * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
*/
struct teo_cpu {
s64 time_span_ns;
s64 sleep_length_ns;
struct teo_bin state_bins[CPUIDLE_STATE_MAX];
unsigned int total;
- int next_recent_idx;
- int recent_idx[NR_RECENT];
};
static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -216,27 +195,16 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
}
}
- i = cpu_data->next_recent_idx++;
- if (cpu_data->next_recent_idx >= NR_RECENT)
- cpu_data->next_recent_idx = 0;
-
- if (cpu_data->recent_idx[i] >= 0)
- cpu_data->state_bins[cpu_data->recent_idx[i]].recent--;
-
/*
* If the measured idle duration falls into the same bin as the sleep
* length, this is a "hit", so update the "hits" metric for that bin.
* Otherwise, update the "intercepts" metric for the bin fallen into by
* the measured idle duration.
*/
- if (idx_timer == idx_duration) {
+ if (idx_timer == idx_duration)
cpu_data->state_bins[idx_timer].hits += PULSE;
- cpu_data->recent_idx[i] = -1;
- } else {
+ else
cpu_data->state_bins[idx_duration].intercepts += PULSE;
- cpu_data->state_bins[idx_duration].recent++;
- cpu_data->recent_idx[i] = idx_duration;
- }
cpu_data->total += PULSE;
}
@@ -289,13 +257,10 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
unsigned int idx_intercept_sum = 0;
unsigned int intercept_sum = 0;
- unsigned int idx_recent_sum = 0;
- unsigned int recent_sum = 0;
unsigned int idx_hit_sum = 0;
unsigned int hit_sum = 0;
int constraint_idx = 0;
int idx0 = 0, idx = -1;
- bool alt_intercepts, alt_recent;
ktime_t delta_tick;
s64 duration_ns;
int i;
@@ -338,7 +303,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
*/
intercept_sum += prev_bin->intercepts;
hit_sum += prev_bin->hits;
- recent_sum += prev_bin->recent;
if (dev->states_usage[i].disable)
continue;
@@ -358,7 +322,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
idx_intercept_sum = intercept_sum;
idx_hit_sum = hit_sum;
- idx_recent_sum = recent_sum;
}
/* Avoid unnecessary overhead. */
@@ -373,42 +336,32 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
* If the sum of the intercepts metric for all of the idle states
* shallower than the current candidate one (idx) is greater than the
* sum of the intercepts and hits metrics for the candidate state and
- * all of the deeper states, or the sum of the numbers of recent
- * intercepts over all of the states shallower than the candidate one
- * is greater than a half of the number of recent events taken into
- * account, the CPU is likely to wake up early, so find an alternative
- * idle state to select.
+ * all of the deeper states a shallower idle state is likely to be a
+ * better choice.
*/
- alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;
- alt_recent = idx_recent_sum > NR_RECENT / 2;
- if (alt_recent || alt_intercepts) {
+ if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
s64 first_suitable_span_ns = duration_ns;
int first_suitable_idx = idx;
/*
* Look for the deepest idle state whose target residency had
* not exceeded the idle duration in over a half of the relevant
- * cases (both with respect to intercepts overall and with
- * respect to the recent intercepts only) in the past.
+ * cases in the past.
*
* Take the possible latency constraint and duration limitation
* present if the tick has been stopped already into account.
*/
intercept_sum = 0;
- recent_sum = 0;
for (i = idx - 1; i >= 0; i--) {
struct teo_bin *bin = &cpu_data->state_bins[i];
s64 span_ns;
intercept_sum += bin->intercepts;
- recent_sum += bin->recent;
span_ns = teo_middle_of_bin(i, drv);
- if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
- (!alt_intercepts ||
- 2 * intercept_sum > idx_intercept_sum)) {
+ if (2 * intercept_sum > idx_intercept_sum) {
if (teo_time_ok(span_ns) &&
!dev->states_usage[i].disable) {
idx = i;
@@ -508,13 +461,9 @@ static int teo_enable_device(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
- int i;
memset(cpu_data, 0, sizeof(*cpu_data));
- for (i = 0; i < NR_RECENT; i++)
- cpu_data->recent_idx[i] = -1;
-
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric
2024-08-05 14:58 ` [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric Christian Loehle
@ 2024-08-12 12:42 ` Greg KH
2024-08-13 13:18 ` Christian Loehle
0 siblings, 1 reply; 14+ messages in thread
From: Greg KH @ 2024-08-12 12:42 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, stable, linux-pm, linux-kernel,
vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann
On Mon, Aug 05, 2024 at 03:58:09PM +0100, Christian Loehle wrote:
> commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
>
> The logic for recent intercepts didn't work, there is an underflow
> of the 'recent' value that can be observed during boot already, which
> teo usually doesn't recover from, making the entire logic pointless.
> Furthermore the recent intercepts also were never reset, thus not
> actually being very 'recent'.
>
> Having underflowed 'recent' values lead to teo always acting as if
> we were in a scenario were expected sleep length based on timers is
> too high and it therefore unnecessarily selecting shallower states.
>
> Experiments show that the remaining 'intercept' logic is enough to
> quickly react to scenarios in which teo cannot rely on the timer
> expected sleep length.
>
> See also here:
> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
>
> Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
> Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
> 1 file changed, 14 insertions(+), 65 deletions(-)
We can't just take a 6.1.y backport without newer kernels also having
this fix. Can you resend this as backports for all relevant kernels
please?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric
2024-08-12 12:42 ` Greg KH
@ 2024-08-13 13:18 ` Christian Loehle
2024-08-14 11:40 ` Greg KH
0 siblings, 1 reply; 14+ messages in thread
From: Christian Loehle @ 2024-08-13 13:18 UTC (permalink / raw)
To: Greg KH
Cc: Rafael J. Wysocki, stable, linux-pm, linux-kernel,
vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann
On 8/12/24 13:42, Greg KH wrote:
> On Mon, Aug 05, 2024 at 03:58:09PM +0100, Christian Loehle wrote:
>> commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
>>
>> The logic for recent intercepts didn't work, there is an underflow
>> of the 'recent' value that can be observed during boot already, which
>> teo usually doesn't recover from, making the entire logic pointless.
>> Furthermore the recent intercepts also were never reset, thus not
>> actually being very 'recent'.
>>
>> Having underflowed 'recent' values lead to teo always acting as if
>> we were in a scenario were expected sleep length based on timers is
>> too high and it therefore unnecessarily selecting shallower states.
>>
>> Experiments show that the remaining 'intercept' logic is enough to
>> quickly react to scenarios in which teo cannot rely on the timer
>> expected sleep length.
>>
>> See also here:
>> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
>>
>> Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
>> Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
>> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>> drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
>> 1 file changed, 14 insertions(+), 65 deletions(-)
>
> We can't just take a 6.1.y backport without newer kernels also having
> this fix. Can you resend this as backports for all relevant kernels
> please?
Hi Greg,
the email thread might've looked a bit strange to you but as I wrote
in a previous reply:
https://lore.kernel.org/linux-pm/20240628095955.34096-1-christian.loehle@arm.com/T/#ma5bcd00c4b0ffa1fc34e8d7fa237b8de4ee8a25c
@stable
4b20b07ce72f cpuidle: teo: Don't count non-existent intercepts
449914398083 cpuidle: teo: Remove recent intercepts metric
0a2998fa48f0 Revert: "cpuidle: teo: Introduce util-awareness"
apply as-is to
linux-6.10.y
linux-6.6.y
for linux-6.1.y only 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
is relevant, I'll reply with a backport.
Ideally I would wait for an Ack from Rafael though.
Hopefully that makes more sense then.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric
2024-08-13 13:18 ` Christian Loehle
@ 2024-08-14 11:40 ` Greg KH
0 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2024-08-14 11:40 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, stable, linux-pm, linux-kernel,
vincent.guittot, qyousef, peterz, daniel.lezcano, ulf.hansson,
anna-maria, dsmythies, kajetan.puchalski, lukasz.luba,
dietmar.eggemann
On Tue, Aug 13, 2024 at 02:18:53PM +0100, Christian Loehle wrote:
> On 8/12/24 13:42, Greg KH wrote:
> > On Mon, Aug 05, 2024 at 03:58:09PM +0100, Christian Loehle wrote:
> >> commit 449914398083148f93d070a8aace04f9ec296ce3 upstream.
> >>
> >> The logic for recent intercepts didn't work, there is an underflow
> >> of the 'recent' value that can be observed during boot already, which
> >> teo usually doesn't recover from, making the entire logic pointless.
> >> Furthermore the recent intercepts also were never reset, thus not
> >> actually being very 'recent'.
> >>
> >> Having underflowed 'recent' values lead to teo always acting as if
> >> we were in a scenario were expected sleep length based on timers is
> >> too high and it therefore unnecessarily selecting shallower states.
> >>
> >> Experiments show that the remaining 'intercept' logic is enough to
> >> quickly react to scenarios in which teo cannot rely on the timer
> >> expected sleep length.
> >>
> >> See also here:
> >> https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/
> >>
> >> Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment")
> >> Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com
> >> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> ---
> >> drivers/cpuidle/governors/teo.c | 79 ++++++---------------------------
> >> 1 file changed, 14 insertions(+), 65 deletions(-)
> >
> > We can't just take a 6.1.y backport without newer kernels also having
> > this fix. Can you resend this as backports for all relevant kernels
> > please?
>
> Hi Greg,
> the email thread might've looked a bit strange to you but as I wrote
> in a previous reply:
> https://lore.kernel.org/linux-pm/20240628095955.34096-1-christian.loehle@arm.com/T/#ma5bcd00c4b0ffa1fc34e8d7fa237b8de4ee8a25c
> @stable
> 4b20b07ce72f cpuidle: teo: Don't count non-existent intercepts
> 449914398083 cpuidle: teo: Remove recent intercepts metric
> 0a2998fa48f0 Revert: "cpuidle: teo: Introduce util-awareness"
> apply as-is to
> linux-6.10.y
> linux-6.6.y
> for linux-6.1.y only 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
> is relevant, I'll reply with a backport.
Please send all of these as a patch series for the relevent branches so
we know exactly what is going on...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-08-14 11:40 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28 9:59 [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Christian Loehle
2024-06-28 9:59 ` [PATCH 1/3] Revert: "cpuidle: teo: Introduce util-awareness" Christian Loehle
2024-06-28 9:59 ` [PATCH 2/3] cpuidle: teo: Remove recent intercepts metric Christian Loehle
2024-06-28 9:59 ` [PATCH 3/3] cpuidle: teo: Don't count non-existent intercepts Christian Loehle
2024-06-28 18:48 ` Rafael J. Wysocki
2024-06-29 8:22 ` [PATCHv4 " Christian Loehle
2024-07-01 16:59 ` Rafael J. Wysocki
2024-06-28 19:06 ` [PATCHv3 0/3] cpuidle: teo: Fixing utilization and intercept logic Rafael J. Wysocki
2024-06-29 8:23 ` Christian Loehle
2024-08-05 14:53 ` Christian Loehle
2024-08-05 14:58 ` [PATCH 6.1.y] cpuidle: teo: Remove recent intercepts metric Christian Loehle
2024-08-12 12:42 ` Greg KH
2024-08-13 13:18 ` Christian Loehle
2024-08-14 11:40 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).