* [PATCH v2 0/2] Simplify Util_est @ 2023-12-01 16:16 Vincent Guittot 2023-12-01 16:16 ` [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) Vincent Guittot 2023-12-01 16:16 ` [PATCH v2 2/2] sched/fair: Simplify util_est Vincent Guittot 0 siblings, 2 replies; 8+ messages in thread From: Vincent Guittot @ 2023-12-01 16:16 UTC (permalink / raw) To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, qyousef, linux-kernel, linux-doc Cc: lukasz.luba, hongyan.xia2, yizhou.tang, Vincent Guittot Following comment in [1], I prepared a patch to remove UTIL_EST_FASTUP. This enables us to simplify util_est behavior as proposed in patch 2. Changes since v2: - Add Chinese translation - Add Tag - Remove remaining ref to ue.enqueued and move some defines [1] https://lore.kernel.org/lkml/CAKfTPtCAZWp7tRgTpwJmyEAkyN65acmYrfu9naEUpBZVWNTcQA@mail.gmail.com/ Vincent Guittot (2): sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) sched/fair: Simplify util_est Documentation/scheduler/schedutil.rst | 7 +- .../zh_CN/scheduler/schedutil.rst | 7 +- include/linux/sched.h | 49 +++-------- kernel/sched/debug.c | 7 +- kernel/sched/fair.c | 86 +++++++------------ kernel/sched/features.h | 1 - kernel/sched/pelt.h | 4 +- 7 files changed, 55 insertions(+), 106 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) 2023-12-01 16:16 [PATCH v2 0/2] Simplify Util_est Vincent Guittot @ 2023-12-01 16:16 ` Vincent Guittot 2023-12-02 2:41 ` Yanteng Si 2023-12-07 3:44 ` Alex Shi 2023-12-01 16:16 ` [PATCH v2 2/2] sched/fair: Simplify util_est Vincent Guittot 1 sibling, 2 replies; 8+ messages in thread From: Vincent Guittot @ 2023-12-01 16:16 UTC (permalink / raw) To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, qyousef, linux-kernel, linux-doc Cc: lukasz.luba, hongyan.xia2, yizhou.tang, Vincent Guittot sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature in order to check for possibly related regressions. After 3 years, it has never been used and no regression has been reported. Let remove it and make fast increase a permanent behavior. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com> --- Documentation/scheduler/schedutil.rst | 7 +++---- Documentation/translations/zh_CN/scheduler/schedutil.rst | 7 +++---- kernel/sched/fair.c | 8 +++----- kernel/sched/features.h | 1 - 4 files changed, 9 insertions(+), 14 deletions(-) diff --git a/Documentation/scheduler/schedutil.rst b/Documentation/scheduler/schedutil.rst index 32c7d69fc86c..803fba8fc714 100644 --- a/Documentation/scheduler/schedutil.rst +++ b/Documentation/scheduler/schedutil.rst @@ -90,8 +90,8 @@ For more detail see: - Documentation/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. Task utilization" -UTIL_EST / UTIL_EST_FASTUP -========================== +UTIL_EST +======== Because periodic tasks have their averages decayed while they sleep, even though when running their expected utilization will be the same, they suffer a @@ -99,8 +99,7 @@ though when running their expected utilization will be the same, they suffer a To alleviate this (a default enabled option) UTIL_EST drives an Infinite Impulse Response (IIR) EWMA with the 'running' value on dequeue -- when it is -highest. A further default enabled option UTIL_EST_FASTUP modifies the IIR -filter to instantly increase and only decay on decrease. +highest. UTIL_EST filters to instantly increase and only decay on decrease. A further runqueue wide sum (of runnable tasks) is maintained of: diff --git a/Documentation/translations/zh_CN/scheduler/schedutil.rst b/Documentation/translations/zh_CN/scheduler/schedutil.rst index d1ea68007520..7c8d87f21c42 100644 --- a/Documentation/translations/zh_CN/scheduler/schedutil.rst +++ b/Documentation/translations/zh_CN/scheduler/schedutil.rst @@ -89,16 +89,15 @@ r_cpu被定义为当前CPU的最高性能水平与系统中任何其它CPU的最 - Documentation/translations/zh_CN/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. 
Task utilization" -UTIL_EST / UTIL_EST_FASTUP -========================== +UTIL_EST +======== 由于周期性任务的平均数在睡眠时会衰减,而在运行时其预期利用率会和睡眠前相同, 因此它们在再次运行后会面临(DVFS)的上涨。 为了缓解这个问题,(一个默认使能的编译选项)UTIL_EST驱动一个无限脉冲响应 (Infinite Impulse Response,IIR)的EWMA,“运行”值在出队时是最高的。 -另一个默认使能的编译选项UTIL_EST_FASTUP修改了IIR滤波器,使其允许立即增加, -仅在利用率下降时衰减。 +UTIL_EST滤波使其在遇到更高值时立刻增加,而遇到低值时会缓慢衰减。 进一步,运行队列的(可运行任务的)利用率之和由下式计算: diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bcea3d55d95d..e94d65da8d66 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4870,11 +4870,9 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, * to smooth utilization decreases. */ ue.enqueued = task_util(p); - if (sched_feat(UTIL_EST_FASTUP)) { - if (ue.ewma < ue.enqueued) { - ue.ewma = ue.enqueued; - goto done; - } + if (ue.ewma < ue.enqueued) { + ue.ewma = ue.enqueued; + goto done; } /* diff --git a/kernel/sched/features.h b/kernel/sched/features.h index a3ddf84de430..143f55df890b 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -83,7 +83,6 @@ SCHED_FEAT(WA_BIAS, true) * UtilEstimation. Use estimated CPU utilization. */ SCHED_FEAT(UTIL_EST, true) -SCHED_FEAT(UTIL_EST_FASTUP, true) SCHED_FEAT(LATENCY_WARN, false) -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
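A minimal userspace sketch of the behaviour this patch makes permanent, for readers who want to see the filter in isolation: the estimate jumps straight to any higher sample and only applies the 1/4-weight IIR EWMA on decreases. The function and sample values below are illustrative only and are not kernel code.

	#include <stdio.h>

	#define UTIL_EST_WEIGHT_SHIFT	2	/* w = 1/4, as in the kernel */

	/* Mirrors the shape of util_est_update() after this patch; illustrative only. */
	static unsigned int ewma_update(unsigned int ewma, unsigned int sample)
	{
		unsigned int diff;

		if (ewma <= sample)		/* fast up: jump straight to the new value */
			return sample;

		/* ewma(t) = 1/4 * sample + 3/4 * ewma(t-1), in fixed point */
		diff = ewma - sample;
		ewma <<= UTIL_EST_WEIGHT_SHIFT;
		ewma -= diff;
		ewma >>= UTIL_EST_WEIGHT_SHIFT;

		return ewma;
	}

	int main(void)
	{
		unsigned int samples[] = { 100, 800, 780, 120, 120, 120 };
		unsigned int ewma = 0;
		int i;

		for (i = 0; i < 6; i++) {
			ewma = ewma_update(ewma, samples[i]);
			printf("sample=%4u ewma=%4u\n", samples[i], ewma);
		}

		return 0;
	}

Running it shows the estimate tracking the 800 sample immediately and then decaying by a quarter of the gap per dequeue once the samples drop back to 120.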
* Re: [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) 2023-12-01 16:16 ` [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) Vincent Guittot @ 2023-12-02 2:41 ` Yanteng Si 2023-12-07 3:44 ` Alex Shi 1 sibling, 0 replies; 8+ messages in thread From: Yanteng Si @ 2023-12-02 2:41 UTC (permalink / raw) To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, qyousef, linux-kernel, linux-doc Cc: lukasz.luba, hongyan.xia2, yizhou.tang 在 2023/12/2 00:16, Vincent Guittot 写道: > sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature > in order to check for possibly related regressions. After 3 years, it has > never been used and no regression has been reported. Let remove it > and make fast increase a permanent behavior. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> > Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> > Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com> For Chinese translation, Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Thanks, Yanteng > --- > Documentation/scheduler/schedutil.rst | 7 +++---- > Documentation/translations/zh_CN/scheduler/schedutil.rst | 7 +++---- > kernel/sched/fair.c | 8 +++----- > kernel/sched/features.h | 1 - > 4 files changed, 9 insertions(+), 14 deletions(-) > > diff --git a/Documentation/scheduler/schedutil.rst b/Documentation/scheduler/schedutil.rst > index 32c7d69fc86c..803fba8fc714 100644 > --- a/Documentation/scheduler/schedutil.rst > +++ b/Documentation/scheduler/schedutil.rst > @@ -90,8 +90,8 @@ For more detail see: > - Documentation/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. Task utilization" > > > -UTIL_EST / UTIL_EST_FASTUP > -========================== > +UTIL_EST > +======== > > Because periodic tasks have their averages decayed while they sleep, even > though when running their expected utilization will be the same, they suffer a > @@ -99,8 +99,7 @@ though when running their expected utilization will be the same, they suffer a > > To alleviate this (a default enabled option) UTIL_EST drives an Infinite > Impulse Response (IIR) EWMA with the 'running' value on dequeue -- when it is > -highest. A further default enabled option UTIL_EST_FASTUP modifies the IIR > -filter to instantly increase and only decay on decrease. > +highest. UTIL_EST filters to instantly increase and only decay on decrease. > > A further runqueue wide sum (of runnable tasks) is maintained of: > > diff --git a/Documentation/translations/zh_CN/scheduler/schedutil.rst b/Documentation/translations/zh_CN/scheduler/schedutil.rst > index d1ea68007520..7c8d87f21c42 100644 > --- a/Documentation/translations/zh_CN/scheduler/schedutil.rst > +++ b/Documentation/translations/zh_CN/scheduler/schedutil.rst > @@ -89,16 +89,15 @@ r_cpu被定义为当前CPU的最高性能水平与系统中任何其它CPU的最 > - Documentation/translations/zh_CN/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. 
Task utilization" > > > -UTIL_EST / UTIL_EST_FASTUP > -========================== > +UTIL_EST > +======== > > 由于周期性任务的平均数在睡眠时会衰减,而在运行时其预期利用率会和睡眠前相同, > 因此它们在再次运行后会面临(DVFS)的上涨。 > > 为了缓解这个问题,(一个默认使能的编译选项)UTIL_EST驱动一个无限脉冲响应 > (Infinite Impulse Response,IIR)的EWMA,“运行”值在出队时是最高的。 > -另一个默认使能的编译选项UTIL_EST_FASTUP修改了IIR滤波器,使其允许立即增加, > -仅在利用率下降时衰减。 > +UTIL_EST滤波使其在遇到更高值时立刻增加,而遇到低值时会缓慢衰减。 > > 进一步,运行队列的(可运行任务的)利用率之和由下式计算: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index bcea3d55d95d..e94d65da8d66 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4870,11 +4870,9 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, > * to smooth utilization decreases. > */ > ue.enqueued = task_util(p); > - if (sched_feat(UTIL_EST_FASTUP)) { > - if (ue.ewma < ue.enqueued) { > - ue.ewma = ue.enqueued; > - goto done; > - } > + if (ue.ewma < ue.enqueued) { > + ue.ewma = ue.enqueued; > + goto done; > } > > /* > diff --git a/kernel/sched/features.h b/kernel/sched/features.h > index a3ddf84de430..143f55df890b 100644 > --- a/kernel/sched/features.h > +++ b/kernel/sched/features.h > @@ -83,7 +83,6 @@ SCHED_FEAT(WA_BIAS, true) > * UtilEstimation. Use estimated CPU utilization. > */ > SCHED_FEAT(UTIL_EST, true) > -SCHED_FEAT(UTIL_EST_FASTUP, true) > > SCHED_FEAT(LATENCY_WARN, false) > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) 2023-12-01 16:16 ` [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) Vincent Guittot 2023-12-02 2:41 ` Yanteng Si @ 2023-12-07 3:44 ` Alex Shi 1 sibling, 0 replies; 8+ messages in thread From: Alex Shi @ 2023-12-07 3:44 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, qyousef, linux-kernel, linux-doc, lukasz.luba, hongyan.xia2, yizhou.tang Nice cleanup. Reviewed-by: Alex Shi <alexs@kernel.org> On Sat, Dec 2, 2023 at 12:17 AM Vincent Guittot <vincent.guittot@linaro.org> wrote: > > sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature > in order to check for possibly related regressions. After 3 years, it has > never been used and no regression has been reported. Let remove it > and make fast increase a permanent behavior. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> > Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> > Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com> > --- > Documentation/scheduler/schedutil.rst | 7 +++---- > Documentation/translations/zh_CN/scheduler/schedutil.rst | 7 +++---- > kernel/sched/fair.c | 8 +++----- > kernel/sched/features.h | 1 - > 4 files changed, 9 insertions(+), 14 deletions(-) > > diff --git a/Documentation/scheduler/schedutil.rst b/Documentation/scheduler/schedutil.rst > index 32c7d69fc86c..803fba8fc714 100644 > --- a/Documentation/scheduler/schedutil.rst > +++ b/Documentation/scheduler/schedutil.rst > @@ -90,8 +90,8 @@ For more detail see: > - Documentation/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. Task utilization" > > > -UTIL_EST / UTIL_EST_FASTUP > -========================== > +UTIL_EST > +======== > > Because periodic tasks have their averages decayed while they sleep, even > though when running their expected utilization will be the same, they suffer a > @@ -99,8 +99,7 @@ though when running their expected utilization will be the same, they suffer a > > To alleviate this (a default enabled option) UTIL_EST drives an Infinite > Impulse Response (IIR) EWMA with the 'running' value on dequeue -- when it is > -highest. A further default enabled option UTIL_EST_FASTUP modifies the IIR > -filter to instantly increase and only decay on decrease. > +highest. UTIL_EST filters to instantly increase and only decay on decrease. > > A further runqueue wide sum (of runnable tasks) is maintained of: > > diff --git a/Documentation/translations/zh_CN/scheduler/schedutil.rst b/Documentation/translations/zh_CN/scheduler/schedutil.rst > index d1ea68007520..7c8d87f21c42 100644 > --- a/Documentation/translations/zh_CN/scheduler/schedutil.rst > +++ b/Documentation/translations/zh_CN/scheduler/schedutil.rst > @@ -89,16 +89,15 @@ r_cpu被定义为当前CPU的最高性能水平与系统中任何其它CPU的最 > - Documentation/translations/zh_CN/scheduler/sched-capacity.rst:"1. CPU Capacity + 2. 
Task utilization" > > > -UTIL_EST / UTIL_EST_FASTUP > -========================== > +UTIL_EST > +======== > > 由于周期性任务的平均数在睡眠时会衰减,而在运行时其预期利用率会和睡眠前相同, > 因此它们在再次运行后会面临(DVFS)的上涨。 > > 为了缓解这个问题,(一个默认使能的编译选项)UTIL_EST驱动一个无限脉冲响应 > (Infinite Impulse Response,IIR)的EWMA,“运行”值在出队时是最高的。 > -另一个默认使能的编译选项UTIL_EST_FASTUP修改了IIR滤波器,使其允许立即增加, > -仅在利用率下降时衰减。 > +UTIL_EST滤波使其在遇到更高值时立刻增加,而遇到低值时会缓慢衰减。 > > 进一步,运行队列的(可运行任务的)利用率之和由下式计算: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index bcea3d55d95d..e94d65da8d66 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4870,11 +4870,9 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, > * to smooth utilization decreases. > */ > ue.enqueued = task_util(p); > - if (sched_feat(UTIL_EST_FASTUP)) { > - if (ue.ewma < ue.enqueued) { > - ue.ewma = ue.enqueued; > - goto done; > - } > + if (ue.ewma < ue.enqueued) { > + ue.ewma = ue.enqueued; > + goto done; > } > > /* > diff --git a/kernel/sched/features.h b/kernel/sched/features.h > index a3ddf84de430..143f55df890b 100644 > --- a/kernel/sched/features.h > +++ b/kernel/sched/features.h > @@ -83,7 +83,6 @@ SCHED_FEAT(WA_BIAS, true) > * UtilEstimation. Use estimated CPU utilization. > */ > SCHED_FEAT(UTIL_EST, true) > -SCHED_FEAT(UTIL_EST_FASTUP, true) > > SCHED_FEAT(LATENCY_WARN, false) > > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] sched/fair: Simplify util_est 2023-12-01 16:16 [PATCH v2 0/2] Simplify Util_est Vincent Guittot 2023-12-01 16:16 ` [PATCH v2 1/2] sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) Vincent Guittot @ 2023-12-01 16:16 ` Vincent Guittot 2023-12-02 23:38 ` Qais Yousef 2023-12-07 3:46 ` Alex Shi 1 sibling, 2 replies; 8+ messages in thread From: Vincent Guittot @ 2023-12-01 16:16 UTC (permalink / raw) To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, qyousef, linux-kernel, linux-doc Cc: lukasz.luba, hongyan.xia2, yizhou.tang, Vincent Guittot With UTIL_EST_FASTUP now being permanent, we can take advantage of the fact that the ewma jumps directly to a higher utilization at dequeue to simplify util_est and remove the enqueued field. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> --- include/linux/sched.h | 49 +++++++------------------- kernel/sched/debug.c | 7 ++-- kernel/sched/fair.c | 82 ++++++++++++++++--------------------------- kernel/sched/pelt.h | 4 +-- 4 files changed, 48 insertions(+), 94 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 8d258162deb0..03bfe9ab2951 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -415,42 +415,6 @@ struct load_weight { u32 inv_weight; }; -/** - * struct util_est - Estimation utilization of FAIR tasks - * @enqueued: instantaneous estimated utilization of a task/cpu - * @ewma: the Exponential Weighted Moving Average (EWMA) - * utilization of a task - * - * Support data structure to track an Exponential Weighted Moving Average - * (EWMA) of a FAIR task's utilization. New samples are added to the moving - * average each time a task completes an activation. Sample's weight is chosen - * so that the EWMA will be relatively insensitive to transient changes to the - * task's workload. - * - * The enqueued attribute has a slightly different meaning for tasks and cpus: - * - task: the task's util_avg at last task dequeue time - * - cfs_rq: the sum of util_est.enqueued for each RUNNABLE task on that CPU - * Thus, the util_est.enqueued of a task represents the contribution on the - * estimated utilization of the CPU where that task is currently enqueued. - * - * Only for tasks we track a moving average of the past instantaneous - * estimated utilization. This allows to absorb sporadic drops in utilization - * of an otherwise almost periodic task. - * - * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg - * updates. When a task is dequeued, its util_est should not be updated if its - * util_avg has not been updated in the meantime. - * This information is mapped into the MSB bit of util_est.enqueued at dequeue - * time. Since max value of util_est.enqueued for a task is 1024 (PELT util_avg - * for a task) it is safe to use MSB. - */ -struct util_est { - unsigned int enqueued; - unsigned int ewma; -#define UTIL_EST_WEIGHT_SHIFT 2 -#define UTIL_AVG_UNCHANGED 0x80000000 -} __attribute__((__aligned__(sizeof(u64)))); - /* * The load/runnable/util_avg accumulates an infinite geometric series * (see __update_load_avg_cfs_rq() in kernel/sched/pelt.c). 
@@ -505,9 +469,20 @@ struct sched_avg { unsigned long load_avg; unsigned long runnable_avg; unsigned long util_avg; - struct util_est util_est; + unsigned int util_est; } ____cacheline_aligned; +/* + * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg + * updates. When a task is dequeued, its util_est should not be updated if its + * util_avg has not been updated in the meantime. + * This information is mapped into the MSB bit of util_est at dequeue time. + * Since max value of util_est for a task is 1024 (PELT util_avg for a task) + * it is safe to use MSB. + */ +#define UTIL_EST_WEIGHT_SHIFT 2 +#define UTIL_AVG_UNCHANGED 0x80000000 + struct sched_statistics { #ifdef CONFIG_SCHEDSTATS u64 wait_start; diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 168eecc209b4..8d5d98a5834d 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -684,8 +684,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) cfs_rq->avg.runnable_avg); SEQ_printf(m, " .%-30s: %lu\n", "util_avg", cfs_rq->avg.util_avg); - SEQ_printf(m, " .%-30s: %u\n", "util_est_enqueued", - cfs_rq->avg.util_est.enqueued); + SEQ_printf(m, " .%-30s: %u\n", "util_est", + cfs_rq->avg.util_est); SEQ_printf(m, " .%-30s: %ld\n", "removed.load_avg", cfs_rq->removed.load_avg); SEQ_printf(m, " .%-30s: %ld\n", "removed.util_avg", @@ -1075,8 +1075,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, P(se.avg.runnable_avg); P(se.avg.util_avg); P(se.avg.last_update_time); - P(se.avg.util_est.ewma); - PM(se.avg.util_est.enqueued, ~UTIL_AVG_UNCHANGED); + PM(se.avg.util_est, ~UTIL_AVG_UNCHANGED); #endif #ifdef CONFIG_UCLAMP_TASK __PS("uclamp.min", p->uclamp_req[UCLAMP_MIN].value); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e94d65da8d66..823dd76d0546 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4781,9 +4781,7 @@ static inline unsigned long task_runnable(struct task_struct *p) static inline unsigned long _task_util_est(struct task_struct *p) { - struct util_est ue = READ_ONCE(p->se.avg.util_est); - - return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)); + return READ_ONCE(p->se.avg.util_est) & ~UTIL_AVG_UNCHANGED; } static inline unsigned long task_util_est(struct task_struct *p) @@ -4800,9 +4798,9 @@ static inline void util_est_enqueue(struct cfs_rq *cfs_rq, return; /* Update root cfs_rq's estimated utilization */ - enqueued = cfs_rq->avg.util_est.enqueued; + enqueued = cfs_rq->avg.util_est; enqueued += _task_util_est(p); - WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued); + WRITE_ONCE(cfs_rq->avg.util_est, enqueued); trace_sched_util_est_cfs_tp(cfs_rq); } @@ -4816,34 +4814,20 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq, return; /* Update root cfs_rq's estimated utilization */ - enqueued = cfs_rq->avg.util_est.enqueued; + enqueued = cfs_rq->avg.util_est; enqueued -= min_t(unsigned int, enqueued, _task_util_est(p)); - WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued); + WRITE_ONCE(cfs_rq->avg.util_est, enqueued); trace_sched_util_est_cfs_tp(cfs_rq); } #define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100) -/* - * Check if a (signed) value is within a specified (unsigned) margin, - * based on the observation that: - * - * abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1) - * - * NOTE: this only works when value + margin < INT_MAX. 
- */ -static inline bool within_margin(int value, int margin) -{ - return ((unsigned int)(value + margin - 1) < (2 * margin - 1)); -} - static inline void util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep) { - long last_ewma_diff, last_enqueued_diff; - struct util_est ue; + unsigned int ewma, dequeued, last_ewma_diff; if (!sched_feat(UTIL_EST)) return; @@ -4855,23 +4839,25 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, if (!task_sleep) return; + /* Get current estimate of utilization */ + ewma = READ_ONCE(p->se.avg.util_est); + /* * If the PELT values haven't changed since enqueue time, * skip the util_est update. */ - ue = p->se.avg.util_est; - if (ue.enqueued & UTIL_AVG_UNCHANGED) + if (ewma & UTIL_AVG_UNCHANGED) return; - last_enqueued_diff = ue.enqueued; + /* Get utilization at dequeue */ + dequeued = task_util(p); /* * Reset EWMA on utilization increases, the moving average is used only * to smooth utilization decreases. */ - ue.enqueued = task_util(p); - if (ue.ewma < ue.enqueued) { - ue.ewma = ue.enqueued; + if (ewma <= dequeued) { + ewma = dequeued; goto done; } @@ -4879,27 +4865,22 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, * Skip update of task's estimated utilization when its members are * already ~1% close to its last activation value. */ - last_ewma_diff = ue.enqueued - ue.ewma; - last_enqueued_diff -= ue.enqueued; - if (within_margin(last_ewma_diff, UTIL_EST_MARGIN)) { - if (!within_margin(last_enqueued_diff, UTIL_EST_MARGIN)) - goto done; - - return; - } + last_ewma_diff = ewma - dequeued; + if (last_ewma_diff < UTIL_EST_MARGIN) + goto done; /* * To avoid overestimation of actual task utilization, skip updates if * we cannot grant there is idle time in this CPU. */ - if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)))) + if (dequeued > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)))) return; /* * To avoid underestimate of task utilization, skip updates of EWMA if * we cannot grant that thread got all CPU time it wanted. */ - if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p)) + if ((dequeued + UTIL_EST_MARGIN) < task_runnable(p)) goto done; @@ -4907,25 +4888,24 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, * Update Task's estimated utilization * * When *p completes an activation we can consolidate another sample - * of the task size. This is done by storing the current PELT value - * as ue.enqueued and by using this value to update the Exponential - * Weighted Moving Average (EWMA): + * of the task size. 
This is done by using this value to update the + * Exponential Weighted Moving Average (EWMA): * * ewma(t) = w * task_util(p) + (1-w) * ewma(t-1) * = w * task_util(p) + ewma(t-1) - w * ewma(t-1) * = w * (task_util(p) - ewma(t-1)) + ewma(t-1) - * = w * ( last_ewma_diff ) + ewma(t-1) - * = w * (last_ewma_diff + ewma(t-1) / w) + * = w * ( -last_ewma_diff ) + ewma(t-1) + * = w * (-last_ewma_diff + ewma(t-1) / w) * * Where 'w' is the weight of new samples, which is configured to be * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT) */ - ue.ewma <<= UTIL_EST_WEIGHT_SHIFT; - ue.ewma += last_ewma_diff; - ue.ewma >>= UTIL_EST_WEIGHT_SHIFT; + ewma <<= UTIL_EST_WEIGHT_SHIFT; + ewma -= last_ewma_diff; + ewma >>= UTIL_EST_WEIGHT_SHIFT; done: - ue.enqueued |= UTIL_AVG_UNCHANGED; - WRITE_ONCE(p->se.avg.util_est, ue); + ewma |= UTIL_AVG_UNCHANGED; + WRITE_ONCE(p->se.avg.util_est, ewma); trace_sched_util_est_se_tp(&p->se); } @@ -7653,16 +7633,16 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost) if (sched_feat(UTIL_EST)) { unsigned long util_est; - util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued); + util_est = READ_ONCE(cfs_rq->avg.util_est); /* * During wake-up @p isn't enqueued yet and doesn't contribute - * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued. + * to any cpu_rq(cpu)->cfs.avg.util_est. * If @dst_cpu == @cpu add it to "simulate" cpu_util after @p * has been enqueued. * * During exec (@dst_cpu = -1) @p is enqueued and does - * contribute to cpu_rq(cpu)->cfs.util_est.enqueued. + * contribute to cpu_rq(cpu)->cfs.util_est. * Remove it to "simulate" cpu_util without @p's contribution. * * Despite the task_on_rq_queued(@p) check there is still a diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h index 3a0e0dc28721..9e1083465fbc 100644 --- a/kernel/sched/pelt.h +++ b/kernel/sched/pelt.h @@ -52,13 +52,13 @@ static inline void cfs_se_util_change(struct sched_avg *avg) return; /* Avoid store if the flag has been already reset */ - enqueued = avg->util_est.enqueued; + enqueued = avg->util_est; if (!(enqueued & UTIL_AVG_UNCHANGED)) return; /* Reset flag to report util_avg has been updated */ enqueued &= ~UTIL_AVG_UNCHANGED; - WRITE_ONCE(avg->util_est.enqueued, enqueued); + WRITE_ONCE(avg->util_est, enqueued); } static inline u64 rq_clock_pelt(struct rq *rq) -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
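For reference, a standalone userspace sketch of the single-field layout this patch introduces, assuming only what the patch itself states: the EWMA lives in the low bits of one unsigned int, UTIL_AVG_UNCHANGED sits in the MSB (safe because a task's PELT util_avg is at most 1024), and the fast-up path guarantees ewma >= dequeued, so last_ewma_diff stays non-negative in unsigned arithmetic. The capacity and runnable checks of the real util_est_update() are left out; names and values here are illustrative.

	#include <stdio.h>

	#define UTIL_EST_WEIGHT_SHIFT	2
	#define UTIL_AVG_UNCHANGED	0x80000000u
	#define UTIL_EST_MARGIN		(1024 / 100)

	static unsigned int util_est_update(unsigned int util_est, unsigned int dequeued)
	{
		unsigned int ewma = util_est;
		unsigned int last_ewma_diff;

		if (ewma & UTIL_AVG_UNCHANGED)	/* PELT unchanged since enqueue: skip */
			return util_est;

		if (ewma <= dequeued) {		/* reset EWMA on utilization increase */
			ewma = dequeued;
			goto done;
		}

		/* ewma >= dequeued here, so the diff is non-negative unsigned */
		last_ewma_diff = ewma - dequeued;
		if (last_ewma_diff < UTIL_EST_MARGIN)	/* within ~1%: keep current value */
			goto done;

		/* ewma(t) = ewma(t-1) - w * last_ewma_diff, with w = 1/4 */
		ewma <<= UTIL_EST_WEIGHT_SHIFT;
		ewma -= last_ewma_diff;
		ewma >>= UTIL_EST_WEIGHT_SHIFT;
	done:
		return ewma | UTIL_AVG_UNCHANGED;	/* flag cleared again on next PELT update */
	}

	int main(void)
	{
		unsigned int est = 600;		/* flag clear: util_avg changed since enqueue */

		est = util_est_update(est, 200);
		printf("util_est raw=%#x value=%u\n", est, est & ~UTIL_AVG_UNCHANGED);
		return 0;
	}

With an estimate of 600 and a dequeue utilization of 200, the sketch decays to 500 and sets the MSB flag, which is what the rewritten kernel path stores back into p->se.avg.util_est.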
* Re: [PATCH v2 2/2] sched/fair: Simplify util_est 2023-12-01 16:16 ` [PATCH v2 2/2] sched/fair: Simplify util_est Vincent Guittot @ 2023-12-02 23:38 ` Qais Yousef 2023-12-04 9:54 ` Vincent Guittot 2023-12-07 3:46 ` Alex Shi 1 sibling, 1 reply; 8+ messages in thread From: Qais Yousef @ 2023-12-02 23:38 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, linux-kernel, linux-doc, lukasz.luba, hongyan.xia2, yizhou.tang On 12/01/23 17:16, Vincent Guittot wrote: > /* > * The load/runnable/util_avg accumulates an infinite geometric series > * (see __update_load_avg_cfs_rq() in kernel/sched/pelt.c). > @@ -505,9 +469,20 @@ struct sched_avg { > unsigned long load_avg; > unsigned long runnable_avg; > unsigned long util_avg; > - struct util_est util_est; > + unsigned int util_est; > } ____cacheline_aligned; unsigned long would be better? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] sched/fair: Simplify util_est 2023-12-02 23:38 ` Qais Yousef @ 2023-12-04 9:54 ` Vincent Guittot 0 siblings, 0 replies; 8+ messages in thread From: Vincent Guittot @ 2023-12-04 9:54 UTC (permalink / raw) To: Qais Yousef Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, linux-kernel, linux-doc, lukasz.luba, hongyan.xia2, yizhou.tang On Sun, 3 Dec 2023 at 00:38, Qais Yousef <qyousef@layalina.io> wrote: > > On 12/01/23 17:16, Vincent Guittot wrote: > > > /* > > * The load/runnable/util_avg accumulates an infinite geometric series > > * (see __update_load_avg_cfs_rq() in kernel/sched/pelt.c). > > @@ -505,9 +469,20 @@ struct sched_avg { > > unsigned long load_avg; > > unsigned long runnable_avg; > > unsigned long util_avg; > > - struct util_est util_est; > > + unsigned int util_est; > > } ____cacheline_aligned; > > unsigned long would be better? I thought about changing it to unsigned long but I prefered to keep using the same type as before for the ewma as we don't need to extend it ^ permalink raw reply [flat|nested] 8+ messages in thread
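A quick sanity check of the headroom argument above, assuming the task-level maximum of SCHED_CAPACITY_SCALE (1024): even the left-shifted intermediate in the EWMA update stays far below the MSB used for UTIL_AVG_UNCHANGED, so the existing 32-bit field has room to spare and widening it to unsigned long would not change behaviour.

	#include <assert.h>

	#define SCHED_CAPACITY_SCALE	1024u
	#define UTIL_EST_WEIGHT_SHIFT	2
	#define UTIL_AVG_UNCHANGED	0x80000000u

	int main(void)
	{
		/* largest task-level value the update ever shifts left */
		unsigned int worst = SCHED_CAPACITY_SCALE << UTIL_EST_WEIGHT_SHIFT;

		assert(worst < UTIL_AVG_UNCHANGED);	/* 4096 < 2^31: no collision with the flag */
		return 0;
	}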
* Re: [PATCH v2 2/2] sched/fair: Simplify util_est 2023-12-01 16:16 ` [PATCH v2 2/2] sched/fair: Simplify util_est Vincent Guittot 2023-12-02 23:38 ` Qais Yousef @ 2023-12-07 3:46 ` Alex Shi 1 sibling, 0 replies; 8+ messages in thread From: Alex Shi @ 2023-12-07 3:46 UTC (permalink / raw) To: Vincent Guittot Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, corbet, alexs, siyanteng, qyousef, linux-kernel, linux-doc, lukasz.luba, hongyan.xia2, yizhou.tang Looks good to me. Reviewed-by: Alex Shi <alexs@kernel.org> On Sat, Dec 2, 2023 at 12:17 AM Vincent Guittot <vincent.guittot@linaro.org> wrote: > > With UTIL_EST_FASTUP now being permanent, we can take advantage of the > fact that the ewma jumps directly to a higher utilization at dequeue to > simplify util_est and remove the enqueued field. > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> > Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> > Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> > --- > include/linux/sched.h | 49 +++++++------------------- > kernel/sched/debug.c | 7 ++-- > kernel/sched/fair.c | 82 ++++++++++++++++--------------------------- > kernel/sched/pelt.h | 4 +-- > 4 files changed, 48 insertions(+), 94 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 8d258162deb0..03bfe9ab2951 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -415,42 +415,6 @@ struct load_weight { > u32 inv_weight; > }; > > -/** > - * struct util_est - Estimation utilization of FAIR tasks > - * @enqueued: instantaneous estimated utilization of a task/cpu > - * @ewma: the Exponential Weighted Moving Average (EWMA) > - * utilization of a task > - * > - * Support data structure to track an Exponential Weighted Moving Average > - * (EWMA) of a FAIR task's utilization. New samples are added to the moving > - * average each time a task completes an activation. Sample's weight is chosen > - * so that the EWMA will be relatively insensitive to transient changes to the > - * task's workload. > - * > - * The enqueued attribute has a slightly different meaning for tasks and cpus: > - * - task: the task's util_avg at last task dequeue time > - * - cfs_rq: the sum of util_est.enqueued for each RUNNABLE task on that CPU > - * Thus, the util_est.enqueued of a task represents the contribution on the > - * estimated utilization of the CPU where that task is currently enqueued. > - * > - * Only for tasks we track a moving average of the past instantaneous > - * estimated utilization. This allows to absorb sporadic drops in utilization > - * of an otherwise almost periodic task. > - * > - * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg > - * updates. When a task is dequeued, its util_est should not be updated if its > - * util_avg has not been updated in the meantime. > - * This information is mapped into the MSB bit of util_est.enqueued at dequeue > - * time. Since max value of util_est.enqueued for a task is 1024 (PELT util_avg > - * for a task) it is safe to use MSB. > - */ > -struct util_est { > - unsigned int enqueued; > - unsigned int ewma; > -#define UTIL_EST_WEIGHT_SHIFT 2 > -#define UTIL_AVG_UNCHANGED 0x80000000 > -} __attribute__((__aligned__(sizeof(u64)))); > - > /* > * The load/runnable/util_avg accumulates an infinite geometric series > * (see __update_load_avg_cfs_rq() in kernel/sched/pelt.c). 
> @@ -505,9 +469,20 @@ struct sched_avg { > unsigned long load_avg; > unsigned long runnable_avg; > unsigned long util_avg; > - struct util_est util_est; > + unsigned int util_est; > } ____cacheline_aligned; > > +/* > + * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg > + * updates. When a task is dequeued, its util_est should not be updated if its > + * util_avg has not been updated in the meantime. > + * This information is mapped into the MSB bit of util_est at dequeue time. > + * Since max value of util_est for a task is 1024 (PELT util_avg for a task) > + * it is safe to use MSB. > + */ > +#define UTIL_EST_WEIGHT_SHIFT 2 > +#define UTIL_AVG_UNCHANGED 0x80000000 > + > struct sched_statistics { > #ifdef CONFIG_SCHEDSTATS > u64 wait_start; > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c > index 168eecc209b4..8d5d98a5834d 100644 > --- a/kernel/sched/debug.c > +++ b/kernel/sched/debug.c > @@ -684,8 +684,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) > cfs_rq->avg.runnable_avg); > SEQ_printf(m, " .%-30s: %lu\n", "util_avg", > cfs_rq->avg.util_avg); > - SEQ_printf(m, " .%-30s: %u\n", "util_est_enqueued", > - cfs_rq->avg.util_est.enqueued); > + SEQ_printf(m, " .%-30s: %u\n", "util_est", > + cfs_rq->avg.util_est); > SEQ_printf(m, " .%-30s: %ld\n", "removed.load_avg", > cfs_rq->removed.load_avg); > SEQ_printf(m, " .%-30s: %ld\n", "removed.util_avg", > @@ -1075,8 +1075,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, > P(se.avg.runnable_avg); > P(se.avg.util_avg); > P(se.avg.last_update_time); > - P(se.avg.util_est.ewma); > - PM(se.avg.util_est.enqueued, ~UTIL_AVG_UNCHANGED); > + PM(se.avg.util_est, ~UTIL_AVG_UNCHANGED); > #endif > #ifdef CONFIG_UCLAMP_TASK > __PS("uclamp.min", p->uclamp_req[UCLAMP_MIN].value); > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index e94d65da8d66..823dd76d0546 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4781,9 +4781,7 @@ static inline unsigned long task_runnable(struct task_struct *p) > > static inline unsigned long _task_util_est(struct task_struct *p) > { > - struct util_est ue = READ_ONCE(p->se.avg.util_est); > - > - return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)); > + return READ_ONCE(p->se.avg.util_est) & ~UTIL_AVG_UNCHANGED; > } > > static inline unsigned long task_util_est(struct task_struct *p) > @@ -4800,9 +4798,9 @@ static inline void util_est_enqueue(struct cfs_rq *cfs_rq, > return; > > /* Update root cfs_rq's estimated utilization */ > - enqueued = cfs_rq->avg.util_est.enqueued; > + enqueued = cfs_rq->avg.util_est; > enqueued += _task_util_est(p); > - WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued); > + WRITE_ONCE(cfs_rq->avg.util_est, enqueued); > > trace_sched_util_est_cfs_tp(cfs_rq); > } > @@ -4816,34 +4814,20 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq, > return; > > /* Update root cfs_rq's estimated utilization */ > - enqueued = cfs_rq->avg.util_est.enqueued; > + enqueued = cfs_rq->avg.util_est; > enqueued -= min_t(unsigned int, enqueued, _task_util_est(p)); > - WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued); > + WRITE_ONCE(cfs_rq->avg.util_est, enqueued); > > trace_sched_util_est_cfs_tp(cfs_rq); > } > > #define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100) > > -/* > - * Check if a (signed) value is within a specified (unsigned) margin, > - * based on the observation that: > - * > - * abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1) > - * > - * NOTE: this only works when value + 
margin < INT_MAX. > - */ > -static inline bool within_margin(int value, int margin) > -{ > - return ((unsigned int)(value + margin - 1) < (2 * margin - 1)); > -} > - > static inline void util_est_update(struct cfs_rq *cfs_rq, > struct task_struct *p, > bool task_sleep) > { > - long last_ewma_diff, last_enqueued_diff; > - struct util_est ue; > + unsigned int ewma, dequeued, last_ewma_diff; > > if (!sched_feat(UTIL_EST)) > return; > @@ -4855,23 +4839,25 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, > if (!task_sleep) > return; > > + /* Get current estimate of utilization */ > + ewma = READ_ONCE(p->se.avg.util_est); > + > /* > * If the PELT values haven't changed since enqueue time, > * skip the util_est update. > */ > - ue = p->se.avg.util_est; > - if (ue.enqueued & UTIL_AVG_UNCHANGED) > + if (ewma & UTIL_AVG_UNCHANGED) > return; > > - last_enqueued_diff = ue.enqueued; > + /* Get utilization at dequeue */ > + dequeued = task_util(p); > > /* > * Reset EWMA on utilization increases, the moving average is used only > * to smooth utilization decreases. > */ > - ue.enqueued = task_util(p); > - if (ue.ewma < ue.enqueued) { > - ue.ewma = ue.enqueued; > + if (ewma <= dequeued) { > + ewma = dequeued; > goto done; > } > > @@ -4879,27 +4865,22 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, > * Skip update of task's estimated utilization when its members are > * already ~1% close to its last activation value. > */ > - last_ewma_diff = ue.enqueued - ue.ewma; > - last_enqueued_diff -= ue.enqueued; > - if (within_margin(last_ewma_diff, UTIL_EST_MARGIN)) { > - if (!within_margin(last_enqueued_diff, UTIL_EST_MARGIN)) > - goto done; > - > - return; > - } > + last_ewma_diff = ewma - dequeued; > + if (last_ewma_diff < UTIL_EST_MARGIN) > + goto done; > > /* > * To avoid overestimation of actual task utilization, skip updates if > * we cannot grant there is idle time in this CPU. > */ > - if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)))) > + if (dequeued > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)))) > return; > > /* > * To avoid underestimate of task utilization, skip updates of EWMA if > * we cannot grant that thread got all CPU time it wanted. > */ > - if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p)) > + if ((dequeued + UTIL_EST_MARGIN) < task_runnable(p)) > goto done; > > > @@ -4907,25 +4888,24 @@ static inline void util_est_update(struct cfs_rq *cfs_rq, > * Update Task's estimated utilization > * > * When *p completes an activation we can consolidate another sample > - * of the task size. This is done by storing the current PELT value > - * as ue.enqueued and by using this value to update the Exponential > - * Weighted Moving Average (EWMA): > + * of the task size. 
This is done by using this value to update the > + * Exponential Weighted Moving Average (EWMA): > * > * ewma(t) = w * task_util(p) + (1-w) * ewma(t-1) > * = w * task_util(p) + ewma(t-1) - w * ewma(t-1) > * = w * (task_util(p) - ewma(t-1)) + ewma(t-1) > - * = w * ( last_ewma_diff ) + ewma(t-1) > - * = w * (last_ewma_diff + ewma(t-1) / w) > + * = w * ( -last_ewma_diff ) + ewma(t-1) > + * = w * (-last_ewma_diff + ewma(t-1) / w) > * > * Where 'w' is the weight of new samples, which is configured to be > * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT) > */ > - ue.ewma <<= UTIL_EST_WEIGHT_SHIFT; > - ue.ewma += last_ewma_diff; > - ue.ewma >>= UTIL_EST_WEIGHT_SHIFT; > + ewma <<= UTIL_EST_WEIGHT_SHIFT; > + ewma -= last_ewma_diff; > + ewma >>= UTIL_EST_WEIGHT_SHIFT; > done: > - ue.enqueued |= UTIL_AVG_UNCHANGED; > - WRITE_ONCE(p->se.avg.util_est, ue); > + ewma |= UTIL_AVG_UNCHANGED; > + WRITE_ONCE(p->se.avg.util_est, ewma); > > trace_sched_util_est_se_tp(&p->se); > } > @@ -7653,16 +7633,16 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost) > if (sched_feat(UTIL_EST)) { > unsigned long util_est; > > - util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued); > + util_est = READ_ONCE(cfs_rq->avg.util_est); > > /* > * During wake-up @p isn't enqueued yet and doesn't contribute > - * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued. > + * to any cpu_rq(cpu)->cfs.avg.util_est. > * If @dst_cpu == @cpu add it to "simulate" cpu_util after @p > * has been enqueued. > * > * During exec (@dst_cpu = -1) @p is enqueued and does > - * contribute to cpu_rq(cpu)->cfs.util_est.enqueued. > + * contribute to cpu_rq(cpu)->cfs.util_est. > * Remove it to "simulate" cpu_util without @p's contribution. > * > * Despite the task_on_rq_queued(@p) check there is still a > diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h > index 3a0e0dc28721..9e1083465fbc 100644 > --- a/kernel/sched/pelt.h > +++ b/kernel/sched/pelt.h > @@ -52,13 +52,13 @@ static inline void cfs_se_util_change(struct sched_avg *avg) > return; > > /* Avoid store if the flag has been already reset */ > - enqueued = avg->util_est.enqueued; > + enqueued = avg->util_est; > if (!(enqueued & UTIL_AVG_UNCHANGED)) > return; > > /* Reset flag to report util_avg has been updated */ > enqueued &= ~UTIL_AVG_UNCHANGED; > - WRITE_ONCE(avg->util_est.enqueued, enqueued); > + WRITE_ONCE(avg->util_est, enqueued); > } > > static inline u64 rq_clock_pelt(struct rq *rq) > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 8+ messages in thread