[PATCH V2 0/2] sched/fair: Some improvements to feec()

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH V2 0/2] sched/fair: Some improvements to feec()
@ 2024-06-24  8:20 Xuewen Yan
  2024-06-24  8:20 ` [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity Xuewen Yan
  2024-06-24  8:20 ` [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu() Xuewen Yan
  0 siblings, 2 replies; 20+ messages in thread
From: Xuewen Yan @ 2024-06-24  8:20 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, juri.lelli,
	qyousef
  Cc: rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

Patch 1 clamp the util with per-CPU instand of per-PD when calc pd_busy_time.

Patch 2 use actual_cpu_capacity when judging util_fits_cpu to cover the case
which rq-uclamp-max would cause some corner case wrong.

Changelog:
V1:
https://lore.kernel.org/all/20240606070645.3295-1-xuewen.yan@unisoc.com/

v2: 
(1) remove the eenv->pd_cap capping in eenv_pd_busy_time() for patch-1;
(2) change commit-message for patch-1;
(3) add patch 2.

Xuewen Yan (2):
  sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()

 kernel/sched/fair.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-24  8:20 [PATCH V2 0/2] sched/fair: Some improvements to feec() Xuewen Yan
@ 2024-06-24  8:20 ` Xuewen Yan
  2024-06-25 13:04   ` Vincent Guittot
  2024-06-24  8:20 ` [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu() Xuewen Yan
  1 sibling, 1 reply; 20+ messages in thread
From: Xuewen Yan @ 2024-06-24  8:20 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, juri.lelli,
	qyousef
  Cc: rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
changed the PD's util from per-CPU to per-PD capping. But because
the effective_cpu_util() would return a util which maybe bigger
than the actual_cpu_capacity, this could cause the pd_busy_time
calculation errors.
So clamp the cpu_busy_time with the eenv->cpu_cap, which is
the actual_cpu_capacity.

Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
---
V2:
- change commit message.
- remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
- add Tested-by.
---
 kernel/sched/fair.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..5ca6396ef0b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
 				     struct cpumask *pd_cpus,
 				     struct task_struct *p)
 {
-	unsigned long busy_time = 0;
 	int cpu;
 
+	eenv->pd_busy_time = 0;
+
 	for_each_cpu(cpu, pd_cpus) {
 		unsigned long util = cpu_util(cpu, p, -1, 0);
 
-		busy_time += effective_cpu_util(cpu, util, NULL, NULL);
+		util = effective_cpu_util(cpu, util, NULL, NULL);
+		util = min(eenv->cpu_cap, util);
+		eenv->pd_busy_time += util;
 	}
-
-	eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
 }
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-06-24  8:20 [PATCH V2 0/2] sched/fair: Some improvements to feec() Xuewen Yan
  2024-06-24  8:20 ` [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity Xuewen Yan
@ 2024-06-24  8:20 ` Xuewen Yan
  2024-06-25  8:46   ` Vincent Guittot
  2024-06-28  1:28   ` Qais Yousef
  1 sibling, 2 replies; 20+ messages in thread
From: Xuewen Yan @ 2024-06-24  8:20 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, juri.lelli,
	qyousef
  Cc: rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
introduced get_actual_cpu_capacity(), and it had aggregated the
different pressures applied on the capacity of CPUs.
And in util_fits_cpu(), it would return true when uclamp_max
is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
is bigger than actual_cpu_capacity.

So use actual_cpu_capacity everywhere in util_fits_cpu() to
cover all cases.

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/sched/fair.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5ca6396ef0b7..9c16ae192217 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
 				int cpu)
 {
 	unsigned long capacity = capacity_of(cpu);
-	unsigned long capacity_orig;
+	unsigned long capacity_actual;
 	bool fits, uclamp_max_fits;
 
 	/*
@@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
 		return fits;
 
 	/*
-	 * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
+	 * We must use actual_cpu_capacity() for comparing against uclamp_min and
 	 * uclamp_max. We only care about capacity pressure (by using
 	 * capacity_of()) for comparing against the real util.
 	 *
 	 * If a task is boosted to 1024 for example, we don't want a tiny
 	 * pressure to skew the check whether it fits a CPU or not.
 	 *
-	 * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
-	 * should fit a little cpu even if there's some pressure.
+	 * Similarly if a task is capped to actual_cpu_capacity, it should fit
+	 * the cpu even if there's some pressure.
 	 *
 	 * Only exception is for HW or cpufreq pressure since it has a direct impact
 	 * on available OPP of the system.
@@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
 	 * For uclamp_max, we can tolerate a drop in performance level as the
 	 * goal is to cap the task. So it's okay if it's getting less.
 	 */
-	capacity_orig = arch_scale_cpu_capacity(cpu);
+	capacity_actual = get_actual_cpu_capacity(cpu);
 
 	/*
 	 * We want to force a task to fit a cpu as implied by uclamp_max.
@@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
 	 *     uclamp_max request.
 	 *
 	 *   which is what we're enforcing here. A task always fits if
-	 *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
+	 *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
 	 *   the normal upmigration rules should withhold still.
 	 *
 	 *   Only exception is when we are on max capacity, then we need to be
@@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
 	 *     2. The system is being saturated when we're operating near
 	 *        max capacity, it doesn't make sense to block overutilized.
 	 */
-	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
-	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+	uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
+	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
 	fits = fits || uclamp_max_fits;
 
 	/*
@@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
 	 * handle the case uclamp_min > uclamp_max.
 	 */
 	uclamp_min = min(uclamp_min, uclamp_max);
-	if (fits && (util < uclamp_min) &&
-	    (uclamp_min > get_actual_cpu_capacity(cpu)))
+	if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
 		return -1;
 
 	return fits;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-06-24  8:20 ` [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu() Xuewen Yan
@ 2024-06-25  8:46   ` Vincent Guittot
  2024-06-28  1:28   ` Qais Yousef
  1 sibling, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2024-06-25  8:46 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: dietmar.eggemann, mingo, peterz, juri.lelli, qyousef, rostedt,
	bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On Mon, 24 Jun 2024 at 10:21, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> introduced get_actual_cpu_capacity(), and it had aggregated the
> different pressures applied on the capacity of CPUs.
> And in util_fits_cpu(), it would return true when uclamp_max
> is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max

s/althought/although/

> is bigger than actual_cpu_capacity.
>
> So use actual_cpu_capacity everywhere in util_fits_cpu() to
> cover all cases.
>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>  kernel/sched/fair.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5ca6396ef0b7..9c16ae192217 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
>                                 int cpu)
>  {
>         unsigned long capacity = capacity_of(cpu);
> -       unsigned long capacity_orig;
> +       unsigned long capacity_actual;
>         bool fits, uclamp_max_fits;
>
>         /*
> @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
>                 return fits;
>
>         /*
> -        * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> +        * We must use actual_cpu_capacity() for comparing against uclamp_min and
>          * uclamp_max. We only care about capacity pressure (by using
>          * capacity_of()) for comparing against the real util.

actual_cpu_capacity() includes capacity pressure so this comment needs
to be changed.

>          *
>          * If a task is boosted to 1024 for example, we don't want a tiny
>          * pressure to skew the check whether it fits a CPU or not.

In think that the pressure in this case is about DL, RT and Irq but
not OPP capping because of thermal pressure for example

Also, the returned value is not a boolean: fit or not fit but a
ternary for the case where it doesn't fit the uclamp min. In the later
case, the CPU is still considered as a candidate

>          *
> -        * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> -        * should fit a little cpu even if there's some pressure.
> +        * Similarly if a task is capped to actual_cpu_capacity, it should fit
> +        * the cpu even if there's some pressure.
>          *
>          * Only exception is for HW or cpufreq pressure since it has a direct impact
>          * on available OPP of the system.
> @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
>          * For uclamp_max, we can tolerate a drop in performance level as the
>          * goal is to cap the task. So it's okay if it's getting less.
>          */
> -       capacity_orig = arch_scale_cpu_capacity(cpu);
> +       capacity_actual = get_actual_cpu_capacity(cpu);
>
>         /*
>          * We want to force a task to fit a cpu as implied by uclamp_max.
> @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
>          *     uclamp_max request.
>          *
>          *   which is what we're enforcing here. A task always fits if
> -        *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> +        *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
>          *   the normal upmigration rules should withhold still.
>          *
>          *   Only exception is when we are on max capacity, then we need to be
> @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
>          *     2. The system is being saturated when we're operating near
>          *        max capacity, it doesn't make sense to block overutilized.
>          */
> -       uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> -       uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> +       uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> +       uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
>         fits = fits || uclamp_max_fits;
>
>         /*
> @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
>          * handle the case uclamp_min > uclamp_max.
>          */
>         uclamp_min = min(uclamp_min, uclamp_max);
> -       if (fits && (util < uclamp_min) &&
> -           (uclamp_min > get_actual_cpu_capacity(cpu)))
> +       if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
>                 return -1;
>
>         return fits;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-24  8:20 ` [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity Xuewen Yan
@ 2024-06-25 13:04   ` Vincent Guittot
  2024-06-27  2:02     ` Xuewen Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2024-06-25 13:04 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: dietmar.eggemann, mingo, peterz, juri.lelli, qyousef, rostedt,
	bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> changed the PD's util from per-CPU to per-PD capping. But because
> the effective_cpu_util() would return a util which maybe bigger
> than the actual_cpu_capacity, this could cause the pd_busy_time
> calculation errors.

I'm still not convinced that this is an error. Your example used for v1 is :

The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
of cpufreq-limit, the cpu_actual_cap = 512.

Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
effective_cpu_util(4) = 1024;
effective_cpu_util(5) = 1024;
effective_cpu_util(6) = 256;
effective_cpu_util(7) = 0;

so env->pd_busy_time = 2304

Even if effective_cpu_util(4) = 1024; is above the current max compute
capacity of 512, this also means that activity of cpu4 will run twice
longer . If you cap effective_cpu_util(4) to 512 you miss the
information that it will run twice longer at the selected OPP. The
extreme case being:
effective_cpu_util(4) = 1024;
effective_cpu_util(5) = 1024;
effective_cpu_util(6) = 1024;
effective_cpu_util(7) = 1024;

in this case env->pd_busy_time = 4096

If we cap, we can't make any difference between the 2 cases

Do you have more details about the problem you are facing ?



> So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> the actual_cpu_capacity.
>
> Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Tested-by: Christian Loehle <christian.loehle@arm.com>
> ---
> V2:
> - change commit message.
> - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
> - add Tested-by.
> ---
>  kernel/sched/fair.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..5ca6396ef0b7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
>                                      struct cpumask *pd_cpus,
>                                      struct task_struct *p)
>  {
> -       unsigned long busy_time = 0;
>         int cpu;
>
> +       eenv->pd_busy_time = 0;
> +
>         for_each_cpu(cpu, pd_cpus) {
>                 unsigned long util = cpu_util(cpu, p, -1, 0);
>
> -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> +               util = effective_cpu_util(cpu, util, NULL, NULL);
> +               util = min(eenv->cpu_cap, util);
> +               eenv->pd_busy_time += util;
>         }
> -
> -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
>  }
>
>  /*
> --
> 2.25.1
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-25 13:04   ` Vincent Guittot
@ 2024-06-27  2:02     ` Xuewen Yan
  2024-06-27 16:15       ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: Xuewen Yan @ 2024-06-27  2:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, qyousef,
	rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, linux-kernel

On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
> >
> > Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > changed the PD's util from per-CPU to per-PD capping. But because
> > the effective_cpu_util() would return a util which maybe bigger
> > than the actual_cpu_capacity, this could cause the pd_busy_time
> > calculation errors.
>
> I'm still not convinced that this is an error. Your example used for v1 is :
>
> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> of cpufreq-limit, the cpu_actual_cap = 512.
>
> Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> effective_cpu_util(4) = 1024;
> effective_cpu_util(5) = 1024;
> effective_cpu_util(6) = 256;
> effective_cpu_util(7) = 0;
>
> so env->pd_busy_time = 2304
>
> Even if effective_cpu_util(4) = 1024; is above the current max compute
> capacity of 512, this also means that activity of cpu4 will run twice
> longer . If you cap effective_cpu_util(4) to 512 you miss the
> information that it will run twice longer at the selected OPP. The
> extreme case being:
> effective_cpu_util(4) = 1024;
> effective_cpu_util(5) = 1024;
> effective_cpu_util(6) = 1024;
> effective_cpu_util(7) = 1024;
>
> in this case env->pd_busy_time = 4096
>
> If we cap, we can't make any difference between the 2 cases
>
> Do you have more details about the problem you are facing ?

Because of the cpufreq-limit, the opp was also limited, and when compute_energy:

energy =  ps->cost * sum_util =  ps->cost * eenv->pd_busy_time;

Because of the cpufreq-limit, the ps->cost is the limited-freq's opp's
cost instead of the max freq's cost.
So the energy is determined by pd_busy_time.

Still the example above:

The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
of cpufreq-limit, the cpu_actual_cap = 512.

Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
effective_cpu_util(4) = 1024;
effective_cpu_util(5) = 1024;
effective_cpu_util(6) = 256;
effective_cpu_util(7) = 0;

Before the patch:
env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
However, because the effective_cpu_util(7) = 0, indeed, the 2048 is bigger than
the actual_cpu_cap.

After the patch:
cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
cpu_util(6) = min(256, eenv->cpu_cap) = 256;
cpu_util(7) = 0;
env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.

As a result, without this patch, the energy is bigger than actual_energy.

And even if cpu4 would run twice longer, the energy may not be equal.
Because:
 *             ps->power * cpu_max_freq
*   cpu_nrg = ------------------------ * cpu_util           (3)
*               ps->freq * scale_cpu

the ps->power = cfv2, and then:

*                  cv2 * cpu_max_freq
*   cpu_nrg = ------------------------ * cpu_util           (3)
*                    scale_cpu

because the limited-freq's voltage is not equal to the max-freq's voltage.

>
>
>
> > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > the actual_cpu_capacity.
> >
> > Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > Tested-by: Christian Loehle <christian.loehle@arm.com>
> > ---
> > V2:
> > - change commit message.
> > - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
> > - add Tested-by.
> > ---
> >  kernel/sched/fair.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..5ca6396ef0b7 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> >                                      struct cpumask *pd_cpus,
> >                                      struct task_struct *p)
> >  {
> > -       unsigned long busy_time = 0;
> >         int cpu;
> >
> > +       eenv->pd_busy_time = 0;
> > +
> >         for_each_cpu(cpu, pd_cpus) {
> >                 unsigned long util = cpu_util(cpu, p, -1, 0);
> >
> > -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> > +               util = effective_cpu_util(cpu, util, NULL, NULL);
> > +               util = min(eenv->cpu_cap, util);
> > +               eenv->pd_busy_time += util;
> >         }
> > -
> > -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
> >  }
> >
> >  /*
> > --
> > 2.25.1
> >
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-27  2:02     ` Xuewen Yan
@ 2024-06-27 16:15       ` Vincent Guittot
  2024-06-28  1:39         ` Qais Yousef
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Vincent Guittot @ 2024-06-27 16:15 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, qyousef,
	rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, linux-kernel

On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
> > >
> > > Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > changed the PD's util from per-CPU to per-PD capping. But because
> > > the effective_cpu_util() would return a util which maybe bigger
> > > than the actual_cpu_capacity, this could cause the pd_busy_time
> > > calculation errors.
> >
> > I'm still not convinced that this is an error. Your example used for v1 is :
> >
> > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > of cpufreq-limit, the cpu_actual_cap = 512.
> >
> > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > effective_cpu_util(4) = 1024;
> > effective_cpu_util(5) = 1024;
> > effective_cpu_util(6) = 256;
> > effective_cpu_util(7) = 0;
> >
> > so env->pd_busy_time = 2304
> >
> > Even if effective_cpu_util(4) = 1024; is above the current max compute
> > capacity of 512, this also means that activity of cpu4 will run twice
> > longer . If you cap effective_cpu_util(4) to 512 you miss the
> > information that it will run twice longer at the selected OPP. The
> > extreme case being:
> > effective_cpu_util(4) = 1024;
> > effective_cpu_util(5) = 1024;
> > effective_cpu_util(6) = 1024;
> > effective_cpu_util(7) = 1024;
> >
> > in this case env->pd_busy_time = 4096
> >
> > If we cap, we can't make any difference between the 2 cases
> >
> > Do you have more details about the problem you are facing ?
>
> Because of the cpufreq-limit, the opp was also limited, and when compute_energy:
>
> energy =  ps->cost * sum_util =  ps->cost * eenv->pd_busy_time;
>
> Because of the cpufreq-limit, the ps->cost is the limited-freq's opp's
> cost instead of the max freq's cost.
> So the energy is determined by pd_busy_time.
>
> Still the example above:
>
> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> of cpufreq-limit, the cpu_actual_cap = 512.
>
> Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> effective_cpu_util(4) = 1024;
> effective_cpu_util(5) = 1024;
> effective_cpu_util(6) = 256;
> effective_cpu_util(7) = 0;
>
> Before the patch:
> env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
> However, because the effective_cpu_util(7) = 0, indeed, the 2048 is bigger than
> the actual_cpu_cap.
>
> After the patch:
> cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
> cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
> cpu_util(6) = min(256, eenv->cpu_cap) = 256;
> cpu_util(7) = 0;
> env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.
>
> As a result, without this patch, the energy is bigger than actual_energy.
>
> And even if cpu4 would run twice longer, the energy may not be equal.
> Because:
>  *             ps->power * cpu_max_freq
> *   cpu_nrg = ------------------------ * cpu_util           (3)
> *               ps->freq * scale_cpu
>
> the ps->power = cfv2, and then:
>
> *                  cv2 * cpu_max_freq
> *   cpu_nrg = ------------------------ * cpu_util           (3)
> *                    scale_cpu
>
> because the limited-freq's voltage is not equal to the max-freq's voltage.

I'm still struggling to understand why it's wrong. If the frequency is
capped, we will never go above this limited frequency and its
associated voltage so there is no reason to consider max-freq's
voltage. If there is more things to do than the actual capacity can do
per unit of time then we will run more than 1 unit of time.

nrg of PD = /Sum(cpu) ps->power * cpu-running-time

ps->power is fixed because of the limited frequency constraint

we estimate cpu-running-time = utilization / ps->performance
with
- utilization = util_avg
- performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
ps->performance

Up to now we were assuming that utilization was always lower than
performance otherwise the system was overutilized andwe fallback in
performance mode. But when the frequency of a cpu is limited by
userspace or thermal mitigation, the utilization can become higher
than the limited capacity which can be translated by cpu will run
longer.

>
> >
> >
> >
> > > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > > the actual_cpu_capacity.
> > >
> > > Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > > Tested-by: Christian Loehle <christian.loehle@arm.com>
> > > ---
> > > V2:
> > > - change commit message.
> > > - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
> > > - add Tested-by.
> > > ---
> > >  kernel/sched/fair.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 8a5b1ae0aa55..5ca6396ef0b7 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> > >                                      struct cpumask *pd_cpus,
> > >                                      struct task_struct *p)
> > >  {
> > > -       unsigned long busy_time = 0;
> > >         int cpu;
> > >
> > > +       eenv->pd_busy_time = 0;
> > > +
> > >         for_each_cpu(cpu, pd_cpus) {
> > >                 unsigned long util = cpu_util(cpu, p, -1, 0);
> > >
> > > -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> > > +               util = effective_cpu_util(cpu, util, NULL, NULL);
> > > +               util = min(eenv->cpu_cap, util);
> > > +               eenv->pd_busy_time += util;
> > >         }
> > > -
> > > -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
> > >  }
> > >
> > >  /*
> > > --
> > > 2.25.1
> > >
> > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-06-24  8:20 ` [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu() Xuewen Yan
  2024-06-25  8:46   ` Vincent Guittot
@ 2024-06-28  1:28   ` Qais Yousef
  2024-07-01 12:13     ` Xuewen Yan
  2024-07-02 13:25     ` Vincent Guittot
  1 sibling, 2 replies; 20+ messages in thread
From: Qais Yousef @ 2024-06-28  1:28 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: vincent.guittot, dietmar.eggemann, mingo, peterz, juri.lelli,
	rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On 06/24/24 16:20, Xuewen Yan wrote:
> Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> introduced get_actual_cpu_capacity(), and it had aggregated the
> different pressures applied on the capacity of CPUs.
> And in util_fits_cpu(), it would return true when uclamp_max
> is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
> is bigger than actual_cpu_capacity.
> 
> So use actual_cpu_capacity everywhere in util_fits_cpu() to
> cover all cases.
> 
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>  kernel/sched/fair.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5ca6396ef0b7..9c16ae192217 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
>  				int cpu)
>  {
>  	unsigned long capacity = capacity_of(cpu);
> -	unsigned long capacity_orig;
> +	unsigned long capacity_actual;
>  	bool fits, uclamp_max_fits;
>  
>  	/*
> @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
>  		return fits;
>  
>  	/*
> -	 * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> +	 * We must use actual_cpu_capacity() for comparing against uclamp_min and
>  	 * uclamp_max. We only care about capacity pressure (by using
>  	 * capacity_of()) for comparing against the real util.
>  	 *
>  	 * If a task is boosted to 1024 for example, we don't want a tiny
>  	 * pressure to skew the check whether it fits a CPU or not.
>  	 *
> -	 * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> -	 * should fit a little cpu even if there's some pressure.
> +	 * Similarly if a task is capped to actual_cpu_capacity, it should fit
> +	 * the cpu even if there's some pressure.

This statement is not clear now. We need to be specific since
actual_cpu_capacity() includes thermal pressure and cpufreq limits.

/even if there's some pressure/even if there is non OPP based pressure ie: RT,
DL or irq/?

>  	 *
>  	 * Only exception is for HW or cpufreq pressure since it has a direct impact
>  	 * on available OPP of the system.
> @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
>  	 * For uclamp_max, we can tolerate a drop in performance level as the
>  	 * goal is to cap the task. So it's okay if it's getting less.
>  	 */
> -	capacity_orig = arch_scale_cpu_capacity(cpu);
> +	capacity_actual = get_actual_cpu_capacity(cpu);
>  
>  	/*
>  	 * We want to force a task to fit a cpu as implied by uclamp_max.
> @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
>  	 *     uclamp_max request.
>  	 *
>  	 *   which is what we're enforcing here. A task always fits if
> -	 *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> +	 *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
>  	 *   the normal upmigration rules should withhold still.
>  	 *
>  	 *   Only exception is when we are on max capacity, then we need to be
> @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
>  	 *     2. The system is being saturated when we're operating near
>  	 *        max capacity, it doesn't make sense to block overutilized.
>  	 */
> -	uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> -	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> +	uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);

We should use capacity_orig here. We are checking if the CPU is the max
capacity CPU.

> +	uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
>  	fits = fits || uclamp_max_fits;
>  
>  	/*
> @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
>  	 * handle the case uclamp_min > uclamp_max.
>  	 */
>  	uclamp_min = min(uclamp_min, uclamp_max);
> -	if (fits && (util < uclamp_min) &&
> -	    (uclamp_min > get_actual_cpu_capacity(cpu)))
> +	if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
>  		return -1;
>  
>  	return fits;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-27 16:15       ` Vincent Guittot
@ 2024-06-28  1:39         ` Qais Yousef
  2024-07-01 12:00         ` Xuewen Yan
  2024-07-04 17:01         ` Pierre Gondois
  2 siblings, 0 replies; 20+ messages in thread
From: Qais Yousef @ 2024-06-28  1:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Xuewen Yan, Xuewen Yan, dietmar.eggemann, mingo, peterz,
	juri.lelli, rostedt, bsegall, mgorman, bristot, vschneid,
	christian.loehle, vincent.donnefort, ke.wang, di.shen,
	linux-kernel

On 06/27/24 18:15, Vincent Guittot wrote:
> On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
> > > >
> > > > Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > > changed the PD's util from per-CPU to per-PD capping. But because
> > > > the effective_cpu_util() would return a util which maybe bigger
> > > > than the actual_cpu_capacity, this could cause the pd_busy_time
> > > > calculation errors.
> > >
> > > I'm still not convinced that this is an error. Your example used for v1 is :
> > >
> > > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > > of cpufreq-limit, the cpu_actual_cap = 512.
> > >
> > > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 256;
> > > effective_cpu_util(7) = 0;
> > >
> > > so env->pd_busy_time = 2304
> > >
> > > Even if effective_cpu_util(4) = 1024; is above the current max compute
> > > capacity of 512, this also means that activity of cpu4 will run twice
> > > longer . If you cap effective_cpu_util(4) to 512 you miss the
> > > information that it will run twice longer at the selected OPP. The
> > > extreme case being:
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 1024;
> > > effective_cpu_util(7) = 1024;
> > >
> > > in this case env->pd_busy_time = 4096
> > >
> > > If we cap, we can't make any difference between the 2 cases
> > >
> > > Do you have more details about the problem you are facing ?
> >
> > Because of the cpufreq-limit, the opp was also limited, and when compute_energy:
> >
> > energy =  ps->cost * sum_util =  ps->cost * eenv->pd_busy_time;
> >
> > Because of the cpufreq-limit, the ps->cost is the limited-freq's opp's
> > cost instead of the max freq's cost.
> > So the energy is determined by pd_busy_time.
> >
> > Still the example above:
> >
> > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > of cpufreq-limit, the cpu_actual_cap = 512.
> >
> > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > effective_cpu_util(4) = 1024;
> > effective_cpu_util(5) = 1024;
> > effective_cpu_util(6) = 256;
> > effective_cpu_util(7) = 0;
> >
> > Before the patch:
> > env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
> > However, because the effective_cpu_util(7) = 0, indeed, the 2048 is bigger than
> > the actual_cpu_cap.
> >
> > After the patch:
> > cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(6) = min(256, eenv->cpu_cap) = 256;
> > cpu_util(7) = 0;
> > env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.
> >
> > As a result, without this patch, the energy is bigger than actual_energy.
> >
> > And even if cpu4 would run twice longer, the energy may not be equal.
> > Because:
> >  *             ps->power * cpu_max_freq
> > *   cpu_nrg = ------------------------ * cpu_util           (3)
> > *               ps->freq * scale_cpu
> >
> > the ps->power = cfv2, and then:
> >
> > *                  cv2 * cpu_max_freq
> > *   cpu_nrg = ------------------------ * cpu_util           (3)
> > *                    scale_cpu
> >
> > because the limited-freq's voltage is not equal to the max-freq's voltage.
> 
> I'm still struggling to understand why it's wrong. If the frequency is
> capped, we will never go above this limited frequency and its
> associated voltage so there is no reason to consider max-freq's
> voltage. If there is more things to do than the actual capacity can do
> per unit of time then we will run more than 1 unit of time.
> 
> nrg of PD = /Sum(cpu) ps->power * cpu-running-time
> 
> ps->power is fixed because of the limited frequency constraint
> 
> we estimate cpu-running-time = utilization / ps->performance
> with
> - utilization = util_avg
> - performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
> ps->performance
> 
> Up to now we were assuming that utilization was always lower than
> performance otherwise the system was overutilized andwe fallback in
> performance mode. But when the frequency of a cpu is limited by
> userspace or thermal mitigation, the utilization can become higher
> than the limited capacity which can be translated by cpu will run
> longer.

I might have contributed a bit to this confusion. So my apologies.

We certainly want to remove this limit and I thought to be consistent with
current code behavior it might be good to have this patch. But as Vincent
pointed out it actually moves us backward.

So I think this is no longer needed too. We want to actually make the pd
capping removed too - but I haven't looked at the detail if this can be done
without side effects. We need to be more accurate in estimating the runtime and
capping in general hides information about how long the cpu/pd will be busy
for.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-27 16:15       ` Vincent Guittot
  2024-06-28  1:39         ` Qais Yousef
@ 2024-07-01 12:00         ` Xuewen Yan
  2024-07-04 17:01         ` Pierre Gondois
  2 siblings, 0 replies; 20+ messages in thread
From: Xuewen Yan @ 2024-07-01 12:00 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, qyousef,
	rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, linux-kernel

On Fri, Jun 28, 2024 at 12:15 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
> > > >
> > > > Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > > changed the PD's util from per-CPU to per-PD capping. But because
> > > > the effective_cpu_util() would return a util which maybe bigger
> > > > than the actual_cpu_capacity, this could cause the pd_busy_time
> > > > calculation errors.
> > >
> > > I'm still not convinced that this is an error. Your example used for v1 is :
> > >
> > > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > > of cpufreq-limit, the cpu_actual_cap = 512.
> > >
> > > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 256;
> > > effective_cpu_util(7) = 0;
> > >
> > > so env->pd_busy_time = 2304
> > >
> > > Even if effective_cpu_util(4) = 1024; is above the current max compute
> > > capacity of 512, this also means that activity of cpu4 will run twice
> > > longer . If you cap effective_cpu_util(4) to 512 you miss the
> > > information that it will run twice longer at the selected OPP. The
> > > extreme case being:
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 1024;
> > > effective_cpu_util(7) = 1024;
> > >
> > > in this case env->pd_busy_time = 4096
> > >
> > > If we cap, we can't make any difference between the 2 cases
> > >
> > > Do you have more details about the problem you are facing ?
> >
> > Because of the cpufreq-limit, the opp was also limited, and when compute_energy:
> >
> > energy =  ps->cost * sum_util =  ps->cost * eenv->pd_busy_time;
> >
> > Because of the cpufreq-limit, the ps->cost is the limited-freq's opp's
> > cost instead of the max freq's cost.
> > So the energy is determined by pd_busy_time.
> >
> > Still the example above:
> >
> > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > of cpufreq-limit, the cpu_actual_cap = 512.
> >
> > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > effective_cpu_util(4) = 1024;
> > effective_cpu_util(5) = 1024;
> > effective_cpu_util(6) = 256;
> > effective_cpu_util(7) = 0;
> >
> > Before the patch:
> > env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
> > However, because the effective_cpu_util(7) = 0, indeed, the 2048 is bigger than
> > the actual_cpu_cap.
> >
> > After the patch:
> > cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(6) = min(256, eenv->cpu_cap) = 256;
> > cpu_util(7) = 0;
> > env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.
> >
> > As a result, without this patch, the energy is bigger than actual_energy.
> >
> > And even if cpu4 would run twice longer, the energy may not be equal.
> > Because:
> >  *             ps->power * cpu_max_freq
> > *   cpu_nrg = ------------------------ * cpu_util           (3)
> > *               ps->freq * scale_cpu
> >
> > the ps->power = cfv2, and then:
> >
> > *                  cv2 * cpu_max_freq
> > *   cpu_nrg = ------------------------ * cpu_util           (3)
> > *                    scale_cpu
> >
> > because the limited-freq's voltage is not equal to the max-freq's voltage.
>
> I'm still struggling to understand why it's wrong. If the frequency is
> capped, we will never go above this limited frequency and its
> associated voltage so there is no reason to consider max-freq's
> voltage. If there is more things to do than the actual capacity can do
> per unit of time then we will run more than 1 unit of time.
>
> nrg of PD = /Sum(cpu) ps->power * cpu-running-time
>
> ps->power is fixed because of the limited frequency constraint
>
> we estimate cpu-running-time = utilization / ps->performance
> with
> - utilization = util_avg
> - performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
> ps->performance
>
> Up to now we were assuming that utilization was always lower than
> performance otherwise the system was overutilized andwe fallback in
> performance mode.

Well, with patch2/2,  this patch is no longer needed.
But if we want to remove the restriction of feec() on rd->overutilized
later, this patch should be reconsidered.

> But when the frequency of a cpu is limited by
> userspace or thermal mitigation, the utilization can become higher
> than the limited capacity which can be translated by cpu will run
> longer.
>
> >
> > >
> > >
> > >
> > > > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > > > the actual_cpu_capacity.
> > > >
> > > > Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > > > Tested-by: Christian Loehle <christian.loehle@arm.com>
> > > > ---
> > > > V2:
> > > > - change commit message.
> > > > - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
> > > > - add Tested-by.
> > > > ---
> > > >  kernel/sched/fair.c | 9 +++++----
> > > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 8a5b1ae0aa55..5ca6396ef0b7 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> > > >                                      struct cpumask *pd_cpus,
> > > >                                      struct task_struct *p)
> > > >  {
> > > > -       unsigned long busy_time = 0;
> > > >         int cpu;
> > > >
> > > > +       eenv->pd_busy_time = 0;
> > > > +
> > > >         for_each_cpu(cpu, pd_cpus) {
> > > >                 unsigned long util = cpu_util(cpu, p, -1, 0);
> > > >
> > > > -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> > > > +               util = effective_cpu_util(cpu, util, NULL, NULL);
> > > > +               util = min(eenv->cpu_cap, util);
> > > > +               eenv->pd_busy_time += util;
> > > >         }
> > > > -
> > > > -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
> > > >  }
> > > >
> > > >  /*
> > > > --
> > > > 2.25.1
> > > >
> > > >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-06-28  1:28   ` Qais Yousef
@ 2024-07-01 12:13     ` Xuewen Yan
  2024-07-03 11:46       ` Qais Yousef
  2024-07-02 13:25     ` Vincent Guittot
  1 sibling, 1 reply; 20+ messages in thread
From: Xuewen Yan @ 2024-07-01 12:13 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Xuewen Yan, vincent.guittot, dietmar.eggemann, mingo, peterz,
	juri.lelli, rostedt, bsegall, mgorman, bristot, vschneid,
	christian.loehle, vincent.donnefort, ke.wang, di.shen,
	linux-kernel

On Fri, Jun 28, 2024 at 9:28 AM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 06/24/24 16:20, Xuewen Yan wrote:
> > Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> > introduced get_actual_cpu_capacity(), and it had aggregated the
> > different pressures applied on the capacity of CPUs.
> > And in util_fits_cpu(), it would return true when uclamp_max
> > is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
> > is bigger than actual_cpu_capacity.
> >
> > So use actual_cpu_capacity everywhere in util_fits_cpu() to
> > cover all cases.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > ---
> >  kernel/sched/fair.c | 19 +++++++++----------
> >  1 file changed, 9 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5ca6396ef0b7..9c16ae192217 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
> >                               int cpu)
> >  {
> >       unsigned long capacity = capacity_of(cpu);
> > -     unsigned long capacity_orig;
> > +     unsigned long capacity_actual;
> >       bool fits, uclamp_max_fits;
> >
> >       /*
> > @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
> >               return fits;
> >
> >       /*
> > -      * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> > +      * We must use actual_cpu_capacity() for comparing against uclamp_min and
> >        * uclamp_max. We only care about capacity pressure (by using
> >        * capacity_of()) for comparing against the real util.
> >        *
> >        * If a task is boosted to 1024 for example, we don't want a tiny
> >        * pressure to skew the check whether it fits a CPU or not.
> >        *
> > -      * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> > -      * should fit a little cpu even if there's some pressure.
> > +      * Similarly if a task is capped to actual_cpu_capacity, it should fit
> > +      * the cpu even if there's some pressure.
>
> This statement is not clear now. We need to be specific since
> actual_cpu_capacity() includes thermal pressure and cpufreq limits.
>
> /even if there's some pressure/even if there is non OPP based pressure ie: RT,
> DL or irq/?
>
> >        *
> >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> >        * on available OPP of the system.
> > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * For uclamp_max, we can tolerate a drop in performance level as the
> >        * goal is to cap the task. So it's okay if it's getting less.
> >        */
> > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > +     capacity_actual = get_actual_cpu_capacity(cpu);
> >
> >       /*
> >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     uclamp_max request.
> >        *
> >        *   which is what we're enforcing here. A task always fits if
> > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> >        *   the normal upmigration rules should withhold still.
> >        *
> >        *   Only exception is when we are on max capacity, then we need to be
> > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     2. The system is being saturated when we're operating near
> >        *        max capacity, it doesn't make sense to block overutilized.
> >        */
> > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>
> We should use capacity_orig here. We are checking if the CPU is the max
> capacity CPU.

Maybe we could remove the uclamp_max_fits = (capacity_orig ==
SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
and just judge the uclamp_max <= capacity_actual?

-     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) &&
(uclamp_max == SCHED_CAPACITY_SCALE);
-     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
+     uclamp_max_fits =  (uclamp_max <= capacity_actual);


>
> > +     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
> >       fits = fits || uclamp_max_fits;
> >
> >       /*
> > @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * handle the case uclamp_min > uclamp_max.
> >        */
> >       uclamp_min = min(uclamp_min, uclamp_max);
> > -     if (fits && (util < uclamp_min) &&
> > -         (uclamp_min > get_actual_cpu_capacity(cpu)))
> > +     if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
> >               return -1;
> >
> >       return fits;
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-06-28  1:28   ` Qais Yousef
  2024-07-01 12:13     ` Xuewen Yan
@ 2024-07-02 13:25     ` Vincent Guittot
  2024-07-03 11:54       ` Qais Yousef
  1 sibling, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2024-07-02 13:25 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, rostedt,
	bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On Fri, 28 Jun 2024 at 03:28, Qais Yousef <qyousef@layalina.io> wrote:
>
> On 06/24/24 16:20, Xuewen Yan wrote:
> > Commit f1f8d0a22422 ("sched/cpufreq: Take cpufreq feedback into account")
> > introduced get_actual_cpu_capacity(), and it had aggregated the
> > different pressures applied on the capacity of CPUs.
> > And in util_fits_cpu(), it would return true when uclamp_max
> > is smaller than SCHED_CAPACITY_SCALE, althought the uclamp_max
> > is bigger than actual_cpu_capacity.
> >
> > So use actual_cpu_capacity everywhere in util_fits_cpu() to
> > cover all cases.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > ---
> >  kernel/sched/fair.c | 19 +++++++++----------
> >  1 file changed, 9 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5ca6396ef0b7..9c16ae192217 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4980,7 +4980,7 @@ static inline int util_fits_cpu(unsigned long util,
> >                               int cpu)
> >  {
> >       unsigned long capacity = capacity_of(cpu);
> > -     unsigned long capacity_orig;
> > +     unsigned long capacity_actual;
> >       bool fits, uclamp_max_fits;
> >
> >       /*
> > @@ -4992,15 +4992,15 @@ static inline int util_fits_cpu(unsigned long util,
> >               return fits;
> >
> >       /*
> > -      * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
> > +      * We must use actual_cpu_capacity() for comparing against uclamp_min and
> >        * uclamp_max. We only care about capacity pressure (by using
> >        * capacity_of()) for comparing against the real util.
> >        *
> >        * If a task is boosted to 1024 for example, we don't want a tiny
> >        * pressure to skew the check whether it fits a CPU or not.
> >        *
> > -      * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
> > -      * should fit a little cpu even if there's some pressure.
> > +      * Similarly if a task is capped to actual_cpu_capacity, it should fit
> > +      * the cpu even if there's some pressure.
>
> This statement is not clear now. We need to be specific since
> actual_cpu_capacity() includes thermal pressure and cpufreq limits.
>
> /even if there's some pressure/even if there is non OPP based pressure ie: RT,
> DL or irq/?
>
> >        *
> >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> >        * on available OPP of the system.
> > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * For uclamp_max, we can tolerate a drop in performance level as the
> >        * goal is to cap the task. So it's okay if it's getting less.
> >        */
> > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > +     capacity_actual = get_actual_cpu_capacity(cpu);
> >
> >       /*
> >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     uclamp_max request.
> >        *
> >        *   which is what we're enforcing here. A task always fits if
> > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> >        *   the normal upmigration rules should withhold still.
> >        *
> >        *   Only exception is when we are on max capacity, then we need to be
> > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     2. The system is being saturated when we're operating near
> >        *        max capacity, it doesn't make sense to block overutilized.
> >        */
> > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>
> We should use capacity_orig here. We are checking if the CPU is the max
> capacity CPU.

I was also wondering what would be the best choice there. If we
consider that we have only one performance domain with all max
capacity cpus then I agree that we should keep capacity_orig as we
can't find a better cpu that would fit. But is it always true that all
max cpu are tied to the same perf domain ?

>
> > +     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_actual);
> >       fits = fits || uclamp_max_fits;
> >
> >       /*
> > @@ -5086,8 +5086,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        * handle the case uclamp_min > uclamp_max.
> >        */
> >       uclamp_min = min(uclamp_min, uclamp_max);
> > -     if (fits && (util < uclamp_min) &&
> > -         (uclamp_min > get_actual_cpu_capacity(cpu)))
> > +     if (fits && (util < uclamp_min) && (uclamp_min > capacity_actual))
> >               return -1;
> >
> >       return fits;
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-07-01 12:13     ` Xuewen Yan
@ 2024-07-03 11:46       ` Qais Yousef
  0 siblings, 0 replies; 20+ messages in thread
From: Qais Yousef @ 2024-07-03 11:46 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Xuewen Yan, vincent.guittot, dietmar.eggemann, mingo, peterz,
	juri.lelli, rostedt, bsegall, mgorman, bristot, vschneid,
	christian.loehle, vincent.donnefort, ke.wang, di.shen,
	linux-kernel

On 07/01/24 20:13, Xuewen Yan wrote:

> > >        *
> > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > >        * on available OPP of the system.
> > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > >        * goal is to cap the task. So it's okay if it's getting less.
> > >        */
> > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > >
> > >       /*
> > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     uclamp_max request.
> > >        *
> > >        *   which is what we're enforcing here. A task always fits if
> > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > >        *   the normal upmigration rules should withhold still.
> > >        *
> > >        *   Only exception is when we are on max capacity, then we need to be
> > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     2. The system is being saturated when we're operating near
> > >        *        max capacity, it doesn't make sense to block overutilized.
> > >        */
> > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> >
> > We should use capacity_orig here. We are checking if the CPU is the max
> > capacity CPU.
> 
> Maybe we could remove the uclamp_max_fits = (capacity_orig ==
> SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> and just judge the uclamp_max <= capacity_actual?
> 
> -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) &&
> (uclamp_max == SCHED_CAPACITY_SCALE);
> -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> +     uclamp_max_fits =  (uclamp_max <= capacity_actual);

If capacity_orig = 1024, capacity_actual = 1024, uclamp_max = 1024 (which is
the common case), then overutilized will never trigger for big CPU, no?

We can't 'force' fit a task on the biggest core, and fits_capacity() should be
our sole judge here if we fit or not.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-07-02 13:25     ` Vincent Guittot
@ 2024-07-03 11:54       ` Qais Yousef
  2024-07-03 14:54         ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: Qais Yousef @ 2024-07-03 11:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, rostedt,
	bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On 07/02/24 15:25, Vincent Guittot wrote:

> > >        *
> > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > >        * on available OPP of the system.
> > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > >        * goal is to cap the task. So it's okay if it's getting less.
> > >        */
> > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > >
> > >       /*
> > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     uclamp_max request.
> > >        *
> > >        *   which is what we're enforcing here. A task always fits if
> > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > >        *   the normal upmigration rules should withhold still.
> > >        *
> > >        *   Only exception is when we are on max capacity, then we need to be
> > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > >        *     2. The system is being saturated when we're operating near
> > >        *        max capacity, it doesn't make sense to block overutilized.
> > >        */
> > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> >
> > We should use capacity_orig here. We are checking if the CPU is the max
> > capacity CPU.
> 
> I was also wondering what would be the best choice there. If we
> consider that we have only one performance domain with all max
> capacity cpus then I agree that we should keep capacity_orig as we
> can't find a better cpu that would fit. But is it always true that all
> max cpu are tied to the same perf domain ?

Hmm I could be not thinking straight today. But the purpose of this check is to
ensure overutilized can trigger for the common case where a task will always
fit the max capacity cpu (whether it's on a single pd or multiple ones). For
that corner case fits_capacity() should be the only fitness check otherwise
overutilized will never trigger by default.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-07-03 11:54       ` Qais Yousef
@ 2024-07-03 14:54         ` Vincent Guittot
  2024-07-04 23:56           ` Qais Yousef
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2024-07-03 14:54 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, rostedt,
	bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
>
> On 07/02/24 15:25, Vincent Guittot wrote:
>
> > > >        *
> > > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > > >        * on available OPP of the system.
> > > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > > >        * goal is to cap the task. So it's okay if it's getting less.
> > > >        */
> > > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > > >
> > > >       /*
> > > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > >        *     uclamp_max request.
> > > >        *
> > > >        *   which is what we're enforcing here. A task always fits if
> > > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > > >        *   the normal upmigration rules should withhold still.
> > > >        *
> > > >        *   Only exception is when we are on max capacity, then we need to be
> > > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > >        *     2. The system is being saturated when we're operating near
> > > >        *        max capacity, it doesn't make sense to block overutilized.
> > > >        */
> > > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > >
> > > We should use capacity_orig here. We are checking if the CPU is the max
> > > capacity CPU.
> >
> > I was also wondering what would be the best choice there. If we
> > consider that we have only one performance domain with all max
> > capacity cpus then I agree that we should keep capacity_orig as we
> > can't find a better cpu that would fit. But is it always true that all
> > max cpu are tied to the same perf domain ?
>
> Hmm I could be not thinking straight today. But the purpose of this check is to
> ensure overutilized can trigger for the common case where a task will always
> fit the max capacity cpu (whether it's on a single pd or multiple ones). For
> that corner case fits_capacity() should be the only fitness check otherwise
> overutilized will never trigger by default.

Ok, so I messed up several use cases but in fact both are similar ...

if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
SCHED_CAPACITY_SCALE
then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
(uclamp_max == SCHED_CAPACITY_SCALE) is false
and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
capacity_actual); is also false because (uclamp_max <=
capacity_actual) is always false

if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
SCHED_CAPACITY_SCALE
then we are the same as with capacity_orig

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-06-27 16:15       ` Vincent Guittot
  2024-06-28  1:39         ` Qais Yousef
  2024-07-01 12:00         ` Xuewen Yan
@ 2024-07-04 17:01         ` Pierre Gondois
  2024-07-05  0:10           ` Qais Yousef
  2 siblings, 1 reply; 20+ messages in thread
From: Pierre Gondois @ 2024-07-04 17:01 UTC (permalink / raw)
  To: Vincent Guittot, Xuewen Yan
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, qyousef,
	rostedt, bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, linux-kernel

Hello,

On 6/27/24 18:15, Vincent Guittot wrote:
> On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>>
>> On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
>> <vincent.guittot@linaro.org> wrote:
>>>
>>> On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>>>>
>>>> Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
>>>> changed the PD's util from per-CPU to per-PD capping. But because
>>>> the effective_cpu_util() would return a util which maybe bigger
>>>> than the actual_cpu_capacity, this could cause the pd_busy_time
>>>> calculation errors.
>>>
>>> I'm still not convinced that this is an error. Your example used for v1 is :
>>>
>>> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
>>> of cpufreq-limit, the cpu_actual_cap = 512.
>>>
>>> Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
>>> effective_cpu_util(4) = 1024;
>>> effective_cpu_util(5) = 1024;
>>> effective_cpu_util(6) = 256;
>>> effective_cpu_util(7) = 0;
>>>
>>> so env->pd_busy_time = 2304

IIUC, with this configuration:
Before patch:
- env->pd_busy_time = 2304
After patch:
- env->pd_busy_time = 512 + 512 + 256 + 0 = 1280

But when computing the energy for the task to place:
compute_energy()
{
   if (dst_cpu >= 0)
     busy_time = min(eenv->pd_cap, busy_time + eenv->task_busy_time);
}

the contribution of the task might be truncated.
E.g.:
Trying to place a task on CPU7 with task_busy_time=300.
Then in compute_energy():

Before patch:
busy_time = min(2048, 2304 + 300)
busy_time = 2048
-> busy_time delta = 2048 - 2304 = 256

After patch:
busy_time = min(2048, 1280 + 300)
busy_time = 1580
-> busy_time delta = 1580 - 1280 = 300

>>>
>>> Even if effective_cpu_util(4) = 1024; is above the current max compute
>>> capacity of 512, this also means that activity of cpu4 will run twice
>>> longer . If you cap effective_cpu_util(4) to 512 you miss the
>>> information that it will run twice longer at the selected OPP. The
>>> extreme case being:
>>> effective_cpu_util(4) = 1024;
>>> effective_cpu_util(5) = 1024;
>>> effective_cpu_util(6) = 1024;
>>> effective_cpu_util(7) = 1024;
>>>
>>> in this case env->pd_busy_time = 4096
>>>
>>> If we cap, we can't make any difference between the 2 cases
>>>
>>> Do you have more details about the problem you are facing ?
>>
>> Because of the cpufreq-limit, the opp was also limited, and when compute_energy:
>>
>> energy =  ps->cost * sum_util =  ps->cost * eenv->pd_busy_time;
>>
>> Because of the cpufreq-limit, the ps->cost is the limited-freq's opp's
>> cost instead of the max freq's cost.
>> So the energy is determined by pd_busy_time.
>>
>> Still the example above:
>>
>> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
>> of cpufreq-limit, the cpu_actual_cap = 512.
>>
>> Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
>> effective_cpu_util(4) = 1024;
>> effective_cpu_util(5) = 1024;
>> effective_cpu_util(6) = 256;
>> effective_cpu_util(7) = 0;
>>
>> Before the patch:
>> env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
>> However, because the effective_cpu_util(7) = 0, indeed, the 2048 is bigger than
>> the actual_cpu_cap.
>>
>> After the patch:
>> cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
>> cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
>> cpu_util(6) = min(256, eenv->cpu_cap) = 256;
>> cpu_util(7) = 0;
>> env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.

If we take a similar example, on a pd with 2 CPUs and:
- effective_cpu_util(0) = 1024
- effective_cpu_util(1) = 0
and:
- eenv->cpu_cap = 100
- eenv->pd_cap = 100 + 100 = 200

Before the patch:
- env->pd_busy_time = min(1024 + 0, eenv->pd_cap) = 200

After the patch:
- cpu_util(0) = min(1024, eenv->cpu_cap) = 100
- cpu_util(1) = min(0, eenv->cpu_cap) = 0
and:
- env->pd_busy_time = min(100 + 0, eenv->pd_cap) = 100

-------------

Same example while adding additional empty CPUs:
- effective_cpu_util(0) = 1024
- effective_cpu_util(1) = 0
- effective_cpu_util(2) = 0
- effective_cpu_util(3) = 0
and:
- eenv->cpu_cap = 100
- eenv->pd_cap = 100 * 4 = 400

Before the patch:
- env->pd_busy_time = min(1024 + 0 + 0 + 0, eenv->pd_cap) = 400

After the patch:
- cpu_util(0) = min(1024, eenv->cpu_cap) = 100
- cpu_util(1) = min(0, eenv->cpu_cap) = 0
- ...
and:
- env->pd_busy_time = min(100 + 0 + 0 + 0, eenv->pd_cap) = 100

-------------

What seems strange is that the more we add empty CPUs in a pd,
the more energy is spent in the first computation, which seems
indeed incorrect (unless I missed something).

>>
>> As a result, without this patch, the energy is bigger than actual_energy.
>>
>> And even if cpu4 would run twice longer, the energy may not be equal.
>> Because:
>>   *             ps->power * cpu_max_freq
>> *   cpu_nrg = ------------------------ * cpu_util           (3)
>> *               ps->freq * scale_cpu
>>
>> the ps->power = cfv2, and then:
>>
>> *                  cv2 * cpu_max_freq
>> *   cpu_nrg = ------------------------ * cpu_util           (3)
>> *                    scale_cpu
>>
>> because the limited-freq's voltage is not equal to the max-freq's voltage.
> 
> I'm still struggling to understand why it's wrong. If the frequency is
> capped, we will never go above this limited frequency and its
> associated voltage so there is no reason to consider max-freq's
> voltage. If there is more things to do than the actual capacity can do
> per unit of time then we will run more than 1 unit of time.
> 
> nrg of PD = /Sum(cpu) ps->power * cpu-running-time
> 
> ps->power is fixed because of the limited frequency constraint
> 
> we estimate cpu-running-time = utilization / ps->performance
> with
> - utilization = util_avg
> - performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
> ps->performance
> 
> Up to now we were assuming that utilization was always lower than
> performance otherwise the system was overutilized andwe fallback in
> performance mode. But when the frequency of a cpu is limited by
> userspace or thermal mitigation, the utilization can become higher
> than the limited capacity which can be translated by cpu will run
> longer.

I thought the EAS was comparing instantaneous power and not energy,
i.e. how the energy computation is done:

   *             ps->power * cpu_max_freq
   *   cpu_nrg = ------------------------ * cpu_util           (3)
   *               ps->freq * scale_cpu

cpu_nrg should have the same dimension as ps->power (i.e. energy/s).
  From this PoV, the energy computation should not take into account how
much time a task is expected to run. But it might be a side discussion,

Regards,
Pierre


> 
>>
>>>
>>>
>>>
>>>> So clamp the cpu_busy_time with the eenv->cpu_cap, which is
>>>> the actual_cpu_capacity.
>>>>
>>>> Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
>>>> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
>>>> Tested-by: Christian Loehle <christian.loehle@arm.com>
>>>> ---
>>>> V2:
>>>> - change commit message.
>>>> - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
>>>> - add Tested-by.
>>>> ---
>>>>   kernel/sched/fair.c | 9 +++++----
>>>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 8a5b1ae0aa55..5ca6396ef0b7 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
>>>>                                       struct cpumask *pd_cpus,
>>>>                                       struct task_struct *p)
>>>>   {
>>>> -       unsigned long busy_time = 0;
>>>>          int cpu;
>>>>
>>>> +       eenv->pd_busy_time = 0;
>>>> +
>>>>          for_each_cpu(cpu, pd_cpus) {
>>>>                  unsigned long util = cpu_util(cpu, p, -1, 0);
>>>>
>>>> -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
>>>> +               util = effective_cpu_util(cpu, util, NULL, NULL);
>>>> +               util = min(eenv->cpu_cap, util);
>>>> +               eenv->pd_busy_time += util;
>>>>          }
>>>> -
>>>> -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
>>>>   }
>>>>
>>>>   /*
>>>> --
>>>> 2.25.1
>>>>
>>>>
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-07-03 14:54         ` Vincent Guittot
@ 2024-07-04 23:56           ` Qais Yousef
  2024-07-05  6:25             ` Dietmar Eggemann
  2024-07-16 11:56             ` Xuewen Yan
  0 siblings, 2 replies; 20+ messages in thread
From: Qais Yousef @ 2024-07-04 23:56 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Xuewen Yan, dietmar.eggemann, mingo, peterz, juri.lelli, rostedt,
	bsegall, mgorman, bristot, vschneid, christian.loehle,
	vincent.donnefort, ke.wang, di.shen, xuewen.yan94, linux-kernel

On 07/03/24 16:54, Vincent Guittot wrote:
> On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 07/02/24 15:25, Vincent Guittot wrote:
> >
> > > > >        *
> > > > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > > > >        * on available OPP of the system.
> > > > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > > > >        * goal is to cap the task. So it's okay if it's getting less.
> > > > >        */
> > > > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > > > >
> > > > >       /*
> > > > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > >        *     uclamp_max request.
> > > > >        *
> > > > >        *   which is what we're enforcing here. A task always fits if
> > > > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > > > >        *   the normal upmigration rules should withhold still.
> > > > >        *
> > > > >        *   Only exception is when we are on max capacity, then we need to be
> > > > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > > >        *     2. The system is being saturated when we're operating near
> > > > >        *        max capacity, it doesn't make sense to block overutilized.
> > > > >        */
> > > > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > >
> > > > We should use capacity_orig here. We are checking if the CPU is the max
> > > > capacity CPU.
> > >
> > > I was also wondering what would be the best choice there. If we
> > > consider that we have only one performance domain with all max
> > > capacity cpus then I agree that we should keep capacity_orig as we
> > > can't find a better cpu that would fit. But is it always true that all
> > > max cpu are tied to the same perf domain ?
> >
> > Hmm I could be not thinking straight today. But the purpose of this check is to
> > ensure overutilized can trigger for the common case where a task will always
> > fit the max capacity cpu (whether it's on a single pd or multiple ones). For
> > that corner case fits_capacity() should be the only fitness check otherwise
> > overutilized will never trigger by default.
> 
> Ok, so I messed up several use cases but in fact both are similar ...
> 
> if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
> SCHED_CAPACITY_SCALE
> then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
> (uclamp_max == SCHED_CAPACITY_SCALE) is false
> and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
> capacity_actual); is also false because (uclamp_max <=
> capacity_actual) is always false
> 
> if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
> SCHED_CAPACITY_SCALE
> then we are the same as with capacity_orig

Right. The condition is becoming less readable, but you're right it doesn't
change functionality.

Xuewen, could you put something in the commit message please to remind us in
the future that we thought about this and it is fine?

Thanks!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
  2024-07-04 17:01         ` Pierre Gondois
@ 2024-07-05  0:10           ` Qais Yousef
  0 siblings, 0 replies; 20+ messages in thread
From: Qais Yousef @ 2024-07-05  0:10 UTC (permalink / raw)
  To: Pierre Gondois
  Cc: Vincent Guittot, Xuewen Yan, Xuewen Yan, dietmar.eggemann, mingo,
	peterz, juri.lelli, rostedt, bsegall, mgorman, bristot, vschneid,
	christian.loehle, vincent.donnefort, ke.wang, di.shen,
	linux-kernel, Quentin Perret

On 07/04/24 19:01, Pierre Gondois wrote:

> I thought the EAS was comparing instantaneous power and not energy,
> i.e. how the energy computation is done:
> 
>   *             ps->power * cpu_max_freq
>   *   cpu_nrg = ------------------------ * cpu_util           (3)
>   *               ps->freq * scale_cpu
> 
> cpu_nrg should have the same dimension as ps->power (i.e. energy/s).
>  From this PoV, the energy computation should not take into account how
> much time a task is expected to run. But it might be a side discussion,

I had this discussion with Quentin recently as I indicated in another reply on
this thread I think we do have inaccuracies here in terms of how we try to
represent the running time (or busy time) of the cpu/pd.

AFAIU this is supposed to be computing energy, but we don't explicitly multiply
with any time value and there's an assumed time multiplication with
unspecified period. Maybe it's PELT HF, maybe it's something else. But I think
we do have sources of inaccuracies here, but I need to analyse and dig more.

The cpu_util/cpu_cap, or sum_util/pd->cap is assumed to represent the
percentage of time we are busy during this unspecified period

	cpu_nrg = ps->power x (cpu_util/cpu_cap) * T

where T is a constant value of some period, hence it is omitted from all
calculations. I'm of course thinking this is not best; but sure keeps it
a 'simple energy model' :)

I'm not looking in this area at the moment, but if someone is, it'd be great to
consider the impact of this properly.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-07-04 23:56           ` Qais Yousef
@ 2024-07-05  6:25             ` Dietmar Eggemann
  2024-07-16 11:56             ` Xuewen Yan
  1 sibling, 0 replies; 20+ messages in thread
From: Dietmar Eggemann @ 2024-07-05  6:25 UTC (permalink / raw)
  To: Qais Yousef, Vincent Guittot
  Cc: Xuewen Yan, mingo, peterz, juri.lelli, rostedt, bsegall, mgorman,
	bristot, vschneid, christian.loehle, vincent.donnefort, ke.wang,
	di.shen, xuewen.yan94, linux-kernel

On 05/07/2024 01:56, Qais Yousef wrote:
> On 07/03/24 16:54, Vincent Guittot wrote:
>> On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
>>>
>>> On 07/02/24 15:25, Vincent Guittot wrote:
>>>
>>>>>>        *
>>>>>>        * Only exception is for HW or cpufreq pressure since it has a direct impact
>>>>>>        * on available OPP of the system.
>>>>>> @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
>>>>>>        * For uclamp_max, we can tolerate a drop in performance level as the
>>>>>>        * goal is to cap the task. So it's okay if it's getting less.
>>>>>>        */
>>>>>> -     capacity_orig = arch_scale_cpu_capacity(cpu);
>>>>>> +     capacity_actual = get_actual_cpu_capacity(cpu);
>>>>>>
>>>>>>       /*
>>>>>>        * We want to force a task to fit a cpu as implied by uclamp_max.
>>>>>> @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
>>>>>>        *     uclamp_max request.
>>>>>>        *
>>>>>>        *   which is what we're enforcing here. A task always fits if
>>>>>> -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
>>>>>> +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
>>>>>>        *   the normal upmigration rules should withhold still.
>>>>>>        *
>>>>>>        *   Only exception is when we are on max capacity, then we need to be
>>>>>> @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
>>>>>>        *     2. The system is being saturated when we're operating near
>>>>>>        *        max capacity, it doesn't make sense to block overutilized.
>>>>>>        */
>>>>>> -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>>>>>> -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
>>>>>> +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>>>>>
>>>>> We should use capacity_orig here. We are checking if the CPU is the max
>>>>> capacity CPU.
>>>>
>>>> I was also wondering what would be the best choice there. If we
>>>> consider that we have only one performance domain with all max
>>>> capacity cpus then I agree that we should keep capacity_orig as we
>>>> can't find a better cpu that would fit. But is it always true that all
>>>> max cpu are tied to the same perf domain ?

This should be the case. Perf domains follow Frequency Domains (FD). So
even if we have the most powerful CPUs in 2 different FDs, only the CPUs
in the FD with the higher max OPP get SCHED_CAPACITY_SCALE assigned,
i.e. only they become big CPUs.

>>> Hmm I could be not thinking straight today. But the purpose of this check is to
>>> ensure overutilized can trigger for the common case where a task will always
>>> fit the max capacity cpu (whether it's on a single pd or multiple ones). For
>>> that corner case fits_capacity() should be the only fitness check otherwise
>>> overutilized will never trigger by default.
>>
>> Ok, so I messed up several use cases but in fact both are similar ...
>>
>> if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
>> SCHED_CAPACITY_SCALE
>> then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
>> (uclamp_max == SCHED_CAPACITY_SCALE) is false
>> and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
>> capacity_actual); is also false because (uclamp_max <=
>> capacity_actual) is always false
>>
>> if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
>> SCHED_CAPACITY_SCALE
>> then we are the same as with capacity_orig
> 
> Right. The condition is becoming less readable, but you're right it doesn't
> change functionality.

+1. 'capacity_actual < SCHED_CAPACITY_SCALE' on big CPUs just make them
non-big CPUs.

> Xuewen, could you put something in the commit message please to remind us in
> the future that we thought about this and it is fine?
> 
> Thanks!
> 
> --
> Qais Yousef


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu()
  2024-07-04 23:56           ` Qais Yousef
  2024-07-05  6:25             ` Dietmar Eggemann
@ 2024-07-16 11:56             ` Xuewen Yan
  1 sibling, 0 replies; 20+ messages in thread
From: Xuewen Yan @ 2024-07-16 11:56 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Vincent Guittot, Xuewen Yan, dietmar.eggemann, mingo, peterz,
	juri.lelli, rostedt, bsegall, mgorman, bristot, vschneid,
	christian.loehle, vincent.donnefort, ke.wang, di.shen,
	linux-kernel

On Fri, Jul 5, 2024 at 7:56 AM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 07/03/24 16:54, Vincent Guittot wrote:
> > On Wed, 3 Jul 2024 at 13:54, Qais Yousef <qyousef@layalina.io> wrote:
> > >
> > > On 07/02/24 15:25, Vincent Guittot wrote:
> > >
> > > > > >        *
> > > > > >        * Only exception is for HW or cpufreq pressure since it has a direct impact
> > > > > >        * on available OPP of the system.
> > > > > > @@ -5011,7 +5011,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > > >        * For uclamp_max, we can tolerate a drop in performance level as the
> > > > > >        * goal is to cap the task. So it's okay if it's getting less.
> > > > > >        */
> > > > > > -     capacity_orig = arch_scale_cpu_capacity(cpu);
> > > > > > +     capacity_actual = get_actual_cpu_capacity(cpu);
> > > > > >
> > > > > >       /*
> > > > > >        * We want to force a task to fit a cpu as implied by uclamp_max.
> > > > > > @@ -5039,7 +5039,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > > > >        *     uclamp_max request.
> > > > > >        *
> > > > > >        *   which is what we're enforcing here. A task always fits if
> > > > > > -      *   uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
> > > > > > +      *   uclamp_max <= capacity_actual. But when uclamp_max > capacity_actual,
> > > > > >        *   the normal upmigration rules should withhold still.
> > > > > >        *
> > > > > >        *   Only exception is when we are on max capacity, then we need to be
> > > > > > @@ -5050,8 +5050,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > > > >        *     2. The system is being saturated when we're operating near
> > > > > >        *        max capacity, it doesn't make sense to block overutilized.
> > > > > >        */
> > > > > > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > > > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > > > > +     uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > > >
> > > > > We should use capacity_orig here. We are checking if the CPU is the max
> > > > > capacity CPU.
> > > >
> > > > I was also wondering what would be the best choice there. If we
> > > > consider that we have only one performance domain with all max
> > > > capacity cpus then I agree that we should keep capacity_orig as we
> > > > can't find a better cpu that would fit. But is it always true that all
> > > > max cpu are tied to the same perf domain ?
> > >
> > > Hmm I could be not thinking straight today. But the purpose of this check is to
> > > ensure overutilized can trigger for the common case where a task will always
> > > fit the max capacity cpu (whether it's on a single pd or multiple ones). For
> > > that corner case fits_capacity() should be the only fitness check otherwise
> > > overutilized will never trigger by default.
> >
> > Ok, so I messed up several use cases but in fact both are similar ...
> >
> > if capacity_actual != SCHED_CAPACITY_SCALE and uclamp_max ==
> > SCHED_CAPACITY_SCALE
> > then uclamp_max_fits = (capacity_actual == SCHED_CAPACITY_SCALE) &&
> > (uclamp_max == SCHED_CAPACITY_SCALE) is false
> > and uclamp_max_fits = !uclamp_max_fits && (uclamp_max <=
> > capacity_actual); is also false because (uclamp_max <=
> > capacity_actual) is always false
> >
> > if capacity_actual == SCHED_CAPACITY_SCALE and uclamp_max ==
> > SCHED_CAPACITY_SCALE
> > then we are the same as with capacity_orig
>
> Right. The condition is becoming less readable, but you're right it doesn't
> change functionality.
>
> Xuewen, could you put something in the commit message please to remind us in
> the future that we thought about this and it is fine?

Sorry for the late reply. I will read all the emails carefully.
I am also studying the definition and usage of this function carefully.
I will send patch v3 as soon as I fully understand it.

Thanks!

>
> Thanks!
>
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-07-16 11:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-24  8:20 [PATCH V2 0/2] sched/fair: Some improvements to feec() Xuewen Yan
2024-06-24  8:20 ` [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity Xuewen Yan
2024-06-25 13:04   ` Vincent Guittot
2024-06-27  2:02     ` Xuewen Yan
2024-06-27 16:15       ` Vincent Guittot
2024-06-28  1:39         ` Qais Yousef
2024-07-01 12:00         ` Xuewen Yan
2024-07-04 17:01         ` Pierre Gondois
2024-07-05  0:10           ` Qais Yousef
2024-06-24  8:20 ` [PATCH V2 2/2] sched/fair: Use actual_cpu_capacity everywhere in util_fits_cpu() Xuewen Yan
2024-06-25  8:46   ` Vincent Guittot
2024-06-28  1:28   ` Qais Yousef
2024-07-01 12:13     ` Xuewen Yan
2024-07-03 11:46       ` Qais Yousef
2024-07-02 13:25     ` Vincent Guittot
2024-07-03 11:54       ` Qais Yousef
2024-07-03 14:54         ` Vincent Guittot
2024-07-04 23:56           ` Qais Yousef
2024-07-05  6:25             ` Dietmar Eggemann
2024-07-16 11:56             ` Xuewen Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox