public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] Fix a couple of corner cases in feec() when using uclamp_max
@ 2023-08-21 22:45 Qais Yousef
  2023-08-21 22:45 ` [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0 Qais Yousef
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Qais Yousef @ 2023-08-21 22:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia, Qais Yousef

Changes in v4:

	* Added Reviewed-by Vincent Guittot.
	* Updated sched_compute_energy_tp() to include  max_util and busy_time
	  as requested by Lukasz.

Changes in v3:

	* Fix sign comparison problem in patch 1 (Thanks Vincent!)
	* Simplify comparison and remove function in patch 2 (Thanks Dietmar!)

Changes in v2:

	* Use long instead of unsigned long to keep the comparison simple
	  in spite of being inconsistent with how capacity type.
	* Fix missing termination parenthesis that caused build error.
	* Rebase on latest tip/sched/core and Vincent v5 of Unlink misift patch.

v1 link: https://lore.kernel.org/lkml/20230129161444.1674958-1-qyousef@layalina.io/
v2 link: https://lore.kernel.org/lkml/20230205224318.2035646-1-qyousef@layalina.io/
v3 link: https://lore.kernel.org/lkml/20230717215717.309174-1-qyousef@layalina.io/

In v2 Dietmar has raised concerns about limitation in current EM calculations
that can end up packing more tasks on a cluster. While this is not ideal
situation and we need to fix it, but it is another independent problem that is
not introduced by this fix. I don't see a reason why we should couple them
rather than work on each problem independently. The packing behavior in
practice is actually not bad as if something is capped really hard, there's
a desire to keep them on this less performant clusters.

Patch 1 addresses a bug because forcing a task on a small CPU to honour
uclamp_max hint means we can end up with spare_capacity = 0; but the logic is
constructed such that spare_capacity = 0 leads to ignoring this CPU as
a candidate to compute_energy().

Patch 2 addresses a bug due to an optimization in feec() that could lead to
ignoring tasks whose uclamp_max = 0 but task_util(0) != 0.

Patch 3 adds a new tracepoint in compute_energy() as it was helpful in
debugging these two problems.

This is based on tip/sched/core.

Qais Yousef (3):
  sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
  sched/uclamp: Ignore (util == 0) optimization in feec() when
    p_util_max = 0
  sched/tp: Add new tracepoint to track compute energy computation

 include/trace/events/sched.h |  5 +++++
 kernel/sched/core.c          |  1 +
 kernel/sched/fair.c          | 36 ++++++++++++------------------------
 3 files changed, 18 insertions(+), 24 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
  2023-08-21 22:45 [PATCH v4 0/3] Fix a couple of corner cases in feec() when using uclamp_max Qais Yousef
@ 2023-08-21 22:45 ` Qais Yousef
  2023-08-23 10:30   ` Dietmar Eggemann
  2023-08-21 22:45 ` [PATCH v4 2/3] sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0 Qais Yousef
  2023-08-21 22:45 ` [PATCH v4 3/3] sched/tp: Add new tracepoint to track compute energy computation Qais Yousef
  2 siblings, 1 reply; 10+ messages in thread
From: Qais Yousef @ 2023-08-21 22:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia, Qais Yousef

When uclamp_max is being used, the util of the task could be higher than
the spare capacity of the CPU, but due to uclamp_max value we force fit
it there.

The way the condition for checking for max_spare_cap in
find_energy_efficient_cpu() was constructed; it ignored any CPU that has
its spare_cap less than or _equal_ to max_spare_cap. Since we initialize
max_spare_cap to 0; this lead to never setting max_spare_cap_cpu and
hence ending up never performing compute_energy() for this cluster and
missing an opportunity for a better energy efficient placement to honour
uclamp_max setting.

	max_spare_cap = 0;
	cpu_cap = capacity_of(cpu) - task_util(p);  // 0 if task_util(p) is high

	...

	util_fits_cpu(...);		// will return true if uclamp_max forces it to fit

	...

	// this logic will fail to update max_spare_cap_cpu if cpu_cap is 0
	if (cpu_cap > max_spare_cap) {
		max_spare_cap = cpu_cap;
		max_spare_cap_cpu = cpu;
	}

prev_spare_cap suffers from a similar problem.

Fix the logic by converting the variables into long and treating -1
value as 'not populated' instead of 0 which is a viable and correct
spare capacity value. We need to be careful signed comparison is used
when comparing with cpu_cap in one of the conditions.

Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b7445cd5af9..5da6538ed220 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7707,11 +7707,10 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	for (; pd; pd = pd->next) {
 		unsigned long util_min = p_util_min, util_max = p_util_max;
 		unsigned long cpu_cap, cpu_thermal_cap, util;
-		unsigned long cur_delta, max_spare_cap = 0;
+		long prev_spare_cap = -1, max_spare_cap = -1;
 		unsigned long rq_util_min, rq_util_max;
-		unsigned long prev_spare_cap = 0;
+		unsigned long cur_delta, base_energy;
 		int max_spare_cap_cpu = -1;
-		unsigned long base_energy;
 		int fits, max_fits = -1;
 
 		cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
@@ -7774,7 +7773,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 				prev_spare_cap = cpu_cap;
 				prev_fits = fits;
 			} else if ((fits > max_fits) ||
-				   ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
+				   ((fits == max_fits) && ((long)cpu_cap > max_spare_cap))) {
 				/*
 				 * Find the CPU with the maximum spare capacity
 				 * among the remaining CPUs in the performance
@@ -7786,7 +7785,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			}
 		}
 
-		if (max_spare_cap_cpu < 0 && prev_spare_cap == 0)
+		if (max_spare_cap_cpu < 0 && prev_spare_cap < 0)
 			continue;
 
 		eenv_pd_busy_time(&eenv, cpus, p);
@@ -7794,7 +7793,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		base_energy = compute_energy(&eenv, pd, cpus, p, -1);
 
 		/* Evaluate the energy impact of using prev_cpu. */
-		if (prev_spare_cap > 0) {
+		if (prev_spare_cap > -1) {
 			prev_delta = compute_energy(&eenv, pd, cpus, p,
 						    prev_cpu);
 			/* CPU utilization has changed */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 2/3] sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0
  2023-08-21 22:45 [PATCH v4 0/3] Fix a couple of corner cases in feec() when using uclamp_max Qais Yousef
  2023-08-21 22:45 ` [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0 Qais Yousef
@ 2023-08-21 22:45 ` Qais Yousef
  2023-08-23 10:30   ` Dietmar Eggemann
  2023-08-21 22:45 ` [PATCH v4 3/3] sched/tp: Add new tracepoint to track compute energy computation Qais Yousef
  2 siblings, 1 reply; 10+ messages in thread
From: Qais Yousef @ 2023-08-21 22:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia, Qais Yousef

find_energy_efficient_cpu() bails out early if effective util of the
task is 0 as the delta at this point will be zero and there's nothing
for EAS to do. When uclamp is being used, this could lead to wrong
decisions when uclamp_max is set to 0. In this case the task is capped
to performance point 0, but it is actually running and consuming energy
and we can benefit from EAS energy calculations.

Rework the condition so that it bails out for when util is actually 0 or
uclamp_min is requesting a higher performance point.

We can do that without needing to use uclamp_task_util(); remove it.

Fixes: d81304bc6193 ("sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition")
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5da6538ed220..e19a36e7b433 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4571,22 +4571,6 @@ static inline unsigned long task_util_est(struct task_struct *p)
 	return max(task_util(p), _task_util_est(p));
 }
 
-#ifdef CONFIG_UCLAMP_TASK
-static inline unsigned long uclamp_task_util(struct task_struct *p,
-					     unsigned long uclamp_min,
-					     unsigned long uclamp_max)
-{
-	return clamp(task_util_est(p), uclamp_min, uclamp_max);
-}
-#else
-static inline unsigned long uclamp_task_util(struct task_struct *p,
-					     unsigned long uclamp_min,
-					     unsigned long uclamp_max)
-{
-	return task_util_est(p);
-}
-#endif
-
 static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
 				    struct task_struct *p)
 {
@@ -7699,7 +7683,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	target = prev_cpu;
 
 	sync_entity_load_avg(&p->se);
-	if (!uclamp_task_util(p, p_util_min, p_util_max))
+	if (!task_util_est(p) && p_util_min == 0)
 		goto unlock;
 
 	eenv_task_busy_time(&eenv, p, prev_cpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 3/3] sched/tp: Add new tracepoint to track compute energy computation
  2023-08-21 22:45 [PATCH v4 0/3] Fix a couple of corner cases in feec() when using uclamp_max Qais Yousef
  2023-08-21 22:45 ` [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0 Qais Yousef
  2023-08-21 22:45 ` [PATCH v4 2/3] sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0 Qais Yousef
@ 2023-08-21 22:45 ` Qais Yousef
  2023-08-23 10:30   ` Dietmar Eggemann
  2 siblings, 1 reply; 10+ messages in thread
From: Qais Yousef @ 2023-08-21 22:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia, Qais Yousef

It was useful to track feec() placement decision and debug the spare
capacity and optimization issues vs uclamp_max.

Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 include/trace/events/sched.h | 5 +++++
 kernel/sched/core.c          | 1 +
 kernel/sched/fair.c          | 7 ++++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fbb99a61f714..a13d5d06be9d 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -735,6 +735,11 @@ DECLARE_TRACE(sched_update_nr_running_tp,
 	TP_PROTO(struct rq *rq, int change),
 	TP_ARGS(rq, change));
 
+DECLARE_TRACE(sched_compute_energy_tp,
+	TP_PROTO(struct task_struct *p, int dst_cpu, unsigned long energy,
+		 unsigned long max_util, unsigned long busy_time),
+	TP_ARGS(p, dst_cpu, energy, max_util, busy_time));
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index efe3848978a0..36c60ad9966a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -114,6 +114,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e19a36e7b433..779c285203e3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7604,11 +7604,16 @@ compute_energy(struct energy_env *eenv, struct perf_domain *pd,
 {
 	unsigned long max_util = eenv_pd_max_util(eenv, pd_cpus, p, dst_cpu);
 	unsigned long busy_time = eenv->pd_busy_time;
+	unsigned long energy;
 
 	if (dst_cpu >= 0)
 		busy_time = min(eenv->pd_cap, busy_time + eenv->task_busy_time);
 
-	return em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
+	energy = em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
+
+	trace_sched_compute_energy_tp(p, dst_cpu, energy, max_util, busy_time);
+
+	return energy;
 }
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
  2023-08-21 22:45 ` [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0 Qais Yousef
@ 2023-08-23 10:30   ` Dietmar Eggemann
  2023-08-26 20:41     ` Qais Yousef
  0 siblings, 1 reply; 10+ messages in thread
From: Dietmar Eggemann @ 2023-08-23 10:30 UTC (permalink / raw)
  To: Qais Yousef, Ingo Molnar, Peter Zijlstra, Vincent Guittot
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia

On 22/08/2023 00:45, Qais Yousef wrote:
> When uclamp_max is being used, the util of the task could be higher than
> the spare capacity of the CPU, but due to uclamp_max value we force fit
> it there.
> 
> The way the condition for checking for max_spare_cap in
> find_energy_efficient_cpu() was constructed; it ignored any CPU that has
> its spare_cap less than or _equal_ to max_spare_cap. Since we initialize
> max_spare_cap to 0; this lead to never setting max_spare_cap_cpu and
> hence ending up never performing compute_energy() for this cluster and
> missing an opportunity for a better energy efficient placement to honour
> uclamp_max setting.
> 
> 	max_spare_cap = 0;
> 	cpu_cap = capacity_of(cpu) - task_util(p);  // 0 if task_util(p) is high

Nitpick:

s/task_util(p)/cpu_util(cpu, p, cpu, ...) which is

max(cpu_util + task_util, cpu_util_est + task_util_est)

> 
> 	...
> 
> 	util_fits_cpu(...);		// will return true if uclamp_max forces it to fit
> 
> 	...
> 
> 	// this logic will fail to update max_spare_cap_cpu if cpu_cap is 0
> 	if (cpu_cap > max_spare_cap) {
> 		max_spare_cap = cpu_cap;
> 		max_spare_cap_cpu = cpu;
> 	}
> 
> prev_spare_cap suffers from a similar problem.
> 
> Fix the logic by converting the variables into long and treating -1
> value as 'not populated' instead of 0 which is a viable and correct
> spare capacity value. We need to be careful signed comparison is used
> when comparing with cpu_cap in one of the conditions.
> 
> Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
> ---
>  kernel/sched/fair.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0b7445cd5af9..5da6538ed220 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7707,11 +7707,10 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  	for (; pd; pd = pd->next) {
>  		unsigned long util_min = p_util_min, util_max = p_util_max;
>  		unsigned long cpu_cap, cpu_thermal_cap, util;
> -		unsigned long cur_delta, max_spare_cap = 0;
> +		long prev_spare_cap = -1, max_spare_cap = -1;
>  		unsigned long rq_util_min, rq_util_max;
> -		unsigned long prev_spare_cap = 0;
> +		unsigned long cur_delta, base_energy;
>  		int max_spare_cap_cpu = -1;
> -		unsigned long base_energy;
>  		int fits, max_fits = -1;
>  
>  		cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
> @@ -7774,7 +7773,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  				prev_spare_cap = cpu_cap;
>  				prev_fits = fits;
>  			} else if ((fits > max_fits) ||
> -				   ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
> +				   ((fits == max_fits) && ((long)cpu_cap > max_spare_cap))) {
>  				/*
>  				 * Find the CPU with the maximum spare capacity
>  				 * among the remaining CPUs in the performance
> @@ -7786,7 +7785,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  			}
>  		}
>  
> -		if (max_spare_cap_cpu < 0 && prev_spare_cap == 0)
> +		if (max_spare_cap_cpu < 0 && prev_spare_cap < 0)
>  			continue;
>  
>  		eenv_pd_busy_time(&eenv, cpus, p);
> @@ -7794,7 +7793,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  		base_energy = compute_energy(&eenv, pd, cpus, p, -1);
>  
>  		/* Evaluate the energy impact of using prev_cpu. */
> -		if (prev_spare_cap > 0) {
> +		if (prev_spare_cap > -1) {
>  			prev_delta = compute_energy(&eenv, pd, cpus, p,
>  						    prev_cpu);
>  			/* CPU utilization has changed */

We still need a solution to deal with situations in which `pd + task
contribution` > `pd_capacity`:

  compute_energy()

    if (dst_cpu >= 0)
     busy_time = min(pd_capacity, pd_busy_time + task_busy_time);
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                  pd + task contribution

busy_time is based on util (ENERGY_UTIL), not on the uclamp values
(FREQUENCY_UTIL) we try to fit into a PD (and finally onto a CPU).

With that as a reminder for us and the change in the cover letter:

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 2/3] sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0
  2023-08-21 22:45 ` [PATCH v4 2/3] sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0 Qais Yousef
@ 2023-08-23 10:30   ` Dietmar Eggemann
  0 siblings, 0 replies; 10+ messages in thread
From: Dietmar Eggemann @ 2023-08-23 10:30 UTC (permalink / raw)
  To: Qais Yousef, Ingo Molnar, Peter Zijlstra, Vincent Guittot
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia

On 22/08/2023 00:45, Qais Yousef wrote:
> find_energy_efficient_cpu() bails out early if effective util of the
> task is 0 as the delta at this point will be zero and there's nothing
> for EAS to do. When uclamp is being used, this could lead to wrong
> decisions when uclamp_max is set to 0. In this case the task is capped

Does uclamp_max plays a role here? We check util and uclamp_min in this
condition.

> to performance point 0, but it is actually running and consuming energy
> and we can benefit from EAS energy calculations.
> 
> Rework the condition so that it bails out for when util is actually 0 or
> uclamp_min is requesting a higher performance point.

I do get the condition:

> +	if (!task_util_est(p) && p_util_min == 0)
>  		goto unlock;

which is !(task_util_est(p) || p_util_min)

But the text then should be '... bails out for when util is actually 0
and uclamp_min is 0 too'? Or 'uclamp_min is not requesting ...'.

> We can do that without needing to use uclamp_task_util(); remove it.
> 
> Fixes: d81304bc6193 ("sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition")
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
> ---
>  kernel/sched/fair.c | 18 +-----------------
>  1 file changed, 1 insertion(+), 17 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5da6538ed220..e19a36e7b433 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4571,22 +4571,6 @@ static inline unsigned long task_util_est(struct task_struct *p)
>  	return max(task_util(p), _task_util_est(p));
>  }
>  
> -#ifdef CONFIG_UCLAMP_TASK
> -static inline unsigned long uclamp_task_util(struct task_struct *p,
> -					     unsigned long uclamp_min,
> -					     unsigned long uclamp_max)
> -{
> -	return clamp(task_util_est(p), uclamp_min, uclamp_max);
> -}
> -#else
> -static inline unsigned long uclamp_task_util(struct task_struct *p,
> -					     unsigned long uclamp_min,
> -					     unsigned long uclamp_max)
> -{
> -	return task_util_est(p);
> -}
> -#endif
> -
>  static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
>  				    struct task_struct *p)
>  {
> @@ -7699,7 +7683,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  	target = prev_cpu;
>  
>  	sync_entity_load_avg(&p->se);
> -	if (!uclamp_task_util(p, p_util_min, p_util_max))
> +	if (!task_util_est(p) && p_util_min == 0)
>  		goto unlock;
>  
>  	eenv_task_busy_time(&eenv, p, prev_cpu);

With the question about the content of the patch header in mind:

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 3/3] sched/tp: Add new tracepoint to track compute energy computation
  2023-08-21 22:45 ` [PATCH v4 3/3] sched/tp: Add new tracepoint to track compute energy computation Qais Yousef
@ 2023-08-23 10:30   ` Dietmar Eggemann
  0 siblings, 0 replies; 10+ messages in thread
From: Dietmar Eggemann @ 2023-08-23 10:30 UTC (permalink / raw)
  To: Qais Yousef, Ingo Molnar, Peter Zijlstra, Vincent Guittot
  Cc: linux-kernel, Lukasz Luba, Wei Wang, Xuewen Yan, Hank,
	Jonathan JMChen, Hongyan Xia

On 22/08/2023 00:45, Qais Yousef wrote:
> It was useful to track feec() placement decision and debug the spare
> capacity and optimization issues vs uclamp_max.
> 
> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
> ---
>  include/trace/events/sched.h | 5 +++++
>  kernel/sched/core.c          | 1 +
>  kernel/sched/fair.c          | 7 ++++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index fbb99a61f714..a13d5d06be9d 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -735,6 +735,11 @@ DECLARE_TRACE(sched_update_nr_running_tp,
>  	TP_PROTO(struct rq *rq, int change),
>  	TP_ARGS(rq, change));
>  
> +DECLARE_TRACE(sched_compute_energy_tp,
> +	TP_PROTO(struct task_struct *p, int dst_cpu, unsigned long energy,
> +		 unsigned long max_util, unsigned long busy_time),
> +	TP_ARGS(p, dst_cpu, energy, max_util, busy_time));
> +
>  #endif /* _TRACE_SCHED_H */
>  
>  /* This part must be outside protection */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index efe3848978a0..36c60ad9966a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -114,6 +114,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
>  
>  DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e19a36e7b433..779c285203e3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7604,11 +7604,16 @@ compute_energy(struct energy_env *eenv, struct perf_domain *pd,
>  {
>  	unsigned long max_util = eenv_pd_max_util(eenv, pd_cpus, p, dst_cpu);
>  	unsigned long busy_time = eenv->pd_busy_time;
> +	unsigned long energy;
>  
>  	if (dst_cpu >= 0)
>  		busy_time = min(eenv->pd_cap, busy_time + eenv->task_busy_time);
>  
> -	return em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
> +	energy = em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
> +
> +	trace_sched_compute_energy_tp(p, dst_cpu, energy, max_util, busy_time);
> +
> +	return energy;
>  }
>  
>  /*

I will make sure that this gets integrated into our trace module in Lisa
https://github.com/ARM-software/lisa .

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
  2023-08-23 10:30   ` Dietmar Eggemann
@ 2023-08-26 20:41     ` Qais Yousef
  2023-09-14  6:39       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Qais Yousef @ 2023-08-26 20:41 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, linux-kernel,
	Lukasz Luba, Wei Wang, Xuewen Yan, Hank, Jonathan JMChen,
	Hongyan Xia

On 08/23/23 12:30, Dietmar Eggemann wrote:
> On 22/08/2023 00:45, Qais Yousef wrote:
> > When uclamp_max is being used, the util of the task could be higher than
> > the spare capacity of the CPU, but due to uclamp_max value we force fit
> > it there.
> > 
> > The way the condition for checking for max_spare_cap in
> > find_energy_efficient_cpu() was constructed; it ignored any CPU that has
> > its spare_cap less than or _equal_ to max_spare_cap. Since we initialize
> > max_spare_cap to 0; this lead to never setting max_spare_cap_cpu and
> > hence ending up never performing compute_energy() for this cluster and
> > missing an opportunity for a better energy efficient placement to honour
> > uclamp_max setting.
> > 
> > 	max_spare_cap = 0;
> > 	cpu_cap = capacity_of(cpu) - task_util(p);  // 0 if task_util(p) is high
> 
> Nitpick:
> 
> s/task_util(p)/cpu_util(cpu, p, cpu, ...) which is
> 
> max(cpu_util + task_util, cpu_util_est + task_util_est)
> 
> > 
> > 	...
> > 
> > 	util_fits_cpu(...);		// will return true if uclamp_max forces it to fit
> > 
> > 	...
> > 
> > 	// this logic will fail to update max_spare_cap_cpu if cpu_cap is 0
> > 	if (cpu_cap > max_spare_cap) {
> > 		max_spare_cap = cpu_cap;
> > 		max_spare_cap_cpu = cpu;
> > 	}
> > 
> > prev_spare_cap suffers from a similar problem.
> > 
> > Fix the logic by converting the variables into long and treating -1
> > value as 'not populated' instead of 0 which is a viable and correct
> > spare capacity value. We need to be careful signed comparison is used
> > when comparing with cpu_cap in one of the conditions.
> > 
> > Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
> > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
> > ---
> >  kernel/sched/fair.c | 11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0b7445cd5af9..5da6538ed220 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7707,11 +7707,10 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >  	for (; pd; pd = pd->next) {
> >  		unsigned long util_min = p_util_min, util_max = p_util_max;
> >  		unsigned long cpu_cap, cpu_thermal_cap, util;
> > -		unsigned long cur_delta, max_spare_cap = 0;
> > +		long prev_spare_cap = -1, max_spare_cap = -1;
> >  		unsigned long rq_util_min, rq_util_max;
> > -		unsigned long prev_spare_cap = 0;
> > +		unsigned long cur_delta, base_energy;
> >  		int max_spare_cap_cpu = -1;
> > -		unsigned long base_energy;
> >  		int fits, max_fits = -1;
> >  
> >  		cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
> > @@ -7774,7 +7773,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >  				prev_spare_cap = cpu_cap;
> >  				prev_fits = fits;
> >  			} else if ((fits > max_fits) ||
> > -				   ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
> > +				   ((fits == max_fits) && ((long)cpu_cap > max_spare_cap))) {
> >  				/*
> >  				 * Find the CPU with the maximum spare capacity
> >  				 * among the remaining CPUs in the performance
> > @@ -7786,7 +7785,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >  			}
> >  		}
> >  
> > -		if (max_spare_cap_cpu < 0 && prev_spare_cap == 0)
> > +		if (max_spare_cap_cpu < 0 && prev_spare_cap < 0)
> >  			continue;
> >  
> >  		eenv_pd_busy_time(&eenv, cpus, p);
> > @@ -7794,7 +7793,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >  		base_energy = compute_energy(&eenv, pd, cpus, p, -1);
> >  
> >  		/* Evaluate the energy impact of using prev_cpu. */
> > -		if (prev_spare_cap > 0) {
> > +		if (prev_spare_cap > -1) {
> >  			prev_delta = compute_energy(&eenv, pd, cpus, p,
> >  						    prev_cpu);
> >  			/* CPU utilization has changed */
> 
> We still need a solution to deal with situations in which `pd + task
> contribution` > `pd_capacity`:
> 
>   compute_energy()
> 
>     if (dst_cpu >= 0)
>      busy_time = min(pd_capacity, pd_busy_time + task_busy_time);
>                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                                   pd + task contribution
> 
> busy_time is based on util (ENERGY_UTIL), not on the uclamp values
> (FREQUENCY_UTIL) we try to fit into a PD (and finally onto a CPU).
> 
> With that as a reminder for us and the change in the cover letter:

This is not being ignored, but I don't see this as an urgent problem too. There
are more pressing issues that make uclamp_max not effective in practice, and
this ain't a bottleneck yet. Actually it might be doing a good thing as there's
a desire to keep those tasks away on smallest CPU. But we shall visit this
later for sure, don't worry :-) Ultimately we want EAS algorithm to be the
judge of best placement for sure.

I hope to send patches to address load balancer and max aggregation issues in
the coming weeks.

> 
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Thanks for the review!

I will wait for the maintainers to see if they would like a v5 to address the
nitpicks or it's actually good enough and happy to pick this up. I think the
commit messages explain the problem clear enough and doesn't warrant sending
a new version. But happy to do so if there's insistence :-)


Thanks!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
  2023-08-26 20:41     ` Qais Yousef
@ 2023-09-14  6:39       ` Ingo Molnar
  2023-09-16 19:27         ` Qais Yousef
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2023-09-14  6:39 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Dietmar Eggemann, Peter Zijlstra, Vincent Guittot, linux-kernel,
	Lukasz Luba, Wei Wang, Xuewen Yan, Hank, Jonathan JMChen,
	Hongyan Xia


* Qais Yousef <qyousef@layalina.io> wrote:

> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> Thanks for the review!
> 
> I will wait for the maintainers to see if they would like a v5 to address 
> the nitpicks or it's actually good enough and happy to pick this up. I 
> think the commit messages explain the problem clear enough and doesn't 
> warrant sending a new version. But happy to do so if there's insistence 
> :-)

Yeah, please always do that: sensible review replies with actionable 
feedback cause a semi-atomatic "mark this thread as read, there will be a 
next version" reflexive action from maintainers, especially if a series is 
in its 4th iteration already...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0
  2023-09-14  6:39       ` Ingo Molnar
@ 2023-09-16 19:27         ` Qais Yousef
  0 siblings, 0 replies; 10+ messages in thread
From: Qais Yousef @ 2023-09-16 19:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dietmar Eggemann, Peter Zijlstra, Vincent Guittot, linux-kernel,
	Lukasz Luba, Wei Wang, Xuewen Yan, Hank, Jonathan JMChen,
	Hongyan Xia

On 09/14/23 08:39, Ingo Molnar wrote:
> 
> * Qais Yousef <qyousef@layalina.io> wrote:
> 
> > > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > 
> > Thanks for the review!
> > 
> > I will wait for the maintainers to see if they would like a v5 to address 
> > the nitpicks or it's actually good enough and happy to pick this up. I 
> > think the commit messages explain the problem clear enough and doesn't 
> > warrant sending a new version. But happy to do so if there's insistence 
> > :-)
> 
> Yeah, please always do that: sensible review replies with actionable 
> feedback cause a semi-atomatic "mark this thread as read, there will be a 
> next version" reflexive action from maintainers, especially if a series is 
> in its 4th iteration already...

Apologies. I did realize that and intended to send a new version last weekend,
but failed to get to it. I hope to be able to do so today or tomorrow.


Thanks!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-16 19:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-21 22:45 [PATCH v4 0/3] Fix a couple of corner cases in feec() when using uclamp_max Qais Yousef
2023-08-21 22:45 ` [PATCH v4 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0 Qais Yousef
2023-08-23 10:30   ` Dietmar Eggemann
2023-08-26 20:41     ` Qais Yousef
2023-09-14  6:39       ` Ingo Molnar
2023-09-16 19:27         ` Qais Yousef
2023-08-21 22:45 ` [PATCH v4 2/3] sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0 Qais Yousef
2023-08-23 10:30   ` Dietmar Eggemann
2023-08-21 22:45 ` [PATCH v4 3/3] sched/tp: Add new tracepoint to track compute energy computation Qais Yousef
2023-08-23 10:30   ` Dietmar Eggemann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox