* [PATCH v6 0/3] sched/fair: Limit access to overutilized
@ 2024-03-07 8:57 Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized Shrikanth Hegde
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Shrikanth Hegde @ 2024-03-07 8:57 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot
Cc: sshegde, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
When running an ISV workload on a large system (240 cores, SMT8), it was
observed from the perf profile that newidle_balance and enqueue_task_fair
were consuming a significant number of cycles. Perf annotate showed that
most of the time was spent accessing the overutilized field of the root
domain. A similar perf profile was simulated by making some changes to
stress-ng --wait. Both newidle_balance and enqueue_task_fair consume close
to 5-7% each.
Overutilized was added for EAS (Energy Aware Scheduler) to decide whether
to do load balancing or not. Simultaneous access to overutilized by
multiple CPUs leads to cache invalidations due to true sharing. Updating
overutilized is not required on non-EAS platforms. Since overutilized and
overload are part of the same cacheline, there is false sharing as well.
Patch 1/3 - Main patch. It helps in reducing the issue described above.
Both functions no longer show up in the profile. A with/without-patch
comparison is in the changelog. With the patch, the stated problem in the
ISV workload was also solved and throughput improved.
Patch 2/3 - Code refactoring to use a helper function instead of
directly accessing the field. Keeping this patch so that patch 3/3 becomes
easier to understand. Depends on 1/3.
Patch 3/3 - Refactoring the code, since most of the observed patterns are
eas && !overutilized. Changed the helper function accordingly.
Depends on 2/3.
More details can be found in cover letter of v1.
v5 -> v6:
- Made minor changes for !CONFIG_SMP cases as pointed out by Vincent.
v4 -> v5:
- Added the EAS check inside helper functions instead of sprinkling it
around. The EAS check is a static branch at best, so that keeps the code
simpler.
- Added an EAS check inside cpu_overutilized as suggested by Qais.
- Added patch 3, since most of the code follows the eas && !overutilized
pattern, as suggested by Vincent.
v3 -> v4:
- corrected a mistake where EAS check was missed.
v2 -> v3:
- Pierre and Dietmar suggested we could add one more EAS check before
calling cpu_overutilized. That makes sense, since that value is not used
anyway in the non-EAS case.
- Refactored the code as Dietmar suggested to avoid an additional call to
sched_energy_enabled().
- Minor edits to change log.
v1 -> v2:
Chen Yu pointed out a minor issue in the code. Corrected that code and
updated the changelog.
v1: https://lore.kernel.org/lkml/20240223150707.410417-1-sshegde@linux.ibm.com/
v2: https://lore.kernel.org/lkml/20240228071621.602596-1-sshegde@linux.ibm.com/
v3: https://lore.kernel.org/lkml/20240229104010.747411-1-sshegde@linux.ibm.com/
v4: https://lore.kernel.org/lkml/20240301151725.874604-1-sshegde@linux.ibm.com/
v5: https://lore.kernel.org/lkml/20240306102454.341014-1-sshegde@linux.ibm.com/
Shrikanth Hegde (3):
sched/fair: Add EAS checks before updating overutilized
sched/fair: Use helper function to access rd->overutilized
sched/fair: Combine EAS check with overutilized access
kernel/sched/fair.c | 72 ++++++++++++++++++++++++++++-----------------
1 file changed, 45 insertions(+), 27 deletions(-)
--
2.39.3
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized
2024-03-07 8:57 [PATCH v6 0/3] sched/fair: Limit access to overutilized Shrikanth Hegde
@ 2024-03-07 8:57 ` Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
2024-03-26 8:06 ` [tip: sched/core] sched/fair: Add EAS checks before updating root_domain::overutilized tip-bot2 for Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized Shrikanth Hegde
` (2 subsequent siblings)
3 siblings, 2 replies; 15+ messages in thread
From: Shrikanth Hegde @ 2024-03-07 8:57 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot
Cc: sshegde, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
The overutilized field of the root domain is only used by EAS (Energy Aware
Scheduler) to decide whether to do load balancing or not. It is not used if
EAS is not possible.
Currently enqueue_task_fair and task_tick_fair access, and sometimes update,
this field. In update_sd_lb_stats it is updated often. This causes cache
contention due to true sharing and burns a lot of cycles. overload and
overutilized are part of the same cacheline, so updating overutilized often
invalidates the cacheline, which slows down access to overload due to false
sharing. Hence add an EAS check before accessing/updating this field. The
EAS check is either optimized away at compile time or is a static branch,
so it shouldn't cost much.
With the patch, both enqueue_task_fair and newidle_balance don't show
up as hot routines in perf profile.
6.8-rc4:
7.18% swapper [kernel.vmlinux] [k] enqueue_task_fair
6.78% s [kernel.vmlinux] [k] newidle_balance
+patch:
0.14% swapper [kernel.vmlinux] [k] enqueue_task_fair
0.00% swapper [kernel.vmlinux] [k] newidle_balance
Minor change: trace_sched_overutilized_tp expects its second argument to be
bool, so do an int to bool conversion for it.
Fixes: 2802bf3cd936 ("sched/fair: Add over-utilization/tipping point indicator")
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 53 +++++++++++++++++++++++++++++----------------
1 file changed, 34 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a16129f9a5c..5aa6e918598c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6663,22 +6663,42 @@ static inline void hrtick_update(struct rq *rq)
#ifdef CONFIG_SMP
static inline bool cpu_overutilized(int cpu)
{
- unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
- unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+ unsigned long rq_util_min, rq_util_max;
+
+ if (!sched_energy_enabled())
+ return false;
+
+ rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
+ rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
/* Return true only if the utilization doesn't fit CPU's capacity */
return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
}
-static inline void update_overutilized_status(struct rq *rq)
+static inline void set_rd_overutilized_status(struct root_domain *rd,
+ unsigned int status)
{
- if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
- WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
- trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
- }
+ if (!sched_energy_enabled())
+ return;
+
+ WRITE_ONCE(rd->overutilized, status);
+ trace_sched_overutilized_tp(rd, !!status);
+}
+
+static inline void check_update_overutilized_status(struct rq *rq)
+{
+ /*
+ * overutilized field is used for load balancing decisions only
+ * if energy aware scheduler is being used
+ */
+ if (!sched_energy_enabled())
+ return;
+
+ if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
+ set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
}
#else
-static inline void update_overutilized_status(struct rq *rq) { }
+static inline void check_update_overutilized_status(struct rq *rq) { }
#endif
/* Runqueue only has SCHED_IDLE tasks enqueued */
@@ -6779,7 +6799,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* and the following generally works well enough in practice.
*/
if (!task_new)
- update_overutilized_status(rq);
+ check_update_overutilized_status(rq);
enqueue_throttle:
assert_list_leaf_cfs_rq(rq);
@@ -10596,19 +10616,14 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
if (!env->sd->parent) {
- struct root_domain *rd = env->dst_rq->rd;
-
/* update overload indicator if we are at root domain */
- WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
+ WRITE_ONCE(env->dst_rq->rd->overload, sg_status & SG_OVERLOAD);
/* Update over-utilization (tipping point, U >= 0) indicator */
- WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
- trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
+ set_rd_overutilized_status(env->dst_rq->rd,
+ sg_status & SG_OVERUTILIZED);
} else if (sg_status & SG_OVERUTILIZED) {
- struct root_domain *rd = env->dst_rq->rd;
-
- WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
- trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
+ set_rd_overutilized_status(env->dst_rq->rd, SG_OVERUTILIZED);
}
update_idle_cpu_scan(env, sum_util);
@@ -12609,7 +12624,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
task_tick_numa(rq, curr);
update_misfit_status(curr, rq);
- update_overutilized_status(task_rq(curr));
+ check_update_overutilized_status(task_rq(curr));
task_tick_core(rq, curr);
}
--
2.39.3
* [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized
2024-03-07 8:57 [PATCH v6 0/3] sched/fair: Limit access to overutilized Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized Shrikanth Hegde
@ 2024-03-07 8:57 ` Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
2024-03-26 8:06 ` [tip: sched/core] sched/fair: Introduce is_rd_overutilized() helper function to access root_domain::overutilized tip-bot2 for Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access Shrikanth Hegde
2024-03-14 16:47 ` [PATCH v6 0/3] sched/fair: Limit access to overutilized Valentin Schneider
3 siblings, 2 replies; 15+ messages in thread
From: Shrikanth Hegde @ 2024-03-07 8:57 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot
Cc: sshegde, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
The overutilized field is accessed directly in multiple places, so it could
use a helper function. That also makes it more apparent that the field
should be used only in the EAS case.
No change in functionality intended.
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5aa6e918598c..87e08a252f94 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6675,6 +6675,15 @@ static inline bool cpu_overutilized(int cpu)
return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
}
+/*
+ * Ensure that caller can do EAS. overutilized value
+ * make sense only if EAS is enabled
+ */
+static inline int is_rd_overutilized(struct root_domain *rd)
+{
+ return READ_ONCE(rd->overutilized);
+}
+
static inline void set_rd_overutilized_status(struct root_domain *rd,
unsigned int status)
{
@@ -6694,7 +6703,7 @@ static inline void check_update_overutilized_status(struct rq *rq)
if (!sched_energy_enabled())
return;
- if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
+ if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
}
#else
@@ -7980,7 +7989,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
rcu_read_lock();
pd = rcu_dereference(rd->pd);
- if (!pd || READ_ONCE(rd->overutilized))
+ if (!pd || is_rd_overutilized(rd))
goto unlock;
/*
@@ -10863,7 +10872,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
if (sched_energy_enabled()) {
struct root_domain *rd = env->dst_rq->rd;
- if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
+ if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
goto out_balanced;
}
--
2.39.3
* [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-07 8:57 [PATCH v6 0/3] sched/fair: Limit access to overutilized Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized Shrikanth Hegde
@ 2024-03-07 8:57 ` Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
2024-03-26 7:58 ` Ingo Molnar
2024-03-14 16:47 ` [PATCH v6 0/3] sched/fair: Limit access to overutilized Valentin Schneider
3 siblings, 2 replies; 15+ messages in thread
From: Shrikanth Hegde @ 2024-03-07 8:57 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot
Cc: sshegde, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
The overutilized field is always accessed together with
sched_energy_enabled() in the pattern:
if (sched_energy_enabled && !overutilized)
do something
So modify the helper function to return this pattern. The resulting code is
more readable: do something when the root domain is not overutilized.
No change in functionality intended.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 87e08a252f94..bcda18a2ccfe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6676,12 +6676,11 @@ static inline bool cpu_overutilized(int cpu)
}
/*
- * Ensure that caller can do EAS. overutilized value
- * make sense only if EAS is enabled
+ * overutilized value make sense only if EAS is enabled
*/
-static inline int is_rd_overutilized(struct root_domain *rd)
+static inline int is_rd_not_overutilized(struct root_domain *rd)
{
- return READ_ONCE(rd->overutilized);
+ return sched_energy_enabled() && !READ_ONCE(rd->overutilized);
}
static inline void set_rd_overutilized_status(struct root_domain *rd,
@@ -6700,10 +6699,8 @@ static inline void check_update_overutilized_status(struct rq *rq)
* overutilized field is used for load balancing decisions only
* if energy aware scheduler is being used
*/
- if (!sched_energy_enabled())
- return;
- if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
+ if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
}
#else
@@ -7989,7 +7986,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
rcu_read_lock();
pd = rcu_dereference(rd->pd);
- if (!pd || is_rd_overutilized(rd))
+ if (!pd)
goto unlock;
/*
@@ -8192,7 +8189,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
cpumask_test_cpu(cpu, p->cpus_ptr))
return cpu;
- if (sched_energy_enabled()) {
+ if (is_rd_not_overutilized(this_rq()->rd)) {
new_cpu = find_energy_efficient_cpu(p, prev_cpu);
if (new_cpu >= 0)
return new_cpu;
@@ -10869,12 +10866,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
if (busiest->group_type == group_misfit_task)
goto force_balance;
- if (sched_energy_enabled()) {
- struct root_domain *rd = env->dst_rq->rd;
-
- if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
- goto out_balanced;
- }
+ if (is_rd_not_overutilized(env->dst_rq->rd) &&
+ rcu_dereference(env->dst_rq->rd->pd))
+ goto out_balanced;
/* ASYM feature bypasses nice load balance check */
if (busiest->group_type == group_asym_packing)
--
2.39.3
* Re: [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized
2024-03-07 8:57 ` [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized Shrikanth Hegde
@ 2024-03-07 16:50 ` Vincent Guittot
2024-03-26 8:06 ` [tip: sched/core] sched/fair: Add EAS checks before updating root_domain::overutilized tip-bot2 for Shrikanth Hegde
1 sibling, 0 replies; 15+ messages in thread
From: Vincent Guittot @ 2024-03-07 16:50 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: mingo, peterz, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
On Thu, 7 Mar 2024 at 09:57, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> The overutilized field of the root domain is only used by EAS (Energy Aware
> Scheduler) to decide whether to do load balancing or not. It is not used if
> EAS is not possible.
>
> Currently enqueue_task_fair and task_tick_fair access, and sometimes update,
> this field. In update_sd_lb_stats it is updated often. This causes cache
> contention due to true sharing and burns a lot of cycles. overload and
> overutilized are part of the same cacheline, so updating overutilized often
> invalidates the cacheline, which slows down access to overload due to false
> sharing. Hence add an EAS check before accessing/updating this field. The
> EAS check is either optimized away at compile time or is a static branch,
> so it shouldn't cost much.
>
> With the patch, both enqueue_task_fair and newidle_balance don't show
> up as hot routines in perf profile.
>
> 6.8-rc4:
> 7.18% swapper [kernel.vmlinux] [k] enqueue_task_fair
> 6.78% s [kernel.vmlinux] [k] newidle_balance
> +patch:
> 0.14% swapper [kernel.vmlinux] [k] enqueue_task_fair
> 0.00% swapper [kernel.vmlinux] [k] newidle_balance
>
> Minor change: trace_sched_overutilized_tp expects its second argument to be
> bool, so do an int to bool conversion for it.
>
> Fixes: 2802bf3cd936 ("sched/fair: Add over-utilization/tipping point indicator")
> Reviewed-by: Qais Yousef <qyousef@layalina.io>
> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 53 +++++++++++++++++++++++++++++----------------
> 1 file changed, 34 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6a16129f9a5c..5aa6e918598c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6663,22 +6663,42 @@ static inline void hrtick_update(struct rq *rq)
> #ifdef CONFIG_SMP
> static inline bool cpu_overutilized(int cpu)
> {
> - unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
> - unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
> + unsigned long rq_util_min, rq_util_max;
> +
> + if (!sched_energy_enabled())
> + return false;
> +
> + rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
> + rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
>
> /* Return true only if the utilization doesn't fit CPU's capacity */
> return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
> }
>
> -static inline void update_overutilized_status(struct rq *rq)
> +static inline void set_rd_overutilized_status(struct root_domain *rd,
> + unsigned int status)
> {
> - if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
> - WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> - trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> - }
> + if (!sched_energy_enabled())
> + return;
> +
> + WRITE_ONCE(rd->overutilized, status);
> + trace_sched_overutilized_tp(rd, !!status);
> +}
> +
> +static inline void check_update_overutilized_status(struct rq *rq)
> +{
> + /*
> + * overutilized field is used for load balancing decisions only
> + * if energy aware scheduler is being used
> + */
> + if (!sched_energy_enabled())
> + return;
> +
> + if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
> + set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
> }
> #else
> -static inline void update_overutilized_status(struct rq *rq) { }
> +static inline void check_update_overutilized_status(struct rq *rq) { }
> #endif
>
> /* Runqueue only has SCHED_IDLE tasks enqueued */
> @@ -6779,7 +6799,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> * and the following generally works well enough in practice.
> */
> if (!task_new)
> - update_overutilized_status(rq);
> + check_update_overutilized_status(rq);
>
> enqueue_throttle:
> assert_list_leaf_cfs_rq(rq);
> @@ -10596,19 +10616,14 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
> env->fbq_type = fbq_classify_group(&sds->busiest_stat);
>
> if (!env->sd->parent) {
> - struct root_domain *rd = env->dst_rq->rd;
> -
> /* update overload indicator if we are at root domain */
> - WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
> + WRITE_ONCE(env->dst_rq->rd->overload, sg_status & SG_OVERLOAD);
>
> /* Update over-utilization (tipping point, U >= 0) indicator */
> - WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
> - trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
> + set_rd_overutilized_status(env->dst_rq->rd,
> + sg_status & SG_OVERUTILIZED);
> } else if (sg_status & SG_OVERUTILIZED) {
> - struct root_domain *rd = env->dst_rq->rd;
> -
> - WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> - trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
> + set_rd_overutilized_status(env->dst_rq->rd, SG_OVERUTILIZED);
> }
>
> update_idle_cpu_scan(env, sum_util);
> @@ -12609,7 +12624,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
> task_tick_numa(rq, curr);
>
> update_misfit_status(curr, rq);
> - update_overutilized_status(task_rq(curr));
> + check_update_overutilized_status(task_rq(curr));
>
> task_tick_core(rq, curr);
> }
> --
> 2.39.3
>
* Re: [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized
2024-03-07 8:57 ` [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized Shrikanth Hegde
@ 2024-03-07 16:50 ` Vincent Guittot
2024-03-26 8:06 ` [tip: sched/core] sched/fair: Introduce is_rd_overutilized() helper function to access root_domain::overutilized tip-bot2 for Shrikanth Hegde
1 sibling, 0 replies; 15+ messages in thread
From: Vincent Guittot @ 2024-03-07 16:50 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: mingo, peterz, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
On Thu, 7 Mar 2024 at 09:58, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> The overutilized field is accessed directly in multiple places, so it could
> use a helper function. That also makes it more apparent that the field
> should be used only in the EAS case.
>
> No change in functionality intended.
>
> Reviewed-by: Qais Yousef <qyousef@layalina.io>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5aa6e918598c..87e08a252f94 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6675,6 +6675,15 @@ static inline bool cpu_overutilized(int cpu)
> return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
> }
>
> +/*
> + * Ensure that caller can do EAS. overutilized value
> + * make sense only if EAS is enabled
> + */
> +static inline int is_rd_overutilized(struct root_domain *rd)
> +{
> + return READ_ONCE(rd->overutilized);
> +}
> +
> static inline void set_rd_overutilized_status(struct root_domain *rd,
> unsigned int status)
> {
> @@ -6694,7 +6703,7 @@ static inline void check_update_overutilized_status(struct rq *rq)
> if (!sched_energy_enabled())
> return;
>
> - if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
> + if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
> }
> #else
> @@ -7980,7 +7989,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>
> rcu_read_lock();
> pd = rcu_dereference(rd->pd);
> - if (!pd || READ_ONCE(rd->overutilized))
> + if (!pd || is_rd_overutilized(rd))
> goto unlock;
>
> /*
> @@ -10863,7 +10872,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> if (sched_energy_enabled()) {
> struct root_domain *rd = env->dst_rq->rd;
>
> - if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
> + if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
> goto out_balanced;
> }
>
> --
> 2.39.3
>
* Re: [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-07 8:57 ` [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access Shrikanth Hegde
@ 2024-03-07 16:50 ` Vincent Guittot
2024-03-26 7:58 ` Ingo Molnar
1 sibling, 0 replies; 15+ messages in thread
From: Vincent Guittot @ 2024-03-07 16:50 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: mingo, peterz, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
On Thu, 7 Mar 2024 at 09:58, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> The overutilized field is always accessed together with
> sched_energy_enabled() in the pattern:
>
> if (sched_energy_enabled && !overutilized)
> do something
>
> So modify the helper function to return this pattern. The resulting code is
> more readable: do something when the root domain is not overutilized.
>
> No change in functionality intended.
>
> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 24 +++++++++---------------
> 1 file changed, 9 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 87e08a252f94..bcda18a2ccfe 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6676,12 +6676,11 @@ static inline bool cpu_overutilized(int cpu)
> }
>
> /*
> - * Ensure that caller can do EAS. overutilized value
> - * make sense only if EAS is enabled
> + * overutilized value make sense only if EAS is enabled
> */
> -static inline int is_rd_overutilized(struct root_domain *rd)
> +static inline int is_rd_not_overutilized(struct root_domain *rd)
> {
> - return READ_ONCE(rd->overutilized);
> + return sched_energy_enabled() && !READ_ONCE(rd->overutilized);
> }
>
> static inline void set_rd_overutilized_status(struct root_domain *rd,
> @@ -6700,10 +6699,8 @@ static inline void check_update_overutilized_status(struct rq *rq)
> * overutilized field is used for load balancing decisions only
> * if energy aware scheduler is being used
> */
> - if (!sched_energy_enabled())
> - return;
>
> - if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> + if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
> }
> #else
> @@ -7989,7 +7986,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>
> rcu_read_lock();
> pd = rcu_dereference(rd->pd);
> - if (!pd || is_rd_overutilized(rd))
> + if (!pd)
> goto unlock;
>
> /*
> @@ -8192,7 +8189,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> cpumask_test_cpu(cpu, p->cpus_ptr))
> return cpu;
>
> - if (sched_energy_enabled()) {
> + if (is_rd_not_overutilized(this_rq()->rd)) {
> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> if (new_cpu >= 0)
> return new_cpu;
> @@ -10869,12 +10866,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> if (busiest->group_type == group_misfit_task)
> goto force_balance;
>
> - if (sched_energy_enabled()) {
> - struct root_domain *rd = env->dst_rq->rd;
> -
> - if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
> - goto out_balanced;
> - }
> + if (is_rd_not_overutilized(env->dst_rq->rd) &&
> + rcu_dereference(env->dst_rq->rd->pd))
> + goto out_balanced;
>
> /* ASYM feature bypasses nice load balance check */
> if (busiest->group_type == group_asym_packing)
> --
> 2.39.3
>
* Re: [PATCH v6 0/3] sched/fair: Limit access to overutilized
2024-03-07 8:57 [PATCH v6 0/3] sched/fair: Limit access to overutilized Shrikanth Hegde
` (2 preceding siblings ...)
2024-03-07 8:57 ` [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access Shrikanth Hegde
@ 2024-03-14 16:47 ` Valentin Schneider
3 siblings, 0 replies; 15+ messages in thread
From: Valentin Schneider @ 2024-03-14 16:47 UTC (permalink / raw)
To: Shrikanth Hegde, mingo, peterz, vincent.guittot
Cc: sshegde, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, pierre.gondois, qyousef
On 07/03/24 14:27, Shrikanth Hegde wrote:
> When running an ISV workload on a large system (240 cores, SMT8), it was
> observed from the perf profile that newidle_balance and enqueue_task_fair
> were consuming a significant number of cycles. Perf annotate showed that
> most of the time was spent accessing the overutilized field of the root
> domain. A similar perf profile was simulated by making some changes to
> stress-ng --wait. Both newidle_balance and enqueue_task_fair consume close
> to 5-7% each.
>
> Overutilized was added for EAS (Energy Aware Scheduler) to decide whether
> to do load balancing or not. Simultaneous access to overutilized by
> multiple CPUs leads to cache invalidations due to true sharing. Updating
> overutilized is not required on non-EAS platforms. Since overutilized and
> overload are part of the same cacheline, there is false sharing as well.
>
> Patch 1/3 - Main patch. It helps in reducing the issue described above.
> Both functions no longer show up in the profile. A with/without-patch
> comparison is in the changelog. With the patch, the stated problem in the
> ISV workload was also solved and throughput improved.
> Patch 2/3 - Code refactoring to use a helper function instead of
> directly accessing the field. Keeping this patch so that patch 3/3 becomes
> easier to understand. Depends on 1/3.
> Patch 3/3 - Refactoring the code, since most of the observed patterns are
> eas && !overutilized. Changed the helper function accordingly.
> Depends on 2/3.
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
* Re: [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-07 8:57 ` [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
@ 2024-03-26 7:58 ` Ingo Molnar
2024-03-26 8:26 ` Vincent Guittot
1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2024-03-26 7:58 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: peterz, vincent.guittot, yu.c.chen, dietmar.eggemann,
linux-kernel, nysal, aboorvad, srikar, vschneid, pierre.gondois,
qyousef
* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> /*
> - * Ensure that caller can do EAS. overutilized value
> - * make sense only if EAS is enabled
> + * overutilized value make sense only if EAS is enabled
> */
> -static inline int is_rd_overutilized(struct root_domain *rd)
> +static inline int is_rd_not_overutilized(struct root_domain *rd)
> {
> - return READ_ONCE(rd->overutilized);
> + return sched_energy_enabled() && !READ_ONCE(rd->overutilized);
> }
While adding the sched_energy_enabled() condition looks OK, the _not prefix
is silly: putting logical operators into function names is far less
readable than a !fn()...
> - if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> + if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
Especially since we already have cpu_overutilized(). It's far more coherent
to have the same basic attribute functions and put any negation into
*actual* logical operators.
Thanks,
Ingo
* [tip: sched/core] sched/fair: Introduce is_rd_overutilized() helper function to access root_domain::overutilized
2024-03-07 8:57 ` [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
@ 2024-03-26 8:06 ` tip-bot2 for Shrikanth Hegde
1 sibling, 0 replies; 15+ messages in thread
From: tip-bot2 for Shrikanth Hegde @ 2024-03-26 8:06 UTC (permalink / raw)
To: linux-tip-commits
Cc: Shrikanth Hegde, Ingo Molnar, Qais Yousef, Vincent Guittot, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: d0f5d3cefc259f498456338d319098dc84393b24
Gitweb: https://git.kernel.org/tip/d0f5d3cefc259f498456338d319098dc84393b24
Author: Shrikanth Hegde <sshegde@linux.ibm.com>
AuthorDate: Thu, 07 Mar 2024 14:27:24 +05:30
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 26 Mar 2024 08:58:59 +01:00
sched/fair: Introduce is_rd_overutilized() helper function to access root_domain::overutilized
The root_domain::overutilized field is accessed via READ_ONCE() in
multiple places, which can be simplified with a helper function.
This also makes it more apparent that it should be used
only in the case of EAS.
No change in functionality intended.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240307085725.444486-3-sshegde@linux.ibm.com
---
kernel/sched/fair.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1afa4f8..24a7530 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6685,6 +6685,15 @@ static inline bool cpu_overutilized(int cpu)
return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
}
+/*
+ * Ensure that caller can do EAS. overutilized value
+ * make sense only if EAS is enabled
+ */
+static inline int is_rd_overutilized(struct root_domain *rd)
+{
+ return READ_ONCE(rd->overutilized);
+}
+
static inline void set_rd_overutilized_status(struct root_domain *rd,
unsigned int status)
{
@@ -6704,7 +6713,7 @@ static inline void check_update_overutilized_status(struct rq *rq)
if (!sched_energy_enabled())
return;
- if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
+ if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
}
#else
@@ -7990,7 +7999,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
rcu_read_lock();
pd = rcu_dereference(rd->pd);
- if (!pd || READ_ONCE(rd->overutilized))
+ if (!pd || is_rd_overutilized(rd))
goto unlock;
/*
@@ -10897,7 +10906,7 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
if (sched_energy_enabled()) {
struct root_domain *rd = env->dst_rq->rd;
- if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
+ if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
goto out_balanced;
}
* [tip: sched/core] sched/fair: Add EAS checks before updating root_domain::overutilized
2024-03-07 8:57 ` [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
@ 2024-03-26 8:06 ` tip-bot2 for Shrikanth Hegde
1 sibling, 0 replies; 15+ messages in thread
From: tip-bot2 for Shrikanth Hegde @ 2024-03-26 8:06 UTC (permalink / raw)
To: linux-tip-commits
Cc: Shrikanth Hegde, Ingo Molnar, Qais Yousef, Srikar Dronamraju,
Vincent Guittot, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: be3a51e68f2f1b17250ce40d8872c7645b7a2991
Gitweb: https://git.kernel.org/tip/be3a51e68f2f1b17250ce40d8872c7645b7a2991
Author: Shrikanth Hegde <sshegde@linux.ibm.com>
AuthorDate: Thu, 07 Mar 2024 14:27:23 +05:30
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 26 Mar 2024 08:58:59 +01:00
sched/fair: Add EAS checks before updating root_domain::overutilized
root_domain::overutilized is only used by EAS (the energy-aware scheduler)
to decide whether to do load balancing or not. It is not used when EAS
is not possible.
Currently enqueue_task_fair() and task_tick_fair() access, and sometimes
update, this field. In update_sd_lb_stats() it is updated often. This causes
cache contention due to true sharing and burns a lot of cycles. ::overload and
::overutilized are part of the same cacheline, so frequent updates to
::overutilized invalidate the cacheline and slow down accesses to ::overload
through false sharing. Hence, add an EAS check before accessing/updating
this field.
The EAS check is either optimized out at compile time or implemented as a
static branch, so it shouldn't cost much.
With the patch, both enqueue_task_fair and newidle_balance don't show
up as hot routines in perf profile.
6.8-rc4:
7.18% swapper [kernel.vmlinux] [k] enqueue_task_fair
6.78% s [kernel.vmlinux] [k] newidle_balance
+patch:
0.14% swapper [kernel.vmlinux] [k] enqueue_task_fair
0.00% swapper [kernel.vmlinux] [k] newidle_balance
While at it: trace_sched_overutilized_tp() expects its second argument to
be bool, so do an int-to-bool conversion there.
Fixes: 2802bf3cd936 ("sched/fair: Add over-utilization/tipping point indicator")
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240307085725.444486-2-sshegde@linux.ibm.com
---
kernel/sched/fair.c | 53 ++++++++++++++++++++++++++++----------------
1 file changed, 34 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dbf4f1c..1afa4f8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6673,22 +6673,42 @@ static inline void hrtick_update(struct rq *rq)
#ifdef CONFIG_SMP
static inline bool cpu_overutilized(int cpu)
{
- unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
- unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+ unsigned long rq_util_min, rq_util_max;
+
+ if (!sched_energy_enabled())
+ return false;
+
+ rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
+ rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
/* Return true only if the utilization doesn't fit CPU's capacity */
return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
}
-static inline void update_overutilized_status(struct rq *rq)
+static inline void set_rd_overutilized_status(struct root_domain *rd,
+ unsigned int status)
{
- if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
- WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
- trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
- }
+ if (!sched_energy_enabled())
+ return;
+
+ WRITE_ONCE(rd->overutilized, status);
+ trace_sched_overutilized_tp(rd, !!status);
+}
+
+static inline void check_update_overutilized_status(struct rq *rq)
+{
+ /*
+ * overutilized field is used for load balancing decisions only
+ * if energy aware scheduler is being used
+ */
+ if (!sched_energy_enabled())
+ return;
+
+ if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
+ set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
}
#else
-static inline void update_overutilized_status(struct rq *rq) { }
+static inline void check_update_overutilized_status(struct rq *rq) { }
#endif
/* Runqueue only has SCHED_IDLE tasks enqueued */
@@ -6789,7 +6809,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* and the following generally works well enough in practice.
*/
if (!task_new)
- update_overutilized_status(rq);
+ check_update_overutilized_status(rq);
enqueue_throttle:
assert_list_leaf_cfs_rq(rq);
@@ -10630,19 +10650,14 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
if (!env->sd->parent) {
- struct root_domain *rd = env->dst_rq->rd;
-
/* update overload indicator if we are at root domain */
- WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
+ WRITE_ONCE(env->dst_rq->rd->overload, sg_status & SG_OVERLOAD);
/* Update over-utilization (tipping point, U >= 0) indicator */
- WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
- trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
+ set_rd_overutilized_status(env->dst_rq->rd,
+ sg_status & SG_OVERUTILIZED);
} else if (sg_status & SG_OVERUTILIZED) {
- struct root_domain *rd = env->dst_rq->rd;
-
- WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
- trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
+ set_rd_overutilized_status(env->dst_rq->rd, SG_OVERUTILIZED);
}
update_idle_cpu_scan(env, sum_util);
@@ -12667,7 +12682,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
task_tick_numa(rq, curr);
update_misfit_status(curr, rq);
- update_overutilized_status(task_rq(curr));
+ check_update_overutilized_status(task_rq(curr));
task_tick_core(rq, curr);
}
* Re: [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-26 7:58 ` Ingo Molnar
@ 2024-03-26 8:26 ` Vincent Guittot
2024-03-26 12:25 ` Shrikanth Hegde
0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-03-26 8:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Shrikanth Hegde, peterz, yu.c.chen, dietmar.eggemann,
linux-kernel, nysal, aboorvad, srikar, vschneid, pierre.gondois,
qyousef
On Tue, 26 Mar 2024 at 08:58, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> > /*
> > - * Ensure that caller can do EAS. overutilized value
> > - * make sense only if EAS is enabled
> > + * overutilized value make sense only if EAS is enabled
> > */
> > -static inline int is_rd_overutilized(struct root_domain *rd)
> > +static inline int is_rd_not_overutilized(struct root_domain *rd)
> > {
> > - return READ_ONCE(rd->overutilized);
> > + return sched_energy_enabled() && !READ_ONCE(rd->overutilized);
> > }
>
> While adding the sched_energy_enabled() condition looks OK, the _not prefix
> is silly: putting logical operators into function names is far less
> readable than a !fn()...
>
> > - if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> > + if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
>
> Especially since we already have cpu_overutilized(). It's far more coherent
> to have the same basic attribute functions and put any negation into
> *actual* logical operators.
I was concerned that the || in this case could defeat the purpose
of sched_energy_enabled(), but it will return early anyway:
return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
>
> Thanks,
>
> Ingo
* Re: [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-26 8:26 ` Vincent Guittot
@ 2024-03-26 12:25 ` Shrikanth Hegde
2024-03-26 14:12 ` Vincent Guittot
0 siblings, 1 reply; 15+ messages in thread
From: Shrikanth Hegde @ 2024-03-26 12:25 UTC (permalink / raw)
To: Vincent Guittot, Ingo Molnar
Cc: peterz, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
On 3/26/24 1:56 PM, Vincent Guittot wrote:
> On Tue, 26 Mar 2024 at 08:58, Ingo Molnar <mingo@kernel.org> wrote:
>>
>>
>> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>>
>>> /*
>>> - * Ensure that caller can do EAS. overutilized value
>>> - * make sense only if EAS is enabled
>>> + * overutilized value make sense only if EAS is enabled
>>> */
>>> -static inline int is_rd_overutilized(struct root_domain *rd)
>>> +static inline int is_rd_not_overutilized(struct root_domain *rd)
>>> {
>>> - return READ_ONCE(rd->overutilized);
>>> + return sched_energy_enabled() && !READ_ONCE(rd->overutilized);
>>> }
>>
>> While adding the sched_energy_enabled() condition looks OK, the _not prefix
>> is silly: putting logical operators into function names is far less
>> readable than a !fn()...
>>
>>> - if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
>>> + if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
>>
>> Especially since we already have cpu_overutilized(). It's far more coherent
>> to have the same basic attribute functions and put any negation into
>> *actual* logical operators.
>
> I was concerned by the || in this case that could defeat the purpose
> of sched_energy_enabled() but it will return early anyway
>
> return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
I think this would work.
>
>>
>> Thanks,
>>
>> Ingo
If EAS is false, then is_rd_overutilized() would always be true, and all users of it do !is_rd_overutilized(), so it becomes a no-op.
If EAS is true, it reads the rd->overutilized value.
Does this look correct?
-------------------------------------------------------------------------------------
From 3adc0d58f87d8a2e96196a0f47bcd0d2afd057ae Mon Sep 17 00:00:00 2001
From: Shrikanth Hegde <sshegde@linux.ibm.com>
Date: Wed, 6 Mar 2024 03:58:58 -0500
Subject: [PATCH v7 3/3] sched/fair: Combine EAS check with overutilized access
Accesses to overutilized are always paired with sched_energy_enabled() in
the pattern:
if (sched_energy_enabled() && !overutilized)
do something
So modify the helper function to return this combined check. This makes the
code more readable, as it says: do something when the root domain is not
overutilized. The function always returns true when EAS is disabled.
No change in functionality intended.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 24a7530a7d3f..e222e3ad4cfe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6686,12 +6686,11 @@ static inline bool cpu_overutilized(int cpu)
}
/*
- * Ensure that caller can do EAS. overutilized value
- * make sense only if EAS is enabled
+ * overutilized value make sense only if EAS is enabled
*/
static inline int is_rd_overutilized(struct root_domain *rd)
{
- return READ_ONCE(rd->overutilized);
+ return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
}
static inline void set_rd_overutilized_status(struct root_domain *rd,
@@ -6710,8 +6709,6 @@ static inline void check_update_overutilized_status(struct rq *rq)
* overutilized field is used for load balancing decisions only
* if energy aware scheduler is being used
*/
- if (!sched_energy_enabled())
- return;
if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
@@ -7999,7 +7996,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
rcu_read_lock();
pd = rcu_dereference(rd->pd);
- if (!pd || is_rd_overutilized(rd))
+ if (!pd)
goto unlock;
/*
@@ -8202,7 +8199,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
cpumask_test_cpu(cpu, p->cpus_ptr))
return cpu;
- if (sched_energy_enabled()) {
+ if (!is_rd_overutilized(this_rq()->rd)) {
new_cpu = find_energy_efficient_cpu(p, prev_cpu);
if (new_cpu >= 0)
return new_cpu;
@@ -10903,12 +10900,9 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
if (busiest->group_type == group_misfit_task)
goto force_balance;
- if (sched_energy_enabled()) {
- struct root_domain *rd = env->dst_rq->rd;
-
- if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
- goto out_balanced;
- }
+ if (!is_rd_overutilized(env->dst_rq->rd) &&
+ rcu_dereference(env->dst_rq->rd->pd))
+ goto out_balanced;
/* ASYM feature bypasses nice load balance check */
if (busiest->group_type == group_asym_packing)
--
2.39.3
* Re: [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-26 12:25 ` Shrikanth Hegde
@ 2024-03-26 14:12 ` Vincent Guittot
2024-03-26 14:49 ` Shrikanth Hegde
0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2024-03-26 14:12 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Ingo Molnar, peterz, yu.c.chen, dietmar.eggemann, linux-kernel,
nysal, aboorvad, srikar, vschneid, pierre.gondois, qyousef
On Tue, 26 Mar 2024 at 13:26, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
>
>
> On 3/26/24 1:56 PM, Vincent Guittot wrote:
> > On Tue, 26 Mar 2024 at 08:58, Ingo Molnar <mingo@kernel.org> wrote:
> >>
> >>
> >> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> >>
> >>> /*
> >>> - * Ensure that caller can do EAS. overutilized value
> >>> - * make sense only if EAS is enabled
> >>> + * overutilized value make sense only if EAS is enabled
> >>> */
> >>> -static inline int is_rd_overutilized(struct root_domain *rd)
> >>> +static inline int is_rd_not_overutilized(struct root_domain *rd)
> >>> {
> >>> - return READ_ONCE(rd->overutilized);
> >>> + return sched_energy_enabled() && !READ_ONCE(rd->overutilized);
> >>> }
> >>
> >> While adding the sched_energy_enabled() condition looks OK, the _not prefix
> >> is silly: putting logical operators into function names is far less
> >> readable than a !fn()...
> >>
> >>> - if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> >>> + if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> >>
> >> Especially since we already have cpu_overutilized(). It's far more coherent
> >> to have the same basic attribute functions and put any negation into
> >> *actual* logical operators.
> >
> > I was concerned by the || in this case that could defeat the purpose
> > of sched_energy_enabled() but it will return early anyway
> >
>
> > return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
>
> I think this would work.
>
> >
> >>
> >> Thanks,
> >>
> >> Ingo
>
>
> If EAS - false, then is_rd_overutilized -> would be true always and all users of it do !is_rd_overutilized(). so No operation.
> If EAS - true, it reads rd->overutilized value.
>
> Does this look correct?
yes looks good to me
> -------------------------------------------------------------------------------------
> From 3adc0d58f87d8a2e96196a0f47bcd0d2afd057ae Mon Sep 17 00:00:00 2001
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
> Date: Wed, 6 Mar 2024 03:58:58 -0500
> Subject: [PATCH v7 3/3] sched/fair: Combine EAS check with overutilized access
>
> Access to overutilized is always used with sched_energy_enabled in
> the pattern:
>
> if (sched_energy_enabled && !overutilized)
> do something
>
> So modify the helper function to return this pattern. This is more
> readable code as it would say, do something when root domain is not
> overutilized. This function always return true when EAS is disabled.
>
> No change in functionality intended.
>
> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> kernel/sched/fair.c | 20 +++++++-------------
> 1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 24a7530a7d3f..e222e3ad4cfe 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6686,12 +6686,11 @@ static inline bool cpu_overutilized(int cpu)
> }
>
> /*
> - * Ensure that caller can do EAS. overutilized value
> - * make sense only if EAS is enabled
> + * overutilized value make sense only if EAS is enabled
> */
> static inline int is_rd_overutilized(struct root_domain *rd)
> {
> - return READ_ONCE(rd->overutilized);
> + return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
> }
>
> static inline void set_rd_overutilized_status(struct root_domain *rd,
> @@ -6710,8 +6709,6 @@ static inline void check_update_overutilized_status(struct rq *rq)
> * overutilized field is used for load balancing decisions only
> * if energy aware scheduler is being used
> */
> - if (!sched_energy_enabled())
> - return;
>
> if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
> set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
> @@ -7999,7 +7996,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>
> rcu_read_lock();
> pd = rcu_dereference(rd->pd);
> - if (!pd || is_rd_overutilized(rd))
> + if (!pd)
> goto unlock;
>
> /*
> @@ -8202,7 +8199,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> cpumask_test_cpu(cpu, p->cpus_ptr))
> return cpu;
>
> - if (sched_energy_enabled()) {
> + if (!is_rd_overutilized(this_rq()->rd)) {
> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> if (new_cpu >= 0)
> return new_cpu;
> @@ -10903,12 +10900,9 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
> if (busiest->group_type == group_misfit_task)
> goto force_balance;
>
> - if (sched_energy_enabled()) {
> - struct root_domain *rd = env->dst_rq->rd;
> -
> - if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
> - goto out_balanced;
> - }
> + if (!is_rd_overutilized(env->dst_rq->rd) &&
> + rcu_dereference(env->dst_rq->rd->pd))
> + goto out_balanced;
>
> /* ASYM feature bypasses nice load balance check */
> if (busiest->group_type == group_asym_packing)
> --
> 2.39.3
* Re: [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access
2024-03-26 14:12 ` Vincent Guittot
@ 2024-03-26 14:49 ` Shrikanth Hegde
0 siblings, 0 replies; 15+ messages in thread
From: Shrikanth Hegde @ 2024-03-26 14:49 UTC (permalink / raw)
To: Vincent Guittot, Ingo Molnar
Cc: peterz, yu.c.chen, dietmar.eggemann, linux-kernel, nysal,
aboorvad, srikar, vschneid, pierre.gondois, qyousef
On 3/26/24 7:42 PM, Vincent Guittot wrote:
> On Tue, 26 Mar 2024 at 13:26, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>>
>>
>>
>>>>
>>>> While adding the sched_energy_enabled() condition looks OK, the _not prefix
>>>> is silly: putting logical operators into function names is far less
>>>> readable than a !fn()...
>>>>
>>>>> - if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
>>>>> + if (is_rd_not_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
>>>>
>>>> Especially since we already have cpu_overutilized(). It's far more coherent
>>>> to have the same basic attribute functions and put any negation into
>>>> *actual* logical operators.
>>>
>>> I was concerned by the || in this case that could defeat the purpose
>>> of sched_energy_enabled() but it will return early anyway
>>>
>>
>>> return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
>>
>> I think this would work.
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Ingo
>>
>>
>> If EAS - false, then is_rd_overutilized -> would be true always and all users of it do !is_rd_overutilized(). so No operation.
>> If EAS - true, it reads rd->overutilized value.
>>
>> Does this look correct?
>
> yes looks good to me
>
Thank you. Let me send it as a patch
>> -------------------------------------------------------------------------------------
>> From 3adc0d58f87d8a2e96196a0f47bcd0d2afd057ae Mon Sep 17 00:00:00 2001
>> From: Shrikanth Hegde <sshegde@linux.ibm.com>
>> Date: Wed, 6 Mar 2024 03:58:58 -0500
>> Subject: [PATCH v7 3/3] sched/fair: Combine EAS check with overutilized access
>>
>> Access to overutilized is always used with sched_energy_enabled in
>> the pattern:
>>
>> if (sched_energy_enabled && !overutilized)
>> do something
>>
>> So modify the helper function to return this pattern. This is more
>> readable code as it would say, do something when root domain is not
>> overutilized. This function always return true when EAS is disabled.
>>
>> No change in functionality intended.
>>
>> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>> kernel/sched/fair.c | 20 +++++++-------------
>> 1 file changed, 7 insertions(+), 13 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 24a7530a7d3f..e222e3ad4cfe 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6686,12 +6686,11 @@ static inline bool cpu_overutilized(int cpu)
>> }
>>
>> /*
>> - * Ensure that caller can do EAS. overutilized value
>> - * make sense only if EAS is enabled
>> + * overutilized value make sense only if EAS is enabled
>> */
>> static inline int is_rd_overutilized(struct root_domain *rd)
>> {
>> - return READ_ONCE(rd->overutilized);
>> + return !sched_energy_enabled() || READ_ONCE(rd->overutilized);
>> }
>>
>> static inline void set_rd_overutilized_status(struct root_domain *rd,
>> @@ -6710,8 +6709,6 @@ static inline void check_update_overutilized_status(struct rq *rq)
>> * overutilized field is used for load balancing decisions only
>> * if energy aware scheduler is being used
>> */
>> - if (!sched_energy_enabled())
>> - return;
>>
>> if (!is_rd_overutilized(rq->rd) && cpu_overutilized(rq->cpu))
>> set_rd_overutilized_status(rq->rd, SG_OVERUTILIZED);
>> @@ -7999,7 +7996,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>>
>> rcu_read_lock();
>> pd = rcu_dereference(rd->pd);
>> - if (!pd || is_rd_overutilized(rd))
>> + if (!pd)
>> goto unlock;
>>
>> /*
>> @@ -8202,7 +8199,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>> cpumask_test_cpu(cpu, p->cpus_ptr))
>> return cpu;
>>
>> - if (sched_energy_enabled()) {
>> + if (!is_rd_overutilized(this_rq()->rd)) {
>> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>> if (new_cpu >= 0)
>> return new_cpu;
>> @@ -10903,12 +10900,9 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
>> if (busiest->group_type == group_misfit_task)
>> goto force_balance;
>>
>> - if (sched_energy_enabled()) {
>> - struct root_domain *rd = env->dst_rq->rd;
>> -
>> - if (rcu_dereference(rd->pd) && !is_rd_overutilized(rd))
>> - goto out_balanced;
>> - }
>> + if (!is_rd_overutilized(env->dst_rq->rd) &&
>> + rcu_dereference(env->dst_rq->rd->pd))
>> + goto out_balanced;
>>
>> /* ASYM feature bypasses nice load balance check */
>> if (busiest->group_type == group_asym_packing)
>> --
>> 2.39.3
end of thread, other threads:[~2024-03-26 14:49 UTC | newest]
Thread overview: 15+ messages
2024-03-07 8:57 [PATCH v6 0/3] sched/fair: Limit access to overutilized Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 1/3] sched/fair: Add EAS checks before updating overutilized Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
2024-03-26 8:06 ` [tip: sched/core] sched/fair: Add EAS checks before updating root_domain::overutilized tip-bot2 for Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 2/3] sched/fair: Use helper function to access rd->overutilized Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
2024-03-26 8:06 ` [tip: sched/core] sched/fair: Introduce is_rd_overutilized() helper function to access root_domain::overutilized tip-bot2 for Shrikanth Hegde
2024-03-07 8:57 ` [PATCH v6 3/3] sched/fair: Combine EAS check with overutilized access Shrikanth Hegde
2024-03-07 16:50 ` Vincent Guittot
2024-03-26 7:58 ` Ingo Molnar
2024-03-26 8:26 ` Vincent Guittot
2024-03-26 12:25 ` Shrikanth Hegde
2024-03-26 14:12 ` Vincent Guittot
2024-03-26 14:49 ` Shrikanth Hegde
2024-03-14 16:47 ` [PATCH v6 0/3] sched/fair: Limit access to overutilized Valentin Schneider