* [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPUs 0-7 in the first group,
but only on CPUs 8-11 in the second group. CPUs 12-15 are always idle:
{ 0 1 2 3 4 5 6 7 }  { 8 9 10 11 12 13 14 15 }
  * * * * * * * *      * * *  *
('*' marks the CPUs the workload is allowed to run on)
When looking for a dst group for newly forked threads,
update_sg_wakeup_stats() often reports that the second group has more idle
CPUs than the first group. The scheduler then considers the second group
less busy and picks the least busy CPU among CPUs 8-11. As a result,
CPUs 8-11 can become crowded with newly forked threads while CPUs 0-7 sit
idle.
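For reference, this is roughly the accounting that goes wrong, simplified
from the current update_sg_wakeup_stats() (only the idle-CPU path is shown;
the real loop also accumulates load, util and runnable):

	for_each_cpu(i, sched_group_span(group)) {
		struct rq *rq = cpu_rq(i);
		unsigned int local = task_running_on_cpu(i, p);
		int nr_running = rq->nr_running - local;

		/*
		 * For the second group this also visits CPUs 12-15, which
		 * are always idle but which p cannot use, so idle_cpus is
		 * inflated and the group looks less busy than it is.
		 */
		if (!nr_running && idle_cpu_without(i, p))
			sgs->idle_cpus++;
	}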
The first patch, 'Only update stats for allowed CPUs when looking for dst
group', *alone* is enough to fix this imbalance. With this patch,
performance improves significantly for workloads that fork frequently and
are affined to only part of the CPUs in a sched group.
I think the second patch also makes sense in this scenario: if the group
weight includes CPUs the task cannot use, group classification can be
incorrect.
Peter mentioned [1] that the second patch might also apply to
update_sg_lb_stats(). The third patch therefore counts the group weight
from 'env->cpus' (the active CPUs); group classification can be incorrect
if the group weight includes inactive CPUs.
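Sketched, the group-weight accounting after patches 2 and 3 (the actual
hunks are below):

	/* dst group search (patch 2): only the CPUs p may run on */
	sgs->group_weight = cpumask_weight_and(sched_group_span(group),
					       p->cpus_ptr);

	/* src group search (patch 3): only the CPUs doing this balance */
	sgs->group_weight = cpumask_weight_and(sched_group_span(group),
					       env->cpus);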
Peter also mentioned that update_sg_wakeup_stats() and
update_sg_lb_stats() are very similar and might be unified. RFC patches
4-6 refactor the two functions and move the common logic into a new
function, update_sg_stats().
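For reviewers, the shape of the unification (taken from patch 5,
abridged): the common helper takes a small context struct,

	struct sg_lb_stat_env {
		bool find_src_sg;	/* true: find src group, false: dst */
		struct cpumask *cpus;	/* env->cpus or p->cpus_ptr */
		struct task_struct *p;	/* NULL on the load-balance path */
		struct lb_env *lb_env;	/* NULL on the wakeup path */
		/* ... plus sd and the overload/overutilized pointers */
	};

and update_sg_lb_stats()/update_sg_wakeup_stats() reduce to filling it in
and calling update_sg_stats(sgs, group, &stat_env).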
I tested with the SPECjbb workload on an arm64 server; the patch set does
not introduce an observable performance change. The test cannot cover
every code path though, so please review.
v2:
Follow Peter's suggestions:
1) Apply the second patch to update_sg_lb_stats().
2) Refactor and unify update_sg_wakeup_stats() and update_sg_lb_stats().
v1:
https://lore.kernel.org/lkml/20250701024549.40166-1-adamli@os.amperecomputing.com/
links:
[1]: https://lore.kernel.org/lkml/20250704091758.GG2001818@noisy.programming.kicks-ass.net/
Adam Li (6):
sched/fair: Only update stats for allowed CPUs when looking for dst
group
sched/fair: Only count group weight for allowed CPUs when looking for
dst group
sched/fair: Only count group weight for CPUs doing load balance when
looking for src group
sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL
pointers
sched/fair: Introduce update_sg_stats()
sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()
kernel/sched/fair.c | 274 ++++++++++++++++++++++++--------------------
1 file changed, 148 insertions(+), 126 deletions(-)
--
2.34.1
* [PATCH v2 1/6] sched/fair: Only update stats for allowed CPUs when looking for dst group
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
A task may not be able to use all the CPUs in a sched group due to CPU
affinity. Only update the sched group statistics for the CPUs the task is
allowed to run on.
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb..78a3d9b78e07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10693,7 +10693,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
if (sd->flags & SD_ASYM_CPUCAPACITY)
sgs->group_misfit_task_load = 1;
- for_each_cpu(i, sched_group_span(group)) {
+ for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
struct rq *rq = cpu_rq(i);
unsigned int local;
--
2.34.1
* [PATCH v2 2/6] sched/fair: Only count group weight for allowed CPUs when looking for dst group
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
A task may not be able to use all the CPUs in a sched group due to CPU
affinity. If the group weight includes CPUs the task is not allowed to run
on, group classification may be incorrect.
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
kernel/sched/fair.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 78a3d9b78e07..452e2df961b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10722,7 +10722,9 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
sgs->group_capacity = group->sgc->capacity;
- sgs->group_weight = group->group_weight;
+ /* Only count group_weight if p can run on these cpus */
+ sgs->group_weight = cpumask_weight_and(sched_group_span(group),
+ p->cpus_ptr);
sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
--
2.34.1
* [PATCH v2 3/6] sched/fair: Only count group weight for CPUs doing load balance when looking for src group
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
Load balancing is limited to a set of CPUs (such as the active CPUs).
Group classification may be incorrect if the group weight counts inactive
CPUs.
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 452e2df961b9..db9ec6a6acdf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10427,7 +10427,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
sgs->group_capacity = group->sgc->capacity;
- sgs->group_weight = group->group_weight;
+ sgs->group_weight = cpumask_weight_and(sched_group_span(group), env->cpus);
/* Check if dst CPU is idle and preferred to this group */
if (!local_group && env->idle && sgs->sum_h_nr_running &&
--
2.34.1
* [RFC PATCH v2 4/6] sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL pointers
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
update_sg_wakeup_stats() uses a set of helper functions that take a
'struct task_struct *p' argument: cpu_load_without(),
cpu_runnable_without(), cpu_util_without(), task_running_on_cpu() and
idle_cpu_without().
update_sg_lb_stats() uses similar helper functions without the 'p'
argument: cpu_load(), cpu_runnable(), cpu_util_cfs() and idle_cpu().
Make the update_sg_wakeup_stats() helper functions handle the case where
'p == NULL', so that update_sg_lb_stats() can use the same helpers.
This is the first step towards unifying update_sg_wakeup_stats() and
update_sg_lb_stats().
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
kernel/sched/fair.c | 95 ++++++++++++++++++++++++---------------------
1 file changed, 50 insertions(+), 45 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index db9ec6a6acdf..69dac5b337d8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7250,7 +7250,8 @@ static unsigned long cpu_load_without(struct rq *rq, struct task_struct *p)
unsigned int load;
/* Task has no contribution or is new */
- if (cpu_of(rq) != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+ if (!p || cpu_of(rq) != task_cpu(p) ||
+ !READ_ONCE(p->se.avg.last_update_time))
return cpu_load(rq);
cfs_rq = &rq->cfs;
@@ -7273,7 +7274,8 @@ static unsigned long cpu_runnable_without(struct rq *rq, struct task_struct *p)
unsigned int runnable;
/* Task has no contribution or is new */
- if (cpu_of(rq) != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+ if (!p || cpu_of(rq) != task_cpu(p) ||
+ !READ_ONCE(p->se.avg.last_update_time))
return cpu_runnable(rq);
cfs_rq = &rq->cfs;
@@ -7285,6 +7287,51 @@ static unsigned long cpu_runnable_without(struct rq *rq, struct task_struct *p)
return runnable;
}
+/*
+ * task_running_on_cpu - return 1 if @p is running on @cpu.
+ */
+
+static unsigned int task_running_on_cpu(int cpu, struct task_struct *p)
+{
+ /* Task has no contribution or is new */
+ if (!p || cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+ return 0;
+
+ if (task_on_rq_queued(p))
+ return 1;
+
+ return 0;
+}
+
+/**
+ * idle_cpu_without - would a given CPU be idle without p ?
+ * @cpu: the processor on which idleness is tested.
+ * @p: task which should be ignored.
+ *
+ * Return: 1 if the CPU would be idle. 0 otherwise.
+ */
+static int idle_cpu_without(int cpu, struct task_struct *p)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+ if (!p)
+ return idle_cpu(cpu);
+
+ if (rq->curr != rq->idle && rq->curr != p)
+ return 0;
+
+ /*
+ * rq->nr_running can't be used but an updated version without the
+ * impact of p on cpu must be used instead. The updated nr_running
+ * be computed and tested before calling idle_cpu_without().
+ */
+
+ if (rq->ttwu_pending)
+ return 0;
+
+ return 1;
+}
+
static unsigned long capacity_of(int cpu)
{
return cpu_rq(cpu)->cpu_capacity;
@@ -8099,7 +8146,7 @@ unsigned long cpu_util_cfs_boost(int cpu)
static unsigned long cpu_util_without(int cpu, struct task_struct *p)
{
/* Task has no contribution or is new */
- if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+ if (!p || cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
p = NULL;
return cpu_util(cpu, p, -1, 0);
@@ -10631,48 +10678,6 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq)
struct sg_lb_stats;
-/*
- * task_running_on_cpu - return 1 if @p is running on @cpu.
- */
-
-static unsigned int task_running_on_cpu(int cpu, struct task_struct *p)
-{
- /* Task has no contribution or is new */
- if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
- return 0;
-
- if (task_on_rq_queued(p))
- return 1;
-
- return 0;
-}
-
-/**
- * idle_cpu_without - would a given CPU be idle without p ?
- * @cpu: the processor on which idleness is tested.
- * @p: task which should be ignored.
- *
- * Return: 1 if the CPU would be idle. 0 otherwise.
- */
-static int idle_cpu_without(int cpu, struct task_struct *p)
-{
- struct rq *rq = cpu_rq(cpu);
-
- if (rq->curr != rq->idle && rq->curr != p)
- return 0;
-
- /*
- * rq->nr_running can't be used but an updated version without the
- * impact of p on cpu must be used instead. The updated nr_running
- * be computed and tested before calling idle_cpu_without().
- */
-
- if (rq->ttwu_pending)
- return 0;
-
- return 1;
-}
-
/*
* update_sg_wakeup_stats - Update sched_group's statistics for wakeup.
* @sd: The sched_domain level to look for idlest group.
--
2.34.1
* [RFC PATCH v2 5/6] sched/fair: Introduce update_sg_stats()
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
Unify the common logic of update_sg_lb_stats() and update_sg_wakeup_stats()
into a new function, update_sg_stats().
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
kernel/sched/fair.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69dac5b337d8..f4ab520951a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10398,6 +10398,121 @@ sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)
return check_cpu_capacity(rq, sd);
}
+struct sg_lb_stat_env {
+ /* true: find src group, false: find dst group */
+ bool find_src_sg;
+ struct cpumask *cpus;
+ struct sched_domain *sd;
+ struct task_struct *p;
+ bool *sg_overloaded;
+ bool *sg_overutilized;
+ int local_group;
+ struct lb_env *lb_env;
+};
+
+static inline void update_sg_stats(struct sg_lb_stats *sgs,
+ struct sched_group *group,
+ struct sg_lb_stat_env *env)
+{
+ bool find_src_sg = env->find_src_sg;
+ int i, sd_flags = env->sd->flags;
+ bool balancing_at_rd = !env->sd->parent;
+ struct task_struct *p = env->p;
+ enum cpu_idle_type idle;
+
+ if (env->lb_env)
+ idle = env->lb_env->idle;
+
+ for_each_cpu_and(i, sched_group_span(group), env->cpus) {
+ struct rq *rq = cpu_rq(i);
+ unsigned int local, load = cpu_load_without(rq, p);
+ int nr_running;
+
+ sgs->group_load += load;
+ sgs->group_util += cpu_util_without(i, p);
+ sgs->group_runnable += cpu_runnable_without(rq, p);
+ local = task_running_on_cpu(i, p);
+ sgs->sum_h_nr_running += rq->cfs.h_nr_runnable - local;
+
+ nr_running = rq->nr_running - local;
+ sgs->sum_nr_running += nr_running;
+
+ if (find_src_sg && cpu_overutilized(i))
+ *env->sg_overutilized = 1;
+
+ /*
+ * No need to call idle_cpu_without() if nr_running is not 0
+ */
+ if (!nr_running && idle_cpu_without(i, p)) {
+ sgs->idle_cpus++;
+ /* Idle cpu can't have misfit task */
+ continue;
+ }
+
+ if (!find_src_sg) {
+ /* Check if task fits in the CPU */
+ if (sd_flags & SD_ASYM_CPUCAPACITY &&
+ sgs->group_misfit_task_load &&
+ task_fits_cpu(p, i))
+ sgs->group_misfit_task_load = 0;
+
+ /* We are done when looking for the dst (idlest) group */
+ continue;
+ }
+
+ /* Overload indicator is only updated at root domain */
+ if (balancing_at_rd && nr_running > 1)
+ *env->sg_overloaded = 1;
+
+#ifdef CONFIG_NUMA_BALANCING
+ /* Only fbq_classify_group() uses this to classify NUMA groups */
+ if (sd_flags & SD_NUMA) {
+ sgs->nr_numa_running += rq->nr_numa_running;
+ sgs->nr_preferred_running += rq->nr_preferred_running;
+ }
+#endif
+ if (env->local_group)
+ continue;
+
+ if (sd_flags & SD_ASYM_CPUCAPACITY) {
+ /* Check for a misfit task on the cpu */
+ if (sgs->group_misfit_task_load < rq->misfit_task_load) {
+ sgs->group_misfit_task_load = rq->misfit_task_load;
+ *env->sg_overloaded = 1;
+ }
+ } else if (idle && sched_reduced_capacity(rq, env->sd)) {
+ /* Check for a task running on a CPU with reduced capacity */
+ if (sgs->group_misfit_task_load < load)
+ sgs->group_misfit_task_load = load;
+ }
+ }
+
+ sgs->group_capacity = group->sgc->capacity;
+
+ /* Only count group_weight for allowed cpus */
+ sgs->group_weight = cpumask_weight_and(sched_group_span(group), env->cpus);
+
+ /* Check if dst CPU is idle and preferred to this group */
+ if (find_src_sg && !env->local_group && idle && sgs->sum_h_nr_running &&
+ sched_group_asym(env->lb_env, sgs, group))
+ sgs->group_asym_packing = 1;
+
+ /* Check for loaded SMT group to be balanced to dst CPU */
+ if (find_src_sg && !env->local_group && smt_balance(env->lb_env, sgs, group))
+ sgs->group_smt_balance = 1;
+
+ sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
+
+ /*
+ * Computing avg_load makes sense only when group is fully busy or
+ * overloaded
+ */
+ if (sgs->group_type == group_fully_busy ||
+ sgs->group_type == group_overloaded)
+ sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
+ sgs->group_capacity;
+}
+
/**
* update_sg_lb_stats - Update sched_group's statistics for load balancing.
* @env: The load balancing environment.
--
2.34.1
* [RFC PATCH v2 6/6] sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()
From: Adam Li @ 2025-07-17 6:20 UTC
To: mingo, peterz, juri.lelli, vincent.guittot
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
linux-kernel, patches, shkaushik, Adam Li
Convert update_sg_lb_stats() and update_sg_wakeup_stats() to call the
common function update_sg_stats(), each passing its own context.
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
kernel/sched/fair.c | 136 ++++++--------------------------------------
1 file changed, 18 insertions(+), 118 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f4ab520951a8..96a2ca4fa880 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10529,83 +10529,20 @@ static inline void update_sg_lb_stats(struct lb_env *env,
bool *sg_overloaded,
bool *sg_overutilized)
{
- int i, nr_running, local_group, sd_flags = env->sd->flags;
- bool balancing_at_rd = !env->sd->parent;
+ struct sg_lb_stat_env stat_env = {
+ .find_src_sg = true,
+ .cpus = env->cpus,
+ .sd = env->sd,
+ .p = NULL,
+ .sg_overutilized = sg_overutilized,
+ .sg_overloaded = sg_overloaded,
+ .local_group = group == sds->local,
+ .lb_env = env,
+ };
memset(sgs, 0, sizeof(*sgs));
- local_group = group == sds->local;
-
- for_each_cpu_and(i, sched_group_span(group), env->cpus) {
- struct rq *rq = cpu_rq(i);
- unsigned long load = cpu_load(rq);
-
- sgs->group_load += load;
- sgs->group_util += cpu_util_cfs(i);
- sgs->group_runnable += cpu_runnable(rq);
- sgs->sum_h_nr_running += rq->cfs.h_nr_runnable;
-
- nr_running = rq->nr_running;
- sgs->sum_nr_running += nr_running;
-
- if (cpu_overutilized(i))
- *sg_overutilized = 1;
-
- /*
- * No need to call idle_cpu() if nr_running is not 0
- */
- if (!nr_running && idle_cpu(i)) {
- sgs->idle_cpus++;
- /* Idle cpu can't have misfit task */
- continue;
- }
-
- /* Overload indicator is only updated at root domain */
- if (balancing_at_rd && nr_running > 1)
- *sg_overloaded = 1;
-
-#ifdef CONFIG_NUMA_BALANCING
- /* Only fbq_classify_group() uses this to classify NUMA groups */
- if (sd_flags & SD_NUMA) {
- sgs->nr_numa_running += rq->nr_numa_running;
- sgs->nr_preferred_running += rq->nr_preferred_running;
- }
-#endif
- if (local_group)
- continue;
-
- if (sd_flags & SD_ASYM_CPUCAPACITY) {
- /* Check for a misfit task on the cpu */
- if (sgs->group_misfit_task_load < rq->misfit_task_load) {
- sgs->group_misfit_task_load = rq->misfit_task_load;
- *sg_overloaded = 1;
- }
- } else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
- /* Check for a task running on a CPU with reduced capacity */
- if (sgs->group_misfit_task_load < load)
- sgs->group_misfit_task_load = load;
- }
- }
-
- sgs->group_capacity = group->sgc->capacity;
-
- sgs->group_weight = cpumask_weight_and(sched_group_span(group), env->cpus);
-
- /* Check if dst CPU is idle and preferred to this group */
- if (!local_group && env->idle && sgs->sum_h_nr_running &&
- sched_group_asym(env, sgs, group))
- sgs->group_asym_packing = 1;
-
- /* Check for loaded SMT group to be balanced to dst CPU */
- if (!local_group && smt_balance(env, sgs, group))
- sgs->group_smt_balance = 1;
-
- sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
-
- /* Computing avg_load makes sense only when group is overloaded */
- if (sgs->group_type == group_overloaded)
- sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
- sgs->group_capacity;
+ update_sg_stats(sgs, group, &stat_env);
}
/**
@@ -10805,7 +10742,12 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
struct sg_lb_stats *sgs,
struct task_struct *p)
{
- int i, nr_running;
+ struct sg_lb_stat_env stat_env = {
+ .find_src_sg = false,
+ .cpus = (struct cpumask *)p->cpus_ptr,
+ .sd = sd,
+ .p = p,
+ };
memset(sgs, 0, sizeof(*sgs));
@@ -10813,49 +10755,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
if (sd->flags & SD_ASYM_CPUCAPACITY)
sgs->group_misfit_task_load = 1;
- for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
- struct rq *rq = cpu_rq(i);
- unsigned int local;
-
- sgs->group_load += cpu_load_without(rq, p);
- sgs->group_util += cpu_util_without(i, p);
- sgs->group_runnable += cpu_runnable_without(rq, p);
- local = task_running_on_cpu(i, p);
- sgs->sum_h_nr_running += rq->cfs.h_nr_runnable - local;
-
- nr_running = rq->nr_running - local;
- sgs->sum_nr_running += nr_running;
-
- /*
- * No need to call idle_cpu_without() if nr_running is not 0
- */
- if (!nr_running && idle_cpu_without(i, p))
- sgs->idle_cpus++;
-
- /* Check if task fits in the CPU */
- if (sd->flags & SD_ASYM_CPUCAPACITY &&
- sgs->group_misfit_task_load &&
- task_fits_cpu(p, i))
- sgs->group_misfit_task_load = 0;
-
- }
-
- sgs->group_capacity = group->sgc->capacity;
-
- /* Only count group_weight if p can run on these cpus */
- sgs->group_weight = cpumask_weight_and(sched_group_span(group),
- p->cpus_ptr);
-
- sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
-
- /*
- * Computing avg_load makes sense only when group is fully busy or
- * overloaded
- */
- if (sgs->group_type == group_fully_busy ||
- sgs->group_type == group_overloaded)
- sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
- sgs->group_capacity;
+ update_sg_stats(sgs, group, &stat_env);
}
static bool update_pick_idlest(struct sched_group *idlest,
--
2.34.1